package match
import "github.com/google/licensecheck/internal/match"
Package match defines matching algorithms and support code for the license checker.
Index ¶
- Variables
- type Dict
- func (d *Dict) Insert(w string) WordID
- func (d *Dict) InsertSplit(text string) []Word
- func (d *Dict) Lookup(w string) WordID
- func (d *Dict) Split(text string) []Word
- func (d *Dict) Words() []string
- type LRE
- func ParseLRE(d *Dict, file, s string) (*LRE, error)
- func (re *LRE) Dict() *Dict
- func (re *LRE) File() string
- type Match
- type Matches
- type MultiLRE
- func NewMultiLRE(list []*LRE) (_ *MultiLRE, err error)
- func (re *MultiLRE) Dict() *Dict
- func (re *MultiLRE) Match(text string) *Matches
- type SyntaxError
- type Word
- type WordID
Variables ¶
var TraceDFA int
TraceDFA controls whether DFA execution prints debug tracing when stuck. If TraceDFA > 0 and the DFA has followed a path of at least TraceDFA symbols since the last matching state but hits a dead end, it prints out information about the dead end.
Types ¶
type Dict ¶
type Dict struct {
// contains filtered or unexported fields
}
A Dict maps words to integer indexes in a word list, of type WordID. The zero Dict is an empty dictionary ready for use.
Lookup and Words are read-only operations, safe for any number of concurrent calls from multiple goroutines. Insert is a write operation; it must not run concurrently with any other call, whether to Insert, Lookup, or Words.
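One way a caller might honor this rule is to guard the dictionary with a sync.RWMutex, so that reads share the lock and writes take it exclusively. The following is a minimal sketch under assumptions: guardedDict and its methods are hypothetical names, and a plain map stands in for Dict because this package is internal and cannot be imported from outside its module.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedDict is a hypothetical wrapper; the map stands in for a Dict.
type guardedDict struct {
	mu    sync.RWMutex
	index map[string]int32
}

// insert takes the write lock, so it excludes every other call.
func (g *guardedDict) insert(w string) int32 {
	g.mu.Lock()
	defer g.mu.Unlock()
	if id, ok := g.index[w]; ok {
		return id
	}
	if g.index == nil {
		g.index = make(map[string]int32)
	}
	id := int32(len(g.index))
	g.index[w] = id
	return id
}

// lookup takes the read lock, so lookups may overlap with each other.
// It returns -1 (the documented BadWord value) for an unknown word.
func (g *guardedDict) lookup(w string) int32 {
	g.mu.RLock()
	defer g.mu.RUnlock()
	if id, ok := g.index[w]; ok {
		return id
	}
	return -1
}

func main() {
	var g guardedDict
	var wg sync.WaitGroup
	for _, w := range []string{"apache", "bsd", "mit", "bsd"} {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			g.insert(w)
			_ = g.lookup(w)
		}(w)
	}
	wg.Wait()
	fmt.Println(len(g.index)) // number of distinct words inserted
}
```

Callers that build the dictionary once and only read it afterwards need no lock at all, which is the cheaper design when it applies.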
func (*Dict) Insert ¶
func (d *Dict) Insert(w string) WordID
Insert adds the word w to the word list, returning its index. If w is already in the word list, it is not added again; Insert returns the existing index.
func (*Dict) InsertSplit ¶
func (d *Dict) InsertSplit(text string) []Word
InsertSplit splits text into a sequence of lowercase words, inserting any new words in the dictionary.
func (*Dict) Lookup ¶
func (d *Dict) Lookup(w string) WordID
Lookup looks for the word w in the word list and returns its index. If w is not in the word list, Lookup returns BadWord.
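The Insert and Lookup contracts can be illustrated with a small stand-in type. This is a sketch, not the package's implementation: sketchDict, wordID, and badWord are hypothetical names that mirror the documented Dict, WordID, and BadWord, and the map-plus-slice layout is an assumption.

```go
package main

import "fmt"

// wordID mirrors the documented WordID type (an int32 index).
type wordID int32

// badWord mirrors the documented BadWord constant.
const badWord wordID = -1

// sketchDict is a hypothetical stand-in for Dict.
// Its zero value is an empty dictionary ready for use.
type sketchDict struct {
	index map[string]wordID // word -> index into list
	list  []string          // index -> word
}

// insert adds w if it is absent and returns its index either way.
func (d *sketchDict) insert(w string) wordID {
	if id, ok := d.index[w]; ok {
		return id
	}
	if d.index == nil {
		d.index = make(map[string]wordID)
	}
	id := wordID(len(d.list))
	d.index[w] = id
	d.list = append(d.list, w)
	return id
}

// lookup returns the index of w, or badWord if w was never inserted.
func (d *sketchDict) lookup(w string) wordID {
	if id, ok := d.index[w]; ok {
		return id
	}
	return badWord
}

func main() {
	var d sketchDict
	fmt.Println(d.insert("license"), d.insert("mit"), d.insert("license")) // 0 1 0
	fmt.Println(d.lookup("mit"), d.lookup("gpl"))                           // 1 -1
}
```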
func (*Dict) Split ¶
func (d *Dict) Split(text string) []Word
Split splits text into a sequence of lowercase words. It does not add any new words to the dictionary. Unrecognized words are reported as having ID = BadWord.
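A rough sketch of this read-only splitting follows. The word-boundary rule (break at every rune that is not a letter or digit) is an assumption, as are the names splitWords and sketchWord; the fields of the real Word type are not shown on this page, and the package's own splitting rules may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// badWord mirrors the documented BadWord value.
const badWord int32 = -1

// sketchWord is a hypothetical record pairing a word with its dictionary index.
type sketchWord struct {
	text string
	id   int32
}

// splitWords lowercases text, breaks it at every rune that is not a letter
// or digit (an assumed rule), and looks each word up without modifying dict.
func splitWords(dict map[string]int32, text string) []sketchWord {
	isSep := func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) }
	var out []sketchWord
	for _, w := range strings.FieldsFunc(strings.ToLower(text), isSep) {
		id, ok := dict[w]
		if !ok {
			id = badWord // unrecognized words are reported, not inserted
		}
		out = append(out, sketchWord{text: w, id: id})
	}
	return out
}

func main() {
	dict := map[string]int32{"permission": 0, "is": 1, "granted": 2}
	for _, w := range splitWords(dict, "Permission is hereby GRANTED.") {
		fmt.Println(w.text, w.id)
	}
}
```

An InsertSplit-style variant would differ only in the unknown-word branch, where it would assign a fresh index instead of badWord.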
func (*Dict) Words ¶
func (d *Dict) Words() []string
Words returns the current word list. The list is not a copy; the caller can read but must not modify the list.
type LRE ¶
type LRE struct {
// contains filtered or unexported fields
}
An LRE is a compiled license regular expression.
TODO: Move this comment somewhere non-internal later.
A license regular expression (LRE) is a pattern syntax intended for describing large English texts such as software licenses, with minor allowed variations. The pattern syntax and the matching are word-based and case-insensitive; punctuation is ignored in the pattern and in the matched text.
The valid LRE patterns are:
    word            - a single case-insensitive word
    __N__           - any sequence of up to N words
    expr1 expr2     - concatenation
    expr1 || expr2  - alternation
    (( expr ))      - grouping
    expr??          - zero or one instances of expr
    //** text **//  - a comment
To make patterns harder to misread in large texts:
- || must only appear inside (( ))
- ?? must only follow (( ))
- (( must be at the start of a line, preceded only by spaces
- )) must be at the end of a line, followed only by spaces and ??.
For example:
    //** https://en.wikipedia.org/wiki/Filler_text **//
    Now is
    ((not))??
    the time for all good
    ((men || women || people))
    to come to the aid of their __1__.
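To make the semantics of each pattern form concrete, the example can be modelled with a few matcher combinators. This is only a sketch: the pattern is built by hand rather than parsed, every name in it (matcher, word, upTo, seq, alt, opt, phrase, matchesWhole, filler) is hypothetical, it assumes __N__ may match zero words, and it does not reflect how the package compiles or executes an LRE.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// matcher returns every word index at which a match starting at i can end.
type matcher func(words []string, i int) []int

// word matches exactly one word, already lowercased.
func word(w string) matcher {
	return func(words []string, i int) []int {
		if i < len(words) && words[i] == w {
			return []int{i + 1}
		}
		return nil
	}
}

// upTo models __N__; this sketch assumes "up to N" includes zero words.
func upTo(n int) matcher {
	return func(words []string, i int) []int {
		var ends []int
		for k := 0; k <= n && i+k <= len(words); k++ {
			ends = append(ends, i+k)
		}
		return ends
	}
}

// seq models concatenation: each matcher continues from the ends of the previous one.
func seq(ms ...matcher) matcher {
	return func(words []string, i int) []int {
		cur := []int{i}
		for _, m := range ms {
			var next []int
			for _, p := range cur {
				next = append(next, m(words, p)...)
			}
			cur = next
		}
		return cur
	}
}

// alt models (( a || b )): the union of the ends of every branch.
func alt(ms ...matcher) matcher {
	return func(words []string, i int) []int {
		var ends []int
		for _, m := range ms {
			ends = append(ends, m(words, i)...)
		}
		return ends
	}
}

// opt models (( expr ))??: either nothing or expr.
func opt(m matcher) matcher { return alt(seq(), m) }

// phrase is shorthand for a concatenation of plain words.
func phrase(s string) matcher {
	var ms []matcher
	for _, w := range strings.Fields(s) {
		ms = append(ms, word(w))
	}
	return seq(ms...)
}

// matchesWhole lowercases text, drops punctuation, and reports whether m covers all of it.
func matchesWhole(m matcher, text string) bool {
	isSep := func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) }
	words := strings.FieldsFunc(strings.ToLower(text), isSep)
	for _, end := range m(words, 0) {
		if end == len(words) {
			return true
		}
	}
	return false
}

// filler is the example pattern, built by hand rather than parsed.
var filler = seq(
	phrase("now is"), opt(word("not")),
	phrase("the time for all good"),
	alt(word("men"), word("women"), word("people")),
	phrase("to come to the aid of their"), upTo(1),
)

func main() {
	fmt.Println(matchesWhole(filler, "Now is the time for all good men to come to the aid of their country."))  // true
	fmt.Println(matchesWhole(filler, "Now is the time for all good dogs to come to the aid of their country.")) // false
}
```

Tracking sets of end positions keeps the sketch short but can repeat work on large patterns, which is one reason a compiled automaton is the more practical design for full license texts.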
func ParseLRE ¶
func ParseLRE(d *Dict, file, s string) (*LRE, error)
ParseLRE parses the string s as a license regexp. The file name is used in error messages if non-empty.
func (*LRE) Dict ¶
func (re *LRE) Dict() *Dict
Dict returns the Dict used by the LRE.
func (*LRE) File ¶
func (re *LRE) File() string
File returns the file name passed to ParseLRE.
type Match ¶
type Match struct {
	ID    int // index of LRE in list passed to NewMultiLRE
	Start int // word index of start of match
	End   int // word index of end of match
}
A Match records the position of a single match in a text.
type Matches ¶
type Matches struct {
	Text  string  // the entire text
	Words []Word  // the text, split into Words
	List  []Match // the matches
}
A Matches is a collection of all leftmost-longest, non-overlapping matches in text.
type MultiLRE ¶
type MultiLRE struct {
// contains filtered or unexported fields
}
A MultiLRE matches multiple LREs simultaneously against a text. It is more efficient than matching each LRE in sequence against the text.
func NewMultiLRE ¶
func NewMultiLRE(list []*LRE) (_ *MultiLRE, err error)
NewMultiLRE returns a MultiLRE looking for the given LREs. All the LREs must have been parsed using the same Dict; if not, NewMultiLRE panics.
func (*MultiLRE) Dict ¶
func (re *MultiLRE) Dict() *Dict
Dict returns the Dict used by the MultiLRE.
func (*MultiLRE) Match ¶
func (re *MultiLRE) Match(text string) *Matches
Match reports all leftmost-longest, non-overlapping matches in text. It always returns a non-nil *Matches, in order to return the split text. Check len(matches.List) to see whether any matches were found.
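The leftmost-longest, non-overlapping rule can be shown on a list of candidate spans. This sketch only illustrates the selection rule, not how the package finds its matches: span and leftmostLongest are hypothetical names, the span fields copy the documented Match fields, and End is assumed to be exclusive.

```go
package main

import (
	"fmt"
	"sort"
)

// span copies the documented Match fields; End is assumed to be exclusive.
type span struct {
	ID, Start, End int
}

// leftmostLongest scans candidates from left to right, keeps the longest one
// at the earliest available start, and skips every candidate overlapping it.
func leftmostLongest(cands []span) []span {
	sorted := append([]span(nil), cands...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Start != sorted[j].Start {
			return sorted[i].Start < sorted[j].Start // leftmost first
		}
		return sorted[i].End > sorted[j].End // longest first among equal starts
	})
	var out []span
	next := 0 // first word index not covered by an accepted match
	for _, c := range sorted {
		if c.End > c.Start && c.Start >= next {
			out = append(out, c)
			next = c.End
		}
	}
	return out
}

func main() {
	cands := []span{{0, 0, 5}, {1, 0, 8}, {2, 6, 10}, {3, 8, 12}, {4, 12, 13}}
	for _, m := range leftmostLongest(cands) {
		fmt.Println(m.ID, m.Start, m.End)
	}
}
```

With these candidates, span 1 wins over span 0 because it is longer at the same start, span 2 is dropped because it overlaps span 1, and spans 3 and 4 follow.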
type SyntaxError ¶
A SyntaxError reports a syntax error during parsing.
func (*SyntaxError) Error ¶
func (e *SyntaxError) Error() string
type Word ¶
A Word represents a single word found in a text.
type WordID ¶
type WordID int32
A WordID is the index of a word in a dictionary.
const AnyWord WordID = -2
AnyWord represents a wildcard matching any word.
const BadWord WordID = -1
BadWord represents a word not present in the dictionary.
Source Files ¶
dict.go regexp.go rematch.go resyntax.go
- Version: v0.3.1 (latest)
- Published: Sep 3, 2020
- Platform: linux/amd64
- Imports: 10 packages