package match

import "github.com/google/licensecheck/internal/match"

Package match defines matching algorithms and support code for the license checker.

Variables

var TraceDFA int

TraceDFA controls whether DFA execution prints debug tracing when stuck. If TraceDFA > 0 and the DFA has followed a path of at least TraceDFA symbols since the last matching state but hits a dead end, it prints out information about the dead end.

Types

type Dict

type Dict struct {
	// contains filtered or unexported fields
}

A Dict maps words to integer indexes in a word list, of type WordID. The zero Dict is an empty dictionary ready for use.

Lookup and Words are read-only operations, safe for any number of concurrent calls from multiple goroutines. Insert is a write operation; it must not run concurrently with any other call, whether to Insert, Lookup, or Words.

func (*Dict) Insert

func (d *Dict) Insert(w string) WordID

Insert adds the word w to the word list, returning its index. If w is already in the word list, it is not added again; Insert returns the existing index.

func (*Dict) InsertSplit

func (d *Dict) InsertSplit(text string) []Word

InsertSplit splits text into a sequence of lowercase words, inserting any new words in the dictionary.

func (*Dict) Lookup

func (d *Dict) Lookup(w string) WordID

Lookup looks for the word w in the word list and returns its index. If w is not in the word list, Lookup returns BadWord.
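The Insert and Lookup contracts described above can be illustrated with a toy dictionary. This is only a sketch: toyDict and its fields are hypothetical names, and the real Dict keeps its fields unexported, so its internal layout may differ. Because the package is internal, the sketch redeclares WordID and BadWord locally rather than importing them.

```go
package main

import "fmt"

// WordID and BadWord mirror the declarations documented for this package.
type WordID int32

const BadWord WordID = -1

// toyDict is a hypothetical stand-in for Dict, not the package's implementation.
type toyDict struct {
	words []string          // word list, indexed by WordID
	index map[string]WordID // reverse mapping from word to index
}

// Insert adds w if it is new and returns its index either way.
func (d *toyDict) Insert(w string) WordID {
	if id, ok := d.index[w]; ok {
		return id // already present: reuse the existing index
	}
	if d.index == nil {
		d.index = make(map[string]WordID) // allocate lazily so the zero value is usable
	}
	id := WordID(len(d.words))
	d.words = append(d.words, w)
	d.index[w] = id
	return id
}

// Lookup returns the index of w, or BadWord if w was never inserted.
func (d *toyDict) Lookup(w string) WordID {
	if id, ok := d.index[w]; ok {
		return id
	}
	return BadWord
}

func main() {
	var d toyDict // the zero value is an empty dictionary
	fmt.Println(d.Insert("license"), d.Insert("mit"), d.Insert("license")) // prints "0 1 0"
	fmt.Println(d.Lookup("mit"), d.Lookup("gpl"))                          // prints "1 -1"
}
```

Like the documented Dict, this toy version is safe for concurrent Lookup calls only while no Insert is running.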

func (*Dict) Split

func (d *Dict) Split(text string) []Word

Split splits text into a sequence of lowercase words. It does not add any new words to the dictionary. Unrecognized words are reported as having ID = BadWord.
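A rough sketch of what a read-only split might look like follows. The helper splitWords and the known map are hypothetical, and the word boundary rule used here (runs of letters and digits) is an assumption; the package's actual tokenization rules may be more involved.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// WordID, BadWord, and Word mirror the declarations documented for this package.
type WordID int32

const BadWord WordID = -1

type Word struct {
	ID WordID
	Lo int32 // Word appears at text[Lo:Hi].
	Hi int32
}

// splitWords is a hypothetical helper. It treats each run of letters and
// digits as a word (an assumption), lowercases it, and looks it up in known
// without adding anything. Unknown words get ID = BadWord.
func splitWords(known map[string]WordID, text string) []Word {
	var out []Word
	start := -1 // byte offset of the word in progress, or -1 if none
	flush := func(end int) {
		if start < 0 {
			return
		}
		id, ok := known[strings.ToLower(text[start:end])]
		if !ok {
			id = BadWord
		}
		out = append(out, Word{ID: id, Lo: int32(start), Hi: int32(end)})
		start = -1
	}
	for i, r := range text {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if start < 0 {
				start = i
			}
			continue
		}
		flush(i) // a non-word rune ends the current word
	}
	flush(len(text))
	return out
}

func main() {
	known := map[string]WordID{"mit": 0, "license": 1}
	text := "The MIT License."
	for _, w := range splitWords(known, text) {
		fmt.Printf("%q id=%d\n", text[w.Lo:w.Hi], w.ID)
	}
}
```

The Lo and Hi offsets index into the original text, so the original capitalization can still be recovered even though lookups use the lowercase form.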

func (*Dict) Words

func (d *Dict) Words() []string

Words returns the current word list. The list is not a copy; the caller can read but must not modify the list.

type LRE

type LRE struct {
	// contains filtered or unexported fields
}

An LRE is a compiled license regular expression.

TODO: Move this comment somewhere non-internal later.

A license regular expression (LRE) is a pattern syntax intended for describing large English texts such as software licenses, with minor allowed variations. The pattern syntax and the matching are word-based and case-insensitive; punctuation is ignored in the pattern and in the matched text.

The valid LRE patterns are:

word            - a single case-insensitive word
__N__           - any sequence of up to N words
expr1 expr2     - concatenation
expr1 || expr2  - alternation
(( expr ))      - grouping
expr??          - zero or one instances of expr
//** text **//  - a comment

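To make patterns harder to misread in large texts:

- || must only appear inside (( ))
- ?? must only follow (( ))
- (( must be at the start of a line, preceded only by spaces
- )) must be at the end of a line, followed only by spaces and ??.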

For example:

//** https://en.wikipedia.org/wiki/Filler_text **//
Now is
((not))??
the time for all good
((men || women || people))
to come to the aid of their __1__.
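To make the semantics of the example concrete, here is a naive interpreter for a hand-built version of that pattern. It is a sketch only: the node types (word, wild, seq, alt, opt) and the helpers are hypothetical, the pattern is constructed directly rather than parsed from LRE syntax, and the position-set evaluation below is a simple substitute for the DFA that the package itself uses.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Hypothetical pattern nodes, one per LRE construct used in the example.
type node interface{}
type word string          // a single word
type wild int             // __N__: any sequence of up to N words
type seq []node           // concatenation
type alt []node           // alternation
type opt struct{ n node } // expr??

// ends returns every word index at which n can stop matching when it
// starts at words[i]. An empty result means no match.
func ends(n node, words []string, i int) []int {
	switch n := n.(type) {
	case word:
		if i < len(words) && words[i] == string(n) {
			return []int{i + 1}
		}
	case wild:
		var out []int
		for k := 0; k <= int(n) && i+k <= len(words); k++ {
			out = append(out, i+k)
		}
		return out
	case seq:
		cur := []int{i}
		for _, part := range n {
			var next []int
			for _, p := range cur {
				next = append(next, ends(part, words, p)...)
			}
			cur = next
		}
		return cur
	case alt:
		var out []int
		for _, a := range n {
			out = append(out, ends(a, words, i)...)
		}
		return out
	case opt:
		return append([]int{i}, ends(n.n, words, i)...)
	}
	return nil
}

// longest returns the end of the longest match starting at word 0, or -1.
func longest(n node, words []string) int {
	best := -1
	for _, e := range ends(n, words, 0) {
		if e > best {
			best = e
		}
	}
	return best
}

// lit turns a run of plain words into a concatenation.
func lit(s string) seq {
	var out seq
	for _, w := range strings.Fields(s) {
		out = append(out, word(w))
	}
	return out
}

// tokenize lowercases text and drops punctuation (a simplifying assumption).
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// fillerPattern is the example pattern above, built by hand.
func fillerPattern() node {
	return seq{
		lit("now is"),
		opt{n: word("not")},
		lit("the time for all good"),
		alt{word("men"), word("women"), word("people")},
		lit("to come to the aid of their"),
		wild(1),
	}
}

func main() {
	text := "Now is the time for all good people to come to the aid of their country."
	fmt.Println(longest(fillerPattern(), tokenize(text))) // prints 16
}
```

The result is a word count, not a byte offset: all 16 words of the sentence are consumed, with the final word absorbed by the wildcard.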

func ParseLRE

func ParseLRE(d *Dict, file, s string) (*LRE, error)

ParseLRE parses the string s as a license regexp. The file name is used in error messages if non-empty.

func (*LRE) Dict

func (re *LRE) Dict() *Dict

Dict returns the Dict used by the LRE.

func (*LRE) File

func (re *LRE) File() string

File returns the file name passed to ParseLRE.

type Match

type Match struct {
	ID    int // index of LRE in list passed to NewMultiLRE
	Start int // word index of start of match
	End   int // word index of end of match
}

A Match records the position of a single match in a text.

type Matches

type Matches struct {
	Text  string  // the entire text
	Words []Word  // the text, split into Words
	List  []Match // the matches
}

A Matches is a collection of all leftmost-longest, non-overlapping matches in text.
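The phrase "leftmost-longest, non-overlapping" can be illustrated by a small selection routine over candidate spans. This sketch assumes the conventional regular expression meaning of the term and assumes End is exclusive; leftmostLongest is a hypothetical helper, and the package itself finds matches with a DFA rather than by filtering candidates this way.

```go
package main

import (
	"fmt"
	"sort"
)

// Match mirrors the struct documented for this package.
type Match struct {
	ID    int // index of LRE in list passed to NewMultiLRE
	Start int // word index of start of match
	End   int // word index of end of match (assumed exclusive here)
}

// leftmostLongest keeps, scanning left to right, the longest candidate at
// the earliest available start, then skips everything that overlaps it.
func leftmostLongest(cands []Match) []Match {
	sorted := append([]Match(nil), cands...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Start != sorted[j].Start {
			return sorted[i].Start < sorted[j].Start // leftmost first
		}
		return sorted[i].End > sorted[j].End // then longest first
	})
	var out []Match
	next := 0 // first word index not yet covered by a chosen match
	for _, m := range sorted {
		if m.Start >= next {
			out = append(out, m)
			next = m.End
		}
	}
	return out
}

func main() {
	cands := []Match{
		{ID: 0, Start: 0, End: 3},
		{ID: 1, Start: 0, End: 5},
		{ID: 2, Start: 4, End: 8},
		{ID: 3, Start: 5, End: 9},
	}
	for _, m := range leftmostLongest(cands) {
		fmt.Println(m.ID, m.Start, m.End)
	}
}
```

In this example the candidate with ID 1 wins over ID 0 because it is longer at the same start, ID 2 is dropped because it overlaps ID 1, and ID 3 is kept.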

type MultiLRE

type MultiLRE struct {
	// contains filtered or unexported fields
}

A MultiLRE matches multiple LREs simultaneously against a text. It is more efficient than matching each LRE in sequence against the text.

func NewMultiLRE

func NewMultiLRE(list []*LRE) (_ *MultiLRE, err error)

NewMultiLRE returns a MultiLRE looking for the given LREs. All the LREs must have been parsed using the same Dict; if not, NewMultiLRE panics.

func (*MultiLRE) Dict

func (re *MultiLRE) Dict() *Dict

Dict returns the Dict used by the MultiLRE.

func (*MultiLRE) Match

func (re *MultiLRE) Match(text string) *Matches

Match reports all leftmost-longest, non-overlapping matches in text. It always returns a non-nil *Matches, in order to return the split text. Check len(matches.List) to see whether any matches were found.
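Because Match positions are word indexes, recovering the matched text means going through Words to reach byte offsets. The sketch below shows one way to do that on a hand-built Matches value. The helper matchText is hypothetical, and it assumes End is exclusive (one past the last matched word); the documentation above does not state this explicitly.

```go
package main

import "fmt"

// These types mirror the declarations documented for this package.
type WordID int32

type Word struct {
	ID WordID
	Lo int32 // Word appears at text[Lo:Hi].
	Hi int32
}

type Match struct {
	ID    int
	Start int
	End   int
}

type Matches struct {
	Text  string
	Words []Word
	List  []Match
}

// matchText returns the part of m.Text covered by x, or "" if x is empty
// or out of range. Assumption: x.End is an exclusive word index.
func matchText(m *Matches, x Match) string {
	if x.Start < 0 || x.Start >= x.End || x.End > len(m.Words) {
		return ""
	}
	return m.Text[m.Words[x.Start].Lo:m.Words[x.End-1].Hi]
}

func main() {
	// A hand-built result; in real use this would come from (*MultiLRE).Match.
	m := &Matches{
		Text: "See the MIT License here",
		Words: []Word{
			{ID: 0, Lo: 0, Hi: 3},
			{ID: 1, Lo: 4, Hi: 7},
			{ID: 2, Lo: 8, Hi: 11},
			{ID: 3, Lo: 12, Hi: 19},
			{ID: 4, Lo: 20, Hi: 24},
		},
		List: []Match{{ID: 0, Start: 2, End: 4}},
	}
	if len(m.List) == 0 {
		fmt.Println("no matches")
		return
	}
	for _, x := range m.List {
		fmt.Printf("LRE %d matched %q\n", x.ID, matchText(m, x)) // prints: LRE 0 matched "MIT License"
	}
}
```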

type SyntaxError

type SyntaxError struct {
	File    string
	Offset  int
	Context string
	Err     string
}

A SyntaxError reports a syntax error during parsing.

func (*SyntaxError) Error

func (e *SyntaxError) Error() string

type Word

type Word struct {
	ID WordID
	Lo int32 // Word appears at text[Lo:Hi].
	Hi int32
}

A Word represents a single word found in a text.

type WordID

type WordID int32

A WordID is the index of a word in a dictionary.

const AnyWord WordID = -2

AnyWord represents a wildcard matching any word.

const BadWord WordID = -1

BadWord represents a word not present in the dictionary.

Source Files

dict.go regexp.go rematch.go resyntax.go

Version: v0.3.1 (latest)
Published: Sep 3, 2020
Platform: linux/amd64
Imports: 10 packages