package lexer

import "github.com/alecthomas/participle/lexer"

Package lexer defines interfaces and implementations used by Participle to perform lexing.

The primary interfaces are Definition and Lexer. There are three implementations of these interfaces:

TextScannerLexer is based on text/scanner. This is the fastest, but least flexible, in that tokens are restricted to those supported by that package. It can scan about 5M tokens/second on a late 2013 15" MacBook Pro.

The second lexer is constructed via the Regexp() function, mapping regexp capture groups to tokens. The complete input source is read into memory, so it is unsuitable for large inputs.

The final lexer provided accepts a lexical grammar in EBNF. Each capitalised production is a lexical token supported by the resulting Lexer. This is very flexible, but a bit slower, scanning around 730K tokens/second on the same machine. It is currently completely unoptimised and could be converted to a table-based lexer for better performance.

Lexer implementations report errors via the error return values of Definition.Lex and Lexer.Next; the Errorf and ErrorWithTokenf helpers construct suitable *Error values.

Index

Constants

const (
	// EOF represents an end of file.
	EOF rune = -(iota + 1)
)

Functions

func FormatError

func FormatError(pos Position, message string) string

FormatError formats an error in the form "[<filename>:][<line>:<pos>:] <message>"
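For example (a sketch; the position values are illustrative):

pos := lexer.Position{Filename: "example.go", Line: 3, Column: 14}
fmt.Println(lexer.FormatError(pos, "unexpected token"))
// example.go:3:14: unexpected token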

func MakeSymbolTable

func MakeSymbolTable(def Definition, types ...string) (map[rune]bool, error)

MakeSymbolTable builds a lookup table for checking token ID existence.

For each symbolic name in "types", the returned map will contain the corresponding token ID as a key.
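A sketch of using the table to filter tokens by type, assuming the "Ident" and "Int" symbols exposed by the default text/scanner-based lexer:

table, err := lexer.MakeSymbolTable(lexer.TextScannerLexer, "Ident", "Int")
if err != nil {
	panic(err)
}
lex := lexer.LexString("a = 1")
for {
	tok, err := lex.Next()
	if err != nil || tok.EOF() {
		break
	}
	if table[tok.Type] {
		// tok is an Ident or Int token.
	}
}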

func NameOfReader

func NameOfReader(r interface{}) string

NameOfReader attempts to retrieve the filename of a reader.

func SymbolsByRune

func SymbolsByRune(def Definition) map[rune]string

SymbolsByRune returns a map of lexer symbol names keyed by rune.
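This is handy for turning token types back into readable names in diagnostics, e.g.:

symbols := lexer.SymbolsByRune(lexer.TextScannerLexer)
lex := lexer.LexString("hello")
tok, _ := lex.Next()
fmt.Printf("%s %q\n", symbols[tok.Type], tok.Value) // e.g. Ident "hello"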

Types

type Definition

type Definition interface {
	// Lex an io.Reader.
	Lex(io.Reader) (Lexer, error)
	// Symbols returns a map of symbolic names to the corresponding pseudo-runes for those symbols.
	// This is the same approach as used by text/scanner. For example, "EOF" might have the rune
	// value of -1, "Ident" might be -2, and so on.
	Symbols() map[string]rune
}

Definition provides the parser with metadata for a lexer.
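As an illustration, here is a minimal sketch of a custom Definition and Lexer that emit one token per line, using io, io/ioutil and strings. Every name here (lineDefinition, lineLexer, LineToken) is hypothetical, not part of the package:

const LineToken rune = -2

type lineDefinition struct{}

func (lineDefinition) Symbols() map[string]rune {
	return map[string]rune{"EOF": lexer.EOF, "Line": LineToken}
}

func (lineDefinition) Lex(r io.Reader) (lexer.Lexer, error) {
	data, err := ioutil.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return &lineLexer{lines: strings.Split(string(data), "\n")}, nil
}

type lineLexer struct {
	lines []string
	n     int
}

// Next returns each line as a Line token, then an EOF token.
func (l *lineLexer) Next() (lexer.Token, error) {
	pos := lexer.Position{Line: l.n + 1, Column: 1}
	if l.n >= len(l.lines) {
		return lexer.EOFToken(pos), nil
	}
	tok := lexer.Token{Type: LineToken, Value: l.lines[l.n], Pos: pos}
	l.n++
	return tok, nil
}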

var (
	TextScannerLexer Definition = &defaultDefinition{}

	// DefaultDefinition defines properties for the default lexer.
	DefaultDefinition = TextScannerLexer
)

TextScannerLexer is a lexer that uses the text/scanner module.

func Must

func Must(def Definition, err error) Definition

Must takes the result of a Definition constructor call and returns the definition, panicking if the constructor returned an error.

e.g.

lex = lexer.Must(lexer.Build(`Symbol = "symbol" .`))

func Regexp

func Regexp(pattern string) (Definition, error)

Regexp creates a lexer definition from a regular expression.

Each named sub-expression in the regular expression matches a token. Anonymous sub-expressions will be matched and discarded.

e.g.

def, err := Regexp(`(?P<Ident>[a-z]+)|(\s+)|(?P<Number>\d+)`)
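and a sketch of putting the resulting definition to work; the token values noted in the comment follow from the pattern above:

def := lexer.Must(lexer.Regexp(`(?P<Ident>[a-z]+)|(\s+)|(?P<Number>\d+)`))
lex, err := def.Lex(strings.NewReader("abc 123"))
if err != nil {
	panic(err)
}
tokens, err := lexer.ConsumeAll(lex)
if err != nil {
	panic(err)
}
for _, tok := range tokens {
	fmt.Println(tok) // an Ident ("abc"), then a Number ("123"); the anonymous whitespace group is discarded
}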

type Error

type Error struct {
	Msg string
	Tok Token
}

Error represents an error while parsing.

func ErrorWithTokenf

func ErrorWithTokenf(tok Token, format string, args ...interface{}) *Error

ErrorWithTokenf creates a new Error with the given token as context.

func Errorf

func Errorf(pos Position, format string, args ...interface{}) *Error

Errorf creates a new Error at the given position.

func (*Error) Error

func (e *Error) Error() string

Error complies with the error interface and reports the position of an error.
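For example (a sketch; the exact output shape follows FormatError, and the values are illustrative):

err := lexer.Errorf(lexer.Position{Filename: "input.go", Line: 1, Column: 5}, "unexpected %q", "!")
fmt.Println(err) // input.go:1:5: unexpected "!"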

func (*Error) Message

func (e *Error) Message() string

func (*Error) Token

func (e *Error) Token() Token

type Lexer

type Lexer interface {
	// Next consumes and returns the next token.
	Next() (Token, error)
}

A Lexer returns tokens from a source.

func Lex

func Lex(r io.Reader) Lexer

Lex an io.Reader with text/scanner.Scanner.

This provides very fast lexing of source code compatible with Go tokens.

Note that this differs from text/scanner.Scanner in that string tokens will be unquoted.
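For example, a sketch illustrating the unquoting behaviour:

lex := lexer.Lex(strings.NewReader(`name = "Bob"`))
tokens, err := lexer.ConsumeAll(lex)
if err != nil {
	panic(err)
}
for _, tok := range tokens {
	fmt.Println(tok.Value) // the String token's Value is Bob, without the surrounding quotes
}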

func LexBytes

func LexBytes(b []byte) Lexer

LexBytes returns a new default lexer over bytes.

func LexString

func LexString(s string) Lexer

LexString returns a new default lexer over a string.

func LexWithScanner

func LexWithScanner(r io.Reader, scan *scanner.Scanner) Lexer

LexWithScanner creates a Lexer from a user-provided scanner.Scanner.

Useful if you need to customise the Scanner.
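A sketch of customising the Scanner before handing it over. This assumes the caller initialises the Scanner; the Mode manipulation is standard text/scanner usage:

r := strings.NewReader("a + b // trailing comment")
var scan scanner.Scanner
scan.Init(r)
scan.Mode &^= scanner.SkipComments // deliver Comment tokens rather than skipping them
lex := lexer.LexWithScanner(r, &scan)
tokens, _ := lexer.ConsumeAll(lex)
for _, tok := range tokens {
	fmt.Println(tok.Value) // includes "// trailing comment" as a Comment token
}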

type PeekingLexer

type PeekingLexer struct {
	// contains filtered or unexported fields
}

PeekingLexer supports arbitrary lookahead as well as cloning.

func Upgrade

func Upgrade(lex Lexer) (*PeekingLexer, error)

Upgrade a Lexer to a PeekingLexer with arbitrary lookahead.

func (*PeekingLexer) Clone

func (p *PeekingLexer) Clone() *PeekingLexer

Clone creates a clone of this PeekingLexer at its current token.

The parent and clone are completely independent.

func (*PeekingLexer) Cursor

func (p *PeekingLexer) Cursor() int

Cursor returns the current cursor position, in tokens.

func (*PeekingLexer) Length

func (p *PeekingLexer) Length() int

Length returns the number of tokens consumed by the lexer.

func (*PeekingLexer) Next

func (p *PeekingLexer) Next() (Token, error)

Next consumes and returns the next token.

func (*PeekingLexer) Peek

func (p *PeekingLexer) Peek(n int) (Token, error)

Peek ahead at the (n+1)th token; i.e. Peek(0) will peek at the next token.
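A sketch tying the PeekingLexer methods together for speculative parsing:

plex, err := lexer.Upgrade(lexer.LexString("a b c"))
if err != nil {
	panic(err)
}
next, _ := plex.Peek(0) // the token Next() would return ("a")
fmt.Println(next.Value)

attempt := plex.Clone() // speculate on a branch
attempt.Next()          // consumes "a" on the clone only
// If the branch fails, discard attempt; plex is still positioned at "a".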

type Position

type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

Position of a token.

func (Position) GoString

func (p Position) GoString() string

func (Position) String

func (p Position) String() string

type Token

type Token struct {
	// Type of token. This is the value keyed by symbol as returned by Definition.Symbols().
	Type  rune
	Value string
	Pos   Position
}

A Token returned by a Lexer.
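For example, matching a token's type against a symbol from its Definition (assuming the default lexer names its integer symbol "Int"):

symbols := lexer.TextScannerLexer.Symbols()
lex := lexer.LexString("42")
tok, _ := lex.Next()
if tok.Type == symbols["Int"] {
	fmt.Println("integer literal:", tok.Value)
}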

func ConsumeAll

func ConsumeAll(lexer Lexer) ([]Token, error)

ConsumeAll reads all tokens from a Lexer.

func EOFToken

func EOFToken(pos Position) Token

EOFToken creates a new EOF token at the given position.

func RuneToken

func RuneToken(r rune) Token

RuneToken represents a rune as a Token.

func (Token) EOF

func (t Token) EOF() bool

EOF returns true if this Token is an EOF token.

func (Token) GoString

func (t Token) GoString() string

func (Token) String

func (t Token) String() string

Source Files

doc.go errors.go lexer.go peek.go regexp.go text_scanner.go

Directories

Path                 Synopsis
lexer/ebnf           Package ebnf is an EBNF lexer for Participle.
lexer/ebnf/internal  Package internal is a library for EBNF grammars.
lexer/regex          Package regex provides a regex based lexer using a readable list of named patterns.
lexer/stateful       Package stateful defines a nested stateful lexer.