package tokenizer
import "github.com/benoitkugler/pstokenizer"
Package tokenizer implements the lowest level of processing of PS/PDF files. The tokenizer is also usable with Type1 font files. See the higher-level package pdf/parser for reading PDF objects.
Index ¶
- func IsAsciiWhitespace(ch byte) bool
- func IsHexChar(c byte) (uint8, bool)
- type Fl
- type Kind
- type Token
- func Tokenize(data []byte) ([]Token, error)
- func (t Token) Float() (Fl, error)
- func (t Token) Int() (int, error)
- func (t Token) IsNumber() bool
- func (t Token) IsOther(value string) bool
- type Tokenizer
- func NewTokenizer(data []byte) *Tokenizer
- func NewTokenizerFromReader(src io.Reader) *Tokenizer
- func (pr Tokenizer) Bytes() []byte
- func (pr Tokenizer) CurrentPosition() int
- func (pr Tokenizer) HasEOLBeforeToken() bool
- func (pr Tokenizer) IsEOF() bool
- func (pr *Tokenizer) NextToken() (Token, error)
- func (pr Tokenizer) PeekPeekToken() (Token, error)
- func (pr Tokenizer) PeekToken() (Token, error)
- func (tk *Tokenizer) Reset(data []byte)
- func (tk *Tokenizer) ResetFromReader(src io.Reader)
- func (tk *Tokenizer) SetPosition(pos int)
- func (pr *Tokenizer) SkipBytes(n int) []byte
- func (pr *Tokenizer) StreamPosition() int
Functions ¶
func IsAsciiWhitespace ¶
IsAsciiWhitespace returns true if `ch` is one of the ASCII whitespace characters.
func IsHexChar ¶
IsHexChar converts a hex character into its value and a success flag (see encoding/hex for details).
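As a quick illustration, here is a minimal sketch combining the two helpers to decode a hex pair; the input bytes and expected results are chosen for the example:

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	// Decode the hex pair "4E" into a single byte.
	hi, okHi := tokenizer.IsHexChar('4')
	lo, okLo := tokenizer.IsHexChar('E')
	if okHi && okLo {
		fmt.Printf("0x%02X\n", hi<<4|lo) // 0x4E
	}

	fmt.Println(tokenizer.IsAsciiWhitespace(' ')) // true
	fmt.Println(tokenizer.IsAsciiWhitespace('A')) // false
}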
Types ¶
type Fl ¶
type Fl = float64
type Kind ¶
type Kind uint8
Kind is the kind of token.
const (
	EOF Kind = iota
	Float
	Integer
	String
	StringHex
	Name
	StartArray
	EndArray
	StartDic
	EndDic
	Other // includes commands in content streams
	StartProc // only valid in PostScript files
	EndProc // idem
	CharString // PS only: binary stream, introduced by an integer and a RD or -| command
)
func (Kind) String ¶
type Token ¶
type Token struct {
	// Additional value found in the data.
	// Note that it is a copy of the source bytes.
	Value []byte
	Kind  Kind
}
Token represents a basic piece of information. `Value` must be interpreted according to `Kind`; that interpretation is left to parsing packages.
func Tokenize ¶
Tokenize consumes all the input, splitting it into tokens. When performance matters, use the iteration method `NextToken` of the Tokenizer type instead.
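A sketch of a typical call, on a small hand-written PDF fragment; the exact kinds reported depend on the implementation, and the comment is skipped as documented on the Tokenizer type:

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	input := []byte("/Type /Page 612 6.02E23 [0 0] % ignored comment")
	tokens, err := tokenizer.Tokenize(input)
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		// Kind implements Stringer, so %s prints a readable kind name.
		fmt.Printf("%-10s %q\n", t.Kind, t.Value)
	}
}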
func (Token) Float ¶
Float returns the float value of the token.
func (Token) Int ¶
Int returns the integer value of the token, also accepting float values and rounding them.
func (Token) IsNumber ¶
IsNumber returns `true` for integers and floats.
func (Token) IsOther ¶
IsOther returns true if the token has kind `Other` and the given value.
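A sketch combining the helpers above on an arbitrary input; the values in the comments follow the documented behavior:

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	tokens, err := tokenizer.Tokenize([]byte("42 3.9 obj"))
	if err != nil {
		panic(err)
	}
	if tokens[0].IsNumber() {
		n, _ := tokens[0].Int() // 42
		fmt.Println(n)
	}
	f, _ := tokens[1].Float() // 3.9
	fmt.Println(f)
	i, _ := tokens[1].Int() // float value accepted and rounded
	fmt.Println(i)
	fmt.Println(tokens[2].IsOther("obj")) // true: an Other token with value "obj"
}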
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer is a PS/PDF tokenizer.
It handles PS features like Procs and CharStrings: strict parsers should check for such tokens and return an error if needed.
Comments are ignored.
The tokenizer can't handle streams and inline image data on its own.
Regarding exponential numbers, 7.3.3 Numeric Objects states: "A conforming writer shall not use the PostScript syntax for numbers with non-decimal radices (such as 16#FFFE) or in exponential format (such as 6.02E23)." Nonetheless, numbers in exponential format do appear in practice, so the tokenizer supports them (they cannot be confused with other types, so nothing is compromised).
func NewTokenizer ¶
NewTokenizer returns a tokenizer working on the given input.
func NewTokenizerFromReader ¶
NewTokenizerFromReader supports tokenizing an input stream, without knowing its length. The tokenizer will call Read and buffer the data. Any error from the reader's Read method is discarded: the internal buffer is simply not grown. See `SetPosition`, `SkipBytes` and `Bytes` for more information on the behavior in this mode.
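A sketch of the reader-based mode, with a strings.Reader standing in for a stream of unknown length:

package main

import (
	"fmt"
	"strings"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	src := strings.NewReader("<< /Length 42 >>")
	tk := tokenizer.NewTokenizerFromReader(src)
	for {
		token, err := tk.NextToken()
		if err != nil {
			panic(err)
		}
		if token.Kind == tokenizer.EOF { // end of input, not an error
			break
		}
		fmt.Printf("%s %q\n", token.Kind, token.Value)
	}
}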
func (Tokenizer) Bytes ¶
Bytes returns a slice of the input bytes, starting from the current position. When using an io.Reader, only the current internal buffer is returned.
func (Tokenizer) CurrentPosition ¶
CurrentPosition returns the position in the input. It may be used to go back later if needed, using `SetPosition`.
func (Tokenizer) HasEOLBeforeToken ¶
HasEOLBeforeToken reports whether an end-of-line marker occurs before the next token.
func (Tokenizer) IsEOF ¶
func (*Tokenizer) NextToken ¶
NextToken reads a token and advances the position (consuming the token). When the end of the input is reached, no error is returned; instead, the returned token has kind `EOF`.
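The EOF contract can be relied on as follows (a sketch; the sample input is arbitrary):

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	tk := tokenizer.NewTokenizer([]byte("endobj"))

	t1, err := tk.NextToken() // consumes "endobj"
	fmt.Println(t1.IsOther("endobj"), err) // expected: true <nil>

	t2, err := tk.NextToken() // input exhausted: EOF token, no error
	fmt.Println(t2.Kind == tokenizer.EOF, err) // expected: true <nil>
}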
func (Tokenizer) PeekPeekToken ¶
PeekPeekToken reads the token after the next but does not advance the position. It returns a cached value, meaning it is a very cheap call.
func (Tokenizer) PeekToken ¶
PeekToken reads a token but does not advance the position. It returns a cached value, meaning it is a very cheap call. If the error is not nil, the returned Token is guaranteed to be zero.
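A sketch contrasting the peek methods with NextToken, on an arbitrary two-number input:

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	tk := tokenizer.NewTokenizer([]byte("612 792"))

	next, _ := tk.PeekToken()          // looks at "612", position unchanged
	afterNext, _ := tk.PeekPeekToken() // looks at "792", position unchanged
	fmt.Printf("%q %q\n", next.Value, afterNext.Value)

	consumed, _ := tk.NextToken() // now actually consumes "612"
	fmt.Printf("%q\n", consumed.Value)
}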
func (*Tokenizer) Reset ¶
Reset makes the tokenizer operate on the given input, re-using the internal buffers it has already allocated.
func (*Tokenizer) ResetFromReader ¶
ResetFromReader makes the tokenizer read from `src`, re-using the internal buffers it has already allocated.
func (*Tokenizer) SetPosition ¶
SetPosition sets the position of the tokenizer in the input data.
Most of the time, `NextToken` should be preferred, but this method may be used for example to go back to a saved position.
When using an io.Reader as source, no additional buffering is performed.
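A backtracking sketch built from CurrentPosition and SetPosition, a pattern a speculative parser might use (the input is arbitrary):

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	tk := tokenizer.NewTokenizer([]byte("1 0 R"))

	saved := tk.CurrentPosition() // remember where we are

	// Speculatively consume a token...
	t, _ := tk.NextToken()
	fmt.Printf("consumed %q\n", t.Value)

	// ...then backtrack to the saved position and read it again.
	tk.SetPosition(saved)
	t, _ = tk.NextToken()
	fmt.Printf("re-read %q\n", t.Value)
}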
func (*Tokenizer) SkipBytes ¶
SkipBytes skips the next `n` bytes and returns them. This method is useful for handling inline data. If `n` is too large, the result is truncated: no additional buffering is done.
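A sketch of skipping raw, non-token bytes before resuming normal tokenizing; the payload here is arbitrary:

package main

import (
	"fmt"

	tokenizer "github.com/benoitkugler/pstokenizer"
)

func main() {
	tk := tokenizer.NewTokenizer([]byte("\x00\x01\x02 42"))

	raw := tk.SkipBytes(3) // consume three raw bytes untokenized
	fmt.Printf("% X\n", raw) // 00 01 02

	t, _ := tk.NextToken() // regular tokenizing resumes
	fmt.Printf("%q\n", t.Value) // "42"
}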
func (*Tokenizer) StreamPosition ¶
StreamPosition returns the position of the beginning of a stream, taking white space into account. See 7.3.8.1 - General.