pstokenizer – github.com/benoitkugler/pstokenizer

package tokenizer

import "github.com/benoitkugler/pstokenizer"

Package tokenizer implements the lowest level of processing of PS/PDF files. The tokenizer is also usable with Type1 font files. See the higher-level package pdf/parser to read PDF objects.
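
A quick-start sketch (the sample input is made up for illustration):

	package main

	import (
		"fmt"

		tokenizer "github.com/benoitkugler/pstokenizer"
	)

	func main() {
		data := []byte("<< /Type /Page /MediaBox [0 0 612 792] >>")
		tokens, err := tokenizer.Tokenize(data)
		if err != nil {
			panic(err)
		}
		for _, t := range tokens {
			fmt.Printf("%s %q\n", t.Kind, t.Value)
		}
	}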

Index

Functions

func IsAsciiWhitespace

func IsAsciiWhitespace(ch byte) bool

IsAsciiWhitespace returns true if `ch` is one of the ASCII whitespace characters.

func IsHexChar

func IsHexChar(c byte) (uint8, bool)

IsHexChar converts a hex character into its numeric value, with a success flag (see encoding/hex for details).
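
For illustration (the sample characters are arbitrary):

	v, ok := tokenizer.IsHexChar('F') // v == 15, ok == true
	_, ok = tokenizer.IsHexChar('z')  // ok == false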

Types

type Fl

type Fl = float64

type Kind

type Kind uint8

Kind is the kind of token.

const (
	EOF Kind
	Float
	Integer
	String
	StringHex
	Name
	StartArray
	EndArray
	StartDic
	EndDic
	Other // includes commands in content streams

	StartProc  // only valid in PostScript files
	EndProc    // idem
	CharString // PS only: binary stream, introduced by an integer and an RD or -| command
)

func (Kind) String

func (k Kind) String() string

type Token

type Token struct {
	// Additional value found in the data
	// Note that it is a copy of the source bytes.
	Value []byte
	Kind  Kind
}

Token represents a basic piece of information. `Value` must be interpreted according to `Kind`; this interpretation is left to parsing packages.
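
A sketch of such a dispatch; how each value is decoded is up to the caller, and the helper name is hypothetical:

	// describe shows one way to branch on Kind.
	func describe(t tokenizer.Token) string {
		switch t.Kind {
		case tokenizer.Integer, tokenizer.Float:
			f, _ := t.Float()
			return fmt.Sprintf("number %g", f)
		case tokenizer.Name:
			return fmt.Sprintf("name %s", t.Value)
		case tokenizer.Other:
			return fmt.Sprintf("command %s", t.Value)
		default:
			return t.Kind.String()
		}
	}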

func Tokenize

func Tokenize(data []byte) ([]Token, error)

Tokenize consumes all the input, splitting it into tokens. When performance matters, prefer the iteration method `NextToken` of the Tokenizer type.

func (Token) Float

func (t Token) Float() (Fl, error)

Float returns the float value of the token.

func (Token) Int

func (t Token) Int() (int, error)

Int returns the integer value of the token, also accepting float values and rounding them.
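
For instance (a hedged sketch; the sample input is arbitrary):

	toks, _ := tokenizer.Tokenize([]byte("3.6"))
	n, err := toks[0].Int() // the token is a Float, but Int still succeeds, rounding the value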

func (Token) IsNumber

func (t Token) IsNumber() bool

IsNumber returns `true` for integers and floats.

func (Token) IsOther

func (t Token) IsOther(value string) bool

IsOther returns true if the token has `Other` kind, with the given value.
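
For example, to recognize a command in a content stream (the sample content is made up):

	toks, _ := tokenizer.Tokenize([]byte("BT /F1 12 Tf"))
	isBT := toks[0].IsOther("BT") // true: an Other token with value "BT"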

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer is a PS/PDF tokenizer.

It handles PS features like Procs and CharStrings: strict parsers should check for such tokens and return an error if needed (see the sketch below).

Comments are ignored.

The tokenizer can't handle streams and inline image data on its own.

Regarding exponential numbers, 7.3.3 Numeric Objects states that a conforming writer shall not use the PostScript syntax for numbers with non-decimal radices (such as 16#FFFE) or in exponential format (such as 6.02E23). Nonetheless, numbers in exponential format do show up in practice, so the tokenizer supports them (there is no confusion with other types, so no compromise is needed).
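
For a strict PDF parser, a minimal sketch of rejecting the PostScript-only kinds (the helper name is hypothetical):

	// checkPDFToken returns an error for tokens that are valid in
	// PostScript but not in PDF.
	func checkPDFToken(t tokenizer.Token) error {
		switch t.Kind {
		case tokenizer.StartProc, tokenizer.EndProc, tokenizer.CharString:
			return fmt.Errorf("PostScript-only token %s in PDF input", t.Kind)
		}
		return nil
	}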

func NewTokenizer

func NewTokenizer(data []byte) *Tokenizer

NewTokenizer returns a tokenizer working on the given input.

func NewTokenizerFromReader

func NewTokenizerFromReader(src io.Reader) *Tokenizer

NewTokenizerFromReader supports tokenizing an input stream, without knowing its length. The tokenizer will call Read and buffer the data. Any error from the Read method is discarded: the internal buffer is simply not grown. See `SetPosition`, `SkipBytes` and `Bytes` for more information on the behavior in this mode.
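
A sketch of reader mode; an in-memory reader stands in for a real stream here:

	src := strings.NewReader("/FontName /Example def")
	tk := tokenizer.NewTokenizerFromReader(src)
	tok, err := tk.NextToken() // Read is called and buffered as needed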

func (Tokenizer) Bytes

func (pr Tokenizer) Bytes() []byte

Bytes returns a slice of the bytes, starting from the current position. When using an io.Reader, only the current internal buffer is returned.

func (Tokenizer) CurrentPosition

func (pr Tokenizer) CurrentPosition() int

CurrentPosition returns the position in the input. It may be used to go back if needed, using `SetPosition`.

func (Tokenizer) HasEOLBeforeToken

func (pr Tokenizer) HasEOLBeforeToken() bool

HasEOLBeforeToken checks if EOL happens before the next token.

func (Tokenizer) IsEOF

func (pr Tokenizer) IsEOF() bool

func (*Tokenizer) NextToken

func (pr *Tokenizer) NextToken() (Token, error)

NextToken reads a token and advances the position (consuming the token). If EOF is reached, no error is returned; an `EOF` token is returned instead.
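
A typical iteration sketch, using the EOF token as the stop condition:

	tk := tokenizer.NewTokenizer(data)
	for {
		tok, err := tk.NextToken()
		if err != nil {
			return err
		}
		if tok.Kind == tokenizer.EOF {
			break
		}
		// process tok ...
	}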

func (Tokenizer) PeekPeekToken

func (pr Tokenizer) PeekPeekToken() (Token, error)

PeekPeekToken reads the token after the next one but does not advance the position. It returns a cached value, so the call is very cheap.

func (Tokenizer) PeekToken

func (pr Tokenizer) PeekToken() (Token, error)

PeekToken reads a token but does not advance the position. It returns a cached value, so the call is very cheap. If the error is not nil, the returned Token is guaranteed to be zero.
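
For example, a sketch of one-token lookahead before committing:

	next, err := tk.PeekToken()
	if err == nil && next.Kind == tokenizer.StartDic {
		tk.NextToken() // the peek matched: actually consume the << token
		// ... parse the dictionary body
	}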

func (*Tokenizer) Reset

func (tk *Tokenizer) Reset(data []byte)

Reset allows re-using the internal buffers allocated by the tokenizer.

func (*Tokenizer) ResetFromReader

func (tk *Tokenizer) ResetFromReader(src io.Reader)

ResetFromReader allows re-using the internal buffers allocated by the tokenizer.

func (*Tokenizer) SetPosition

func (tk *Tokenizer) SetPosition(pos int)

SetPosition sets the position of the tokenizer in the input data.

Most of the time, `NextToken` should be preferred, but this method may be used for example to go back to a saved position.

When using an io.Reader as source, no additional buffering is performed.
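
A sketch of a save-and-rewind pattern (the "obj" check is just an example):

	saved := tk.CurrentPosition()
	tok, err := tk.NextToken()
	if err != nil || !tok.IsOther("obj") {
		tk.SetPosition(saved) // the speculative read did not match: rewind
	}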

func (*Tokenizer) SkipBytes

func (pr *Tokenizer) SkipBytes(n int) []byte

SkipBytes skips the next `n` bytes and returns them. This method is useful to handle inline data. If `n` is too large, it is truncated: no additional buffering is done.
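
A hedged sketch; `n` is assumed to have been computed by the caller, for instance from an inline image dictionary:

	// after tokenizing up to the ID operator of an inline image:
	raw := tk.SkipBytes(n) // the n raw bytes, returned without tokenization
	// tokenizing resumes after the skipped data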

func (*Tokenizer) StreamPosition

func (pr *Tokenizer) StreamPosition() int

StreamPosition returns the position of the beginning of a stream, taking whitespace into account. See 7.3.8.1 - General.
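
A sketch of skipping raw stream content; `length` is assumed to come from the stream dictionary's /Length entry:

	start := tk.StreamPosition()   // first byte of the stream content
	tk.SetPosition(start + length) // jump past the raw data
	// the next token should then be the endstream keyword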

Source Files

token.go
