package textseg

import "github.com/apparentlymart/go-textseg/v15/textseg"

Index

Variables

var Error = errors.New("invalid UTF8 text")

Functions

func AllTokens

func AllTokens(buf []byte, splitFunc bufio.SplitFunc) ([][]byte, error)

AllTokens is a utility that uses a bufio.SplitFunc to produce a slice of all of the recognized tokens in the given buffer.

func ScanGraphemeClusters

func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error)

ScanGraphemeClusters is a split function for bufio.Scanner that splits on grapheme cluster boundaries.

func ScanUTF8Sequences

func ScanUTF8Sequences(data []byte, atEOF bool) (int, []byte, error)

ScanGraphemeClusters is a split function for bufio.Scanner that splits on UTF8 sequence boundaries.

This is included largely for completeness, since this behavior is already built in to Go when ranging over a string.

func TokenCount

func TokenCount(buf []byte, splitFunc bufio.SplitFunc) (int, error)

TokenCount is a utility that uses a bufio.SplitFunc to count the number of recognized tokens in the given buffer.

Source Files

all_tokens.go generate.go grapheme_clusters.go tables.go utf8_seqs.go

Version
v15.0.0 (latest)
Published
Aug 29, 2023
Platform
linux/amd64
Imports
5 packages
Last checked
3 weeks ago

Tools for package owners.