package topk
import "tailscale.com/util/topk"
Package topk defines a count-min sketch and a cheap probabilistic top-K data structure that uses the count-min sketch to track the top K items in constant memory and O(log(k)) time.
Index ¶
- func PickParams(err, probability float64) (hashes, buckets int)
- type CountMinSketch
- func NewCountMinSketch(hashes, buckets int) *CountMinSketch
- func (cms *CountMinSketch) Add(val []byte) uint64
- func (cms *CountMinSketch) AddN(val []byte, count uint64) uint64
- func (cms *CountMinSketch) Get(val []byte) uint64
- type SerializeFunc
- type TopK
Functions ¶
func PickParams ¶
PickParams provides good parameters for 'hashes' and 'buckets' when constructing a CountMinSketch, given an estimated total number of counts (i.e. the sum of all counts ever stored), the error factor ϵ as a float (e.g. 1% is 0.001), and the probability factor δ.
Parameters are chosen such that with a probability of 1−δ, the error is at most ϵ∗totalCount. Or, in other words: if N is the true count of an event, E is the estimate given by a sketch and T the total count of items in the sketch, E ≤ N + T*ϵ with probability (1 - δ).
Types ¶
type CountMinSketch ¶
type CountMinSketch struct {
// contains filtered or unexported fields
}
CountMinSketch implements a count-min sketch, a probabilistic data structure that tracks the frequency of events in a stream of data.
See: https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch
func NewCountMinSketch ¶
func NewCountMinSketch(hashes, buckets int) *CountMinSketch
NewCountMinSketch creates a new CountMinSketch with the provided number of hashes and buckets. Hashes and buckets are often called "depth" and "width", or "d" and "w", respectively.
func (*CountMinSketch) Add ¶
func (cms *CountMinSketch) Add(val []byte) uint64
Add calls AddN(val, 1).
func (*CountMinSketch) AddN ¶
func (cms *CountMinSketch) AddN(val []byte, count uint64) uint64
AddN increments the count for the given value by the provided count, returning the new count.
func (*CountMinSketch) Get ¶
func (cms *CountMinSketch) Get(val []byte) uint64
Get returns the count for the provided value.
type SerializeFunc ¶
HashFunc is responsible for providing a []byte serialization of a value, appended to the provided byte slice. This is used for hashing the value when adding to a CountMinSketch.
type TopK ¶
type TopK[T any] struct { // contains filtered or unexported fields }
TopK is a probabilistic counter of the top K items, using a count-min sketch to keep track of item counts and a heap to track the top K of them.
func New ¶
func New[T any](k int, sf SerializeFunc[T]) *TopK[T]
New creates a new TopK that stores k values. Parameters for the underlying count-min sketch are chosen for a 0.1% error rate and a 0.1% probability of error.
func NewWithParams ¶
func NewWithParams[T any](k int, sf SerializeFunc[T], numHashes, numCols int) *TopK[T]
NewWithParams creates a new TopK that stores k values, and additionally allows customizing the parameters for the underlying count-min sketch.
func (*TopK[T]) Add ¶
Add calls AddN(val, 1).
func (*TopK[T]) AddN ¶
AddN adds the given item to the set with the provided count, returning the new estimated count.
func (*TopK[T]) AppendTop ¶
func (tk *TopK[T]) AppendTop(sl []T) []T
AppendTop appends the estimated top K items as stored by this TopK to the provided slice, allocating only if the slice does not have enough capacity to store all items. The provided slice can be nil.
func (*TopK[T]) Top ¶
func (tk *TopK[T]) Top() []T
Top returns the estimated top K items as stored by this TopK.
Source Files ¶
topk.go
- Version
- v1.84.1 (latest)
- Published
- May 29, 2025
- Platform
- linux/amd64
- Imports
- 5 packages
- Last checked
- 1 day ago –
Tools for package owners.