package classifier
import "git.sr.ht/~shulhan/pakakeh.go/lib/mining/classifier"
Package classifier provides a machine learning classifier library, including CART, Random Forest, Cascaded Random Forest, and KNN.
Index ¶
- func ComputeAccuracies(tp, fp, tn, fn []int64) (accuracies []float64)
- func ComputeElapsedTimes(start, end []int64) (elaps []int64)
- func ComputeFMeasures(precisions, recalls []float64) (fmeasures []float64)
- type CM
- func (cm *CM) ComputeNumeric(vs, actuals, predictions []int64)
- func (cm *CM) ComputeStrings(valueSpace, targets, predictions []string)
- func (cm *CM) FN() int
- func (cm *CM) FNIndices() []int
- func (cm *CM) FP() int
- func (cm *CM) FPIndices() []int
- func (cm *CM) GetColumnClassError() *tabula.Column
- func (cm *CM) GetFalseRate() float64
- func (cm *CM) GetTrueRate() float64
- func (cm *CM) GroupIndexPredictions(sampleListID []int, actuals, predictions []int64)
- func (cm *CM) GroupIndexPredictionsStrings(sampleListID []int, actuals, predictions []string)
- func (cm *CM) String() (s string)
- func (cm *CM) TN() int
- func (cm *CM) TNIndices() []int
- func (cm *CM) TP() int
- func (cm *CM) TPIndices() []int
- type Runtime
- func (rt *Runtime) AddOOBCM(cm *CM)
- func (rt *Runtime) AddStat(stat *Stat)
- func (rt *Runtime) CloseOOBStatsFile() (e error)
- func (rt *Runtime) ComputeCM(sampleListID []int, vs, actuals, predicts []string) (cm *CM)
- func (rt *Runtime) ComputeStatFromCM(stat *Stat, cm *CM)
- func (rt *Runtime) ComputeStatTotal(stat *Stat)
- func (rt *Runtime) Finalize() (e error)
- func (rt *Runtime) Initialize() error
- func (rt *Runtime) OOBStats() *Stats
- func (rt *Runtime) OpenOOBStatsFile() error
- func (rt *Runtime) Performance(samples tabula.ClasetInterface, predicts []string, probs []float64) (perfs Stats)
- func (rt *Runtime) PrintOobStat(stat *Stat, cm *CM)
- func (rt *Runtime) PrintStat(stat *Stat)
- func (rt *Runtime) PrintStatTotal(st *Stat)
- func (rt *Runtime) StatTotal() *Stat
- func (rt *Runtime) WriteOOBStat(stat *Stat) error
- func (rt *Runtime) WritePerformance() error
- type Stat
- func (stat *Stat) End()
- func (stat *Stat) Recall() float64
- func (stat *Stat) SetAUC(v float64)
- func (stat *Stat) SetFPRate(fp, n int64)
- func (stat *Stat) SetPrecisionFromRate(p, n int64)
- func (stat *Stat) SetTPRate(tp, p int64)
- func (stat *Stat) Start()
- func (stat *Stat) Sum(other *Stat)
- func (stat *Stat) ToRow() (row *tabula.Row)
- func (stat *Stat) Write(file string) (e error)
- type Stats
- func (stats *Stats) Accuracies() (accuracies []float64)
- func (stats *Stats) Add(stat *Stat)
- func (stats *Stats) EndTimes() (times []int64)
- func (stats *Stats) FMeasures() (fmeasures []float64)
- func (stats *Stats) FPRates() (fprates []float64)
- func (stats *Stats) OobErrorMeans() (oobmeans []float64)
- func (stats *Stats) Precisions() (precs []float64)
- func (stats *Stats) Recalls() (recalls []float64)
- func (stats *Stats) StartTimes() (times []int64)
- func (stats *Stats) TNRates() (tnrates []float64)
- func (stats *Stats) TPRates() (tprates []float64)
- func (stats *Stats) Write(file string) (e error)
Functions ¶
func ComputeAccuracies ¶
ComputeAccuracies will compute and return the accuracies from arrays of true-positive, false-positive, true-negative, and false-negative counts, using the formula
(tp + tn) / (tp + fp + tn + fn)
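As an illustration of the accuracy formula, here is a minimal standalone sketch. The helper names `accuracy` and `accuracies` are hypothetical, it does not import the package, and returning 0 for an empty count set is an assumption of the sketch rather than documented behavior.

```go
package main

import "fmt"

// accuracy returns (tp + tn) / (tp + fp + tn + fn) for one set of counts.
// Returning 0 when there are no samples is an assumption of this sketch.
func accuracy(tp, fp, tn, fn int64) float64 {
	total := tp + fp + tn + fn
	if total == 0 {
		return 0
	}
	return float64(tp+tn) / float64(total)
}

// accuracies applies accuracy element-wise over four parallel slices.
// It assumes that all four slices have the same length.
func accuracies(tp, fp, tn, fn []int64) []float64 {
	out := make([]float64, 0, len(tp))
	for i := range tp {
		out = append(out, accuracy(tp[i], fp[i], tn[i], fn[i]))
	}
	return out
}

func main() {
	tp := []int64{2, 40}
	fp := []int64{2, 10}
	tn := []int64{1, 45}
	fn := []int64{1, 5}
	fmt.Println(accuracies(tp, fp, tn, fn)) // [0.5 0.85]
}
```

The first instance has 3 correct predictions out of 6 samples and the second has 85 out of 100.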
func ComputeElapsedTimes ¶
ComputeElapsedTimes will compute and return elapsed time between `start` and `end` timestamps.
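A small sketch of what an element-wise elapsed-time computation could look like, assuming `start` and `end` are parallel slices of Unix timestamps in seconds. The function name `elapsedTimes` is hypothetical and the real implementation may differ.

```go
package main

import "fmt"

// elapsedTimes returns end[i] - start[i] for every index.
// It assumes both slices have the same length.
func elapsedTimes(start, end []int64) []int64 {
	out := make([]int64, len(start))
	for i := range start {
		out[i] = end[i] - start[i]
	}
	return out
}

func main() {
	start := []int64{1700000000, 1700000010}
	end := []int64{1700000005, 1700000042}
	fmt.Println(elapsedTimes(start, end)) // [5 32]
}
```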
func ComputeFMeasures ¶
ComputeFMeasures computes and returns the F-measure of each instance, given arrays of precisions and recalls.
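The following sketch assumes the F-measure is the harmonic mean of precision and recall, as the FMeasure field of Stat describes it. The names `fMeasure` and `fMeasures` are hypothetical, and the zero-denominator guard is an assumption of the sketch.

```go
package main

import "fmt"

// fMeasure returns the harmonic mean of precision p and recall r,
// 2*p*r / (p + r). It returns 0 when p + r is 0 (sketch assumption).
func fMeasure(p, r float64) float64 {
	if p+r == 0 {
		return 0
	}
	return 2 * p * r / (p + r)
}

// fMeasures applies fMeasure element-wise over two parallel slices.
// It assumes both slices have the same length.
func fMeasures(precisions, recalls []float64) []float64 {
	out := make([]float64, len(precisions))
	for i := range precisions {
		out[i] = fMeasure(precisions[i], recalls[i])
	}
	return out
}

func main() {
	precisions := []float64{0.5, 1, 0}
	recalls := []float64{0.5, 0.5, 0}
	fmt.Println(fMeasures(precisions, recalls))
}
```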
Types ¶
type CM ¶
CM represents the confusion matrix of a classification.
func (*CM) ComputeNumeric ¶
ComputeNumeric will calculate the confusion matrix using numeric target and prediction values.
func (*CM) ComputeStrings ¶
ComputeStrings will calculate the confusion matrix using string target and prediction class values.
func (*CM) FN ¶
FN returns the number of false negatives.
func (*CM) FNIndices ¶
FNIndices returns the indices of all false-negative samples.
func (*CM) FP ¶
FP returns the number of false positives in the confusion matrix.
func (*CM) FPIndices ¶
FPIndices returns the indices of all false-positive samples.
func (*CM) GetColumnClassError ¶
GetColumnClassError returns the last column, which contains the classification error.
func (*CM) GetFalseRate ¶
GetFalseRate returns the false-positive rate in terms of
false-positive / (false-positive + true-negative)
func (*CM) GetTrueRate ¶
GetTrueRate returns the true-positive rate in terms of
true-positive / (true-positive + false-positive)
func (*CM) GroupIndexPredictions ¶
GroupIndexPredictions groups the given sample indices by the outcome of their prediction. For example, given
sampleListID: [0, 1, 2, 3, 4, 5]
actuals: [1, 1, 0, 0, 1, 0]
predictions: [1, 0, 1, 0, 1, 1]
this function will group the indices by true-positive, false-positive, true-negative, and false-negative, which results in
true-positive indices: [0, 4]
false-positive indices: [2, 5]
true-negative indices: [3]
false-negative indices: [1]
This function assumes that the positive value is "1" and the negative value is "0".
func (*CM) GroupIndexPredictionsStrings ¶
GroupIndexPredictionsStrings is an alternative to GroupIndexPredictions that works with string classes.
func (*CM) String ¶
String will return the confusion matrix in a table-like format.
func (*CM) TN ¶
TN returns the number of true negatives.
func (*CM) TNIndices ¶
TNIndices returns the indices of all true-negative samples.
func (*CM) TP ¶
TP returns the number of true positives in the confusion matrix.
func (*CM) TPIndices ¶
TPIndices returns the indices of all true-positive samples.
type Runtime ¶
type Runtime struct {
	// OOBStatsFile is the file where the OOB statistics will be written.
	OOBStatsFile string `json:"OOBStatsFile"`
	// PerfFile is the file where the performance statistics will be written.
	PerfFile string `json:"PerfFile"`
	// StatFile is the file where the statistics of classifying samples
	// will be written.
	StatFile string `json:"StatFile"`
	// RunOOB, if true, enables computing the OOB; the default is false.
	RunOOB bool `json:"RunOOB"`
	// contains filtered or unexported fields
}
Runtime defines a generic type that provides common fields and can be embedded by a real classifier (e.g. RandomForest).
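Because the exported fields carry JSON tags, one plausible use is to load them from a JSON configuration through an embedding classifier type. The sketch below assumes that usage. `RuntimeConfig` is a local mirror of the exported fields so the example compiles without the package, and `Forest` with its `NTree` option is entirely hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RuntimeConfig mirrors the exported fields of Runtime so that this sketch
// compiles on its own; it is not the package's type.
type RuntimeConfig struct {
	OOBStatsFile string `json:"OOBStatsFile"`
	PerfFile     string `json:"PerfFile"`
	StatFile     string `json:"StatFile"`
	RunOOB       bool   `json:"RunOOB"`
}

// Forest is a hypothetical classifier that embeds the shared fields.
type Forest struct {
	RuntimeConfig
	NTree int `json:"NTree"` // hypothetical classifier-specific option
}

// parseForest decodes a JSON document into a Forest. Fields of the embedded
// struct are read from the same JSON object as NTree.
func parseForest(data []byte) (Forest, error) {
	var f Forest
	err := json.Unmarshal(data, &f)
	return f, err
}

func main() {
	cfg := []byte(`{"NTree": 10, "RunOOB": true, "StatFile": "stat.csv"}`)
	f, err := parseForest(cfg)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(f.NTree, f.RunOOB, f.StatFile)
}
```

Embedding keeps the shared file names and the OOB switch in one place while each classifier adds its own options alongside them.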
func (*Runtime) AddOOBCM ¶
AddOOBCM will append a new confusion matrix.
func (*Runtime) AddStat ¶
AddStat will append new classifier statistic data.
func (*Runtime) CloseOOBStatsFile ¶
CloseOOBStatsFile will close the statistics file that was opened for writing.
func (*Runtime) ComputeCM ¶
ComputeCM will compute the confusion matrix of the samples using the value space, the actual values, and the predicted values.
func (*Runtime) ComputeStatFromCM ¶
ComputeStatFromCM will compute the statistic using a confusion matrix.
func (*Runtime) ComputeStatTotal ¶
ComputeStatTotal computes the total statistic.
func (*Runtime) Finalize ¶
Finalize finishes the runtime: it computes the total statistic, writes it to file, and closes the file.
func (*Runtime) Initialize ¶
Initialize will start the runtime for processing by saving the start time and opening the stats file.
func (*Runtime) OOBStats ¶
OOBStats returns all statistic objects.
func (*Runtime) OpenOOBStatsFile ¶
OpenOOBStatsFile will open the statistics file for output.
func (*Runtime) Performance ¶
func (rt *Runtime) Performance(samples tabula.ClasetInterface, predicts []string, probs []float64) (perfs Stats)
Performance computes the performance statistics of the classifier, given the actual class labels and their probabilities.
Algorithm,
(1) Sort the probabilities in descending order.
(2) Sort the actuals and predicts using the sorted index from probs.
(3) Compute tpr, fpr, and precision.
(4) Write the performance to file.
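Steps (1) to (3) can be sketched as follows. This is one possible reading of the algorithm, not the package's implementation: it assumes the top-k ranked samples are treated as predicted positive at each cut, it represents the actual class as a boolean, and it leaves out step (4). The `point` type and the `performance` and `ratio` functions are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// point is one row of the performance curve. The type and its field names
// are hypothetical and exist only for this sketch.
type point struct {
	TPRate, FPRate, Precision float64
}

// ratio returns a/b, or 0 when b is 0.
func ratio(a, b int) float64 {
	if b == 0 {
		return 0
	}
	return float64(a) / float64(b)
}

// performance ranks samples by descending probability, then treats the
// top-k samples as predicted positive for k = 1..n and records the rates
// at each cut. actuals[i] is true when sample i is in the positive class.
func performance(actuals []bool, probs []float64) []point {
	// (1) Build an index sorted by probability, highest first.
	idx := make([]int, len(probs))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return probs[idx[a]] > probs[idx[b]]
	})

	// Count the positive and negative samples once.
	var p, n int
	for _, isPositive := range actuals {
		if isPositive {
			p++
		} else {
			n++
		}
	}

	// (2) Visit the actuals in sorted order; (3) compute the rates.
	var tp, fp int
	curve := make([]point, 0, len(idx))
	for _, i := range idx {
		if actuals[i] {
			tp++
		} else {
			fp++
		}
		curve = append(curve, point{
			TPRate:    ratio(tp, p),
			FPRate:    ratio(fp, n),
			Precision: ratio(tp, tp+fp),
		})
	}
	return curve
}

func main() {
	actuals := []bool{true, true, false, false}
	probs := []float64{0.4, 0.9, 0.1, 0.8}
	for k, pt := range performance(actuals, probs) {
		fmt.Printf("top %d: tpr=%.2f fpr=%.2f precision=%.2f\n",
			k+1, pt.TPRate, pt.FPRate, pt.Precision)
	}
}
```

Sorting an index slice instead of the probabilities themselves lets the same order be applied to the actuals and the predictions, which is what step (2) asks for.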
func (*Runtime) PrintOobStat ¶
PrintOobStat will print the out-of-bag statistic to standard output.
func (*Runtime) PrintStat ¶
PrintStat will print the statistic values to standard output.
func (*Runtime) PrintStatTotal ¶
PrintStatTotal will print the total statistic to standard output.
func (*Runtime) StatTotal ¶
StatTotal returns the total statistic.
func (*Runtime) WriteOOBStat ¶
WriteOOBStat will write the statistic of the process to file.
func (*Runtime) WritePerformance ¶
WritePerformance will write performance data to file.
type Stat ¶
type Stat struct {
	// ID is the unique id for this statistic (e.g. number of tree).
	ID int64
	// StartTime contains the start time of the classifier as a unix timestamp.
	StartTime int64
	// EndTime contains the end time of the classifier as a unix timestamp.
	EndTime int64
	// ElapsedTime contains the actual time, in seconds, between end and
	// start time.
	ElapsedTime int64
	// TP contains the true-positive value.
	TP int64
	// FP contains the false-positive value.
	FP int64
	// TN contains the true-negative value.
	TN int64
	// FN contains the false-negative value.
	FN int64
	// OobError contains the out-of-bag error.
	OobError float64
	// OobErrorMean contains the mean of the out-of-bag error.
	OobErrorMean float64
	// TPRate contains the true-positive rate (recall): tp/(tp+fn)
	TPRate float64
	// FPRate contains the false-positive rate: fp/(fp+tn)
	FPRate float64
	// TNRate contains the true-negative rate: tn/(tn+fp)
	TNRate float64
	// Precision contains: tp/(tp+fp)
	Precision float64
	// FMeasure contains the value of the F-measure, or the harmonic mean
	// of precision and recall.
	FMeasure float64
	// Accuracy contains the degree of closeness of measurements of a
	// quantity to that quantity's true value.
	Accuracy float64
	// AUC contains the area under the curve.
	AUC float64
}
Stat holds the statistic values of a classifier, including TP rate, FP rate, precision, and recall.
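The rate formulas given in the field comments can be checked with a small standalone sketch. The `rates` type and the `ratesFromCounts` and `div` helpers are hypothetical, and returning 0 for a zero denominator is an assumption of the sketch.

```go
package main

import "fmt"

// rates collects the derived values named in the Stat field comments.
type rates struct {
	TPRate, FPRate, TNRate, Precision float64
}

// div returns a/b as a float64, or 0 when b is 0 (sketch assumption).
func div(a, b int64) float64 {
	if b == 0 {
		return 0
	}
	return float64(a) / float64(b)
}

// ratesFromCounts derives the rates from the four confusion-matrix counts
// using the formulas in the Stat field comments.
func ratesFromCounts(tp, fp, tn, fn int64) rates {
	return rates{
		TPRate:    div(tp, tp+fn), // recall
		FPRate:    div(fp, fp+tn),
		TNRate:    div(tn, tn+fp),
		Precision: div(tp, tp+fp),
	}
}

func main() {
	r := ratesFromCounts(40, 10, 45, 5)
	fmt.Printf("%+v\n", r)
}
```

Since FPRate and TNRate share the denominator fp + tn, they sum to 1 whenever there is at least one negative sample.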
func (*Stat) End ¶
func (stat *Stat) End()
End will stop the timer and compute the elapsed time.
func (*Stat) Recall ¶
Recall returns the recall value.
func (*Stat) SetAUC ¶
SetAUC will set the AUC value.
func (*Stat) SetFPRate ¶
SetFPRate will set FP and FPRate using the number of negatives `n`.
func (*Stat) SetPrecisionFromRate ¶
SetPrecisionFromRate will set the Precision value using tprate and fprate. `p` and `n` are the numbers of positive-class and negative-class samples.
func (*Stat) SetTPRate ¶
SetTPRate will set TP and TPRate using the number of positives `p`.
func (*Stat) Start ¶
func (stat *Stat) Start()
Start will start the timer.
func (*Stat) Sum ¶
Sum will add the statistics from the other stat object to the current stat, excluding the start and end times.
func (*Stat) ToRow ¶
ToRow will convert the stat to a tabula.Row in the order of the Stat fields.
func (*Stat) Write ¶
Write will write the content of stat to `file`.
type Stats ¶
type Stats []*Stat
Stats defines a list of statistic values.
func (*Stats) Accuracies ¶
Accuracies returns all accuracy values.
func (*Stats) Add ¶
Add will add another stat object to the slice.
func (*Stats) EndTimes ¶
EndTimes returns all end times as unix timestamps.
func (*Stats) FMeasures ¶
FMeasures returns all F-measure values.
func (*Stats) FPRates ¶
FPRates returns all false-positive rate values.
func (*Stats) OobErrorMeans ¶
OobErrorMeans returns all out-of-bag error mean values.
func (*Stats) Precisions ¶
Precisions returns all precision values.
func (*Stats) Recalls ¶
Recalls returns all recall values.
func (*Stats) StartTimes ¶
StartTimes returns all start times as unix timestamps.
func (*Stats) TNRates ¶
TNRates will return all true-negative rate values.
func (*Stats) TPRates ¶
TPRates returns all true-positive rate values.
func (*Stats) Write ¶
Write will write all statistic data to `file`.
Source Files ¶
classifier.go cm.go runtime.go stat.go stats.go stats_interface.go
Directories ¶
Path | Synopsis |
---|---|
lib/mining/classifier/cart | Package cart implements the Classification and Regression Tree by Breiman, et al. |
lib/mining/classifier/crf | Package crf implements the cascaded random forest algorithm, proposed by Baumann et al. in their paper: |
lib/mining/classifier/rf | Package rf implements an ensemble of classifiers using the random forest algorithm by Breiman and Cutler. |
- Version: v0.60.0 (latest)
- Published: Feb 1, 2025
- Platform: linux/amd64
- Imports: 9 packages