package stat

import "gonum.org/v1/gonum/stat"

Package stat provides generalized statistical functions.

Index

Examples

Functions

func Bhattacharyya

func Bhattacharyya(p, q []float64) float64

Bhattacharyya computes the distance between the probability distributions p and q given by:

-\ln ( \sum_i \sqrt{p_i q_i} )

The lengths of p and q must be equal. It is assumed that p and q sum to 1.

func BivariateMoment

func BivariateMoment(r, s float64, x, y, weights []float64) float64

BivariateMoment computes the weighted mixed moment between the samples x and y.

E[(x - μ_x)^r*(y - μ_y)^s]

No degrees of freedom correction is done. The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

func CDF

func CDF(q float64, c CumulantKind, x, weights []float64) float64

CDF returns the empirical cumulative distribution function value of x, that is the fraction of the samples less than or equal to q. The exact behavior is determined by the CumulantKind. CDF is theoretically the inverse of the Quantile function, though it may not be the actual inverse for all values q and CumulantKinds.

The x data must be sorted in increasing order. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights). CDF will panic if the length of x is zero.

CumulantKind behaviors:

func ChiSquare

func ChiSquare(obs, exp []float64) float64

ChiSquare computes the chi-square distance between the observed frequencies 'obs' and expected frequencies 'exp' given by:

\sum_i (obs_i-exp_i)^2 / exp_i

The lengths of obs and exp must be equal.

func CircularMean

func CircularMean(x, weights []float64) float64

CircularMean returns the circular mean of the dataset.

atan2(\sum_i w_i * sin(alpha_i), \sum_i w_i * cos(alpha_i))

If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{0, 0.25 * math.Pi, 0.75 * math.Pi}
	weights := []float64{1, 2, 2.5}
	cmean := CircularMean(x, weights)

	fmt.Printf("The circular mean is %.5f.\n", cmean)
	// Output:
	// The circular mean is 1.37037.
}

Output:

The circular mean is 1.37037.

func Correlation

func Correlation(x, y, weights []float64) float64

Correlation returns the weighted correlation between the samples of x and y with the given means.

sum_i {w_i (x_i - meanX) * (y_i - meanY)} / (stdX * stdY)

The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{8, -3, 7, 8, -4}
	y := []float64{10, 5, 6, 3, -1}
	w := []float64{2, 1.5, 3, 3, 2}

	fmt.Println("Correlation computes the degree to which two datasets move together")
	fmt.Println("about their mean. For example, x and y above move similarly.")

	c := Correlation(x, y, w)
	fmt.Printf("Correlation is %.5f\n", c)

	// Output:
	// Correlation computes the degree to which two datasets move together
	// about their mean. For example, x and y above move similarly.
	// Correlation is 0.59915
}

Output:

Correlation computes the degree to which two datasets move together
about their mean. For example, x and y above move similarly.
Correlation is 0.59915

func CorrelationMatrix

func CorrelationMatrix(dst *mat.SymDense, x mat.Matrix, weights []float64)

CorrelationMatrix returns the correlation matrix calculated from a matrix of data, x, using a two-pass algorithm. The result is stored in dst.

If weights is not nil the weighted correlation of x is calculated. weights must have length equal to the number of rows in input data matrix and must not contain negative elements. The dst matrix must either be empty or have the same number of columns as the input data matrix.

func Covariance

func Covariance(x, y, weights []float64) float64

Covariance returns the weighted covariance between the samples of x and y.

sum_i {w_i (x_i - meanX) * (y_i - meanY)} / (sum_j {w_j} - 1)

The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	fmt.Println("Covariance computes the degree to which datasets move together")
	fmt.Println("about their mean.")
	x := []float64{8, -3, 7, 8, -4}
	y := []float64{10, 2, 2, 4, 1}
	cov := Covariance(x, y, nil)
	fmt.Printf("Cov = %.4f\n", cov)
	fmt.Println("If datasets move perfectly together, the variance equals the covariance")
	y2 := []float64{12, 1, 11, 12, 0}
	cov2 := Covariance(x, y2, nil)
	varX := Variance(x, nil)
	fmt.Printf("Cov2 is %.4f, VarX is %.4f", cov2, varX)
	// Output:
	// Covariance computes the degree to which datasets move together
	// about their mean.
	// Cov = 13.8000
	// If datasets move perfectly together, the variance equals the covariance
	// Cov2 is 37.7000, VarX is 37.7000
}

Output:

Covariance computes the degree to which datasets move together
about their mean.
Cov = 13.8000
If datasets move perfectly together, the variance equals the covariance
Cov2 is 37.7000, VarX is 37.7000

func CovarianceMatrix

func CovarianceMatrix(dst *mat.SymDense, x mat.Matrix, weights []float64)

CovarianceMatrix calculates the covariance matrix (also known as the variance-covariance matrix) calculated from a matrix of data, x, using a two-pass algorithm. The result is stored in dst.

If weights is not nil the weighted covariance of x is calculated. weights must have length equal to the number of rows in input data matrix and must not contain negative elements. The dst matrix must either be empty or have the same number of columns as the input data matrix.

func CrossEntropy

func CrossEntropy(p, q []float64) float64

CrossEntropy computes the cross-entropy between the two distributions specified in p and q.

func Entropy

func Entropy(p []float64) float64

Entropy computes the Shannon entropy of a distribution or the distance between two distributions. The natural logarithm is used.

Example

Code:

{

	p := []float64{0.05, 0.1, 0.9, 0.05}
	entP := Entropy(p)

	q := []float64{0.2, 0.4, 0.25, 0.15}
	entQ := Entropy(q)

	r := []float64{0.2, 0, 0, 0.5, 0, 0.2, 0.1, 0, 0, 0}
	entR := Entropy(r)

	s := []float64{0, 0, 1, 0}
	entS := Entropy(s)

	fmt.Println("Entropy is a measure of the amount of uncertainty in a distribution")
	fmt.Printf("The second bin of p is very likely to occur. It's entropy is %.4f\n", entP)
	fmt.Printf("The distribution of q is more spread out. It's entropy is %.4f\n", entQ)
	fmt.Println("Adding buckets with zero probability does not change the entropy.")
	fmt.Printf("The entropy of r is: %.4f\n", entR)
	fmt.Printf("A distribution with no uncertainty has entropy %.4f\n", entS)
	// Output:
	// Entropy is a measure of the amount of uncertainty in a distribution
	// The second bin of p is very likely to occur. It's entropy is 0.6247
	// The distribution of q is more spread out. It's entropy is 1.3195
	// Adding buckets with zero probability does not change the entropy.
	// The entropy of r is: 1.2206
	// A distribution with no uncertainty has entropy 0.0000
}

Output:

Entropy is a measure of the amount of uncertainty in a distribution
The second bin of p is very likely to occur. It's entropy is 0.6247
The distribution of q is more spread out. It's entropy is 1.3195
Adding buckets with zero probability does not change the entropy.
The entropy of r is: 1.2206
A distribution with no uncertainty has entropy 0.0000

func ExKurtosis

func ExKurtosis(x, weights []float64) float64

ExKurtosis returns the population excess kurtosis of the sample. The kurtosis is defined by the 4th moment of the mean divided by the squared variance. The excess kurtosis subtracts 3.0 so that the excess kurtosis of the normal distribution is zero. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	fmt.Println(`Kurtosis is a measure of the 'peakedness' of a distribution, and the
excess kurtosis is the kurtosis above or below that of the standard normal
distribution`)
	x := []float64{5, 4, -3, -2}
	kurt := ExKurtosis(x, nil)
	fmt.Printf("ExKurtosis = %.5f\n", kurt)
	weights := []float64{1, 2, 3, 5}
	wKurt := ExKurtosis(x, weights)
	fmt.Printf("Weighted ExKurtosis is %.4f", wKurt)
	// Output:
	// Kurtosis is a measure of the 'peakedness' of a distribution, and the
	// excess kurtosis is the kurtosis above or below that of the standard normal
	// distribution
	// ExKurtosis = -5.41200
	// Weighted ExKurtosis is -0.6779
}

Output:

Kurtosis is a measure of the 'peakedness' of a distribution, and the
excess kurtosis is the kurtosis above or below that of the standard normal
distribution
ExKurtosis = -5.41200
Weighted ExKurtosis is -0.6779

func GeometricMean

func GeometricMean(x, weights []float64) float64

GeometricMean returns the weighted geometric mean of the dataset

\prod_i {x_i ^ w_i}

This only applies with positive x and positive weights. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{8, 2, 9, 15, 4}
	weights := []float64{2, 2, 6, 7, 1}
	mean := Mean(x, weights)
	gmean := GeometricMean(x, weights)

	logx := make([]float64, len(x))
	for i, v := range x {
		logx[i] = math.Log(v)
	}
	expMeanLog := math.Exp(Mean(logx, weights))
	fmt.Printf("The arithmetic mean is %.4f, but the geometric mean is %.4f.\n", mean, gmean)
	fmt.Printf("The exponential of the mean of the logs is %.4f\n", expMeanLog)
	// Output:
	// The arithmetic mean is 10.1667, but the geometric mean is 8.7637.
	// The exponential of the mean of the logs is 8.7637
}

Output:

The arithmetic mean is 10.1667, but the geometric mean is 8.7637.
The exponential of the mean of the logs is 8.7637

func HarmonicMean

func HarmonicMean(x, weights []float64) float64

HarmonicMean returns the weighted harmonic mean of the dataset

\sum_i {w_i} / ( sum_i {w_i / x_i} )

This only applies with positive x and positive weights. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{8, 2, 9, 15, 4}
	weights := []float64{2, 2, 6, 7, 1}
	mean := Mean(x, weights)
	hmean := HarmonicMean(x, weights)

	fmt.Printf("The arithmetic mean is %.5f, but the harmonic mean is %.4f.\n", mean, hmean)
	// Output:
	// The arithmetic mean is 10.16667, but the harmonic mean is 6.8354.
}

Output:

The arithmetic mean is 10.16667, but the harmonic mean is 6.8354.

func Hellinger

func Hellinger(p, q []float64) float64

Hellinger computes the distance between the probability distributions p and q given by:

\sqrt{ 1 - \sum_i \sqrt{p_i q_i} }

The lengths of p and q must be equal. It is assumed that p and q sum to 1.

func Histogram

func Histogram(count, dividers, x, weights []float64) []float64

Histogram sums up the weighted number of data points in each bin. The weight of data point x[i] will be placed into count[j] if dividers[j] <= x < dividers[j+1]. The "span" function in the floats package can assist with bin creation.

The following conditions on the inputs apply:

Example

Code:

{
	x := make([]float64, 101)
	for i := range x {
		x[i] = 1.1 * float64(i) // x data ranges from 0 to 110
	}
	dividers := []float64{0, 7, 20, 100, 1000}
	fmt.Println(`Histogram counts the amount of data in the bins specified by
the dividers. In this data set, there are 7 data points less than 7 (between dividers[0]
and dividers[1]), 12 data points between 7 and 20 (dividers[1] and dividers[2]),
and 0 data points above 1000. Since dividers has length 5, there will be 4 bins.`)
	hist := Histogram(nil, dividers, x, nil)
	fmt.Printf("Hist = %v\n", hist)

	fmt.Println()
	fmt.Println("For ease, the floats Span function can be used to set the dividers")
	nBins := 10
	dividers = make([]float64, nBins+1)
	min := floats.Min(x)
	max := floats.Max(x)
	// Increase the maximum divider so that the maximum value of x is contained
	// within the last bucket.
	max++
	floats.Span(dividers, min, max)
	// Span includes the min and the max. Trim the dividers to create 10 buckets
	hist = Histogram(nil, dividers, x, nil)
	fmt.Printf("Hist = %v\n", hist)
	fmt.Println()
	fmt.Println(`Histogram also works with weighted data, and allows reusing of
the count field in order to avoid extra garbage`)
	weights := make([]float64, len(x))
	for i := range weights {
		weights[i] = float64(i + 1)
	}
	Histogram(hist, dividers, x, weights)
	fmt.Printf("Weighted Hist = %v\n", hist)

	// Output:
	// Histogram counts the amount of data in the bins specified by
	// the dividers. In this data set, there are 7 data points less than 7 (between dividers[0]
	// and dividers[1]), 12 data points between 7 and 20 (dividers[1] and dividers[2]),
	// and 0 data points above 1000. Since dividers has length 5, there will be 4 bins.
	// Hist = [7 12 72 10]
	//
	// For ease, the floats Span function can be used to set the dividers
	// Hist = [11 10 10 10 10 10 10 10 10 10]
	//
	// Histogram also works with weighted data, and allows reusing of
	// the count field in order to avoid extra garbage
	// Weighted Hist = [66 165 265 365 465 565 665 765 865 965]
}

Output:

Histogram counts the amount of data in the bins specified by
the dividers. In this data set, there are 7 data points less than 7 (between dividers[0]
and dividers[1]), 12 data points between 7 and 20 (dividers[1] and dividers[2]),
and 0 data points above 1000. Since dividers has length 5, there will be 4 bins.
Hist = [7 12 72 10]

For ease, the floats Span function can be used to set the dividers
Hist = [11 10 10 10 10 10 10 10 10 10]

Histogram also works with weighted data, and allows reusing of
the count field in order to avoid extra garbage
Weighted Hist = [66 165 265 365 465 565 665 765 865 965]

func JensenShannon

func JensenShannon(p, q []float64) float64

JensenShannon computes the JensenShannon divergence between the distributions p and q. The Jensen-Shannon divergence is defined as

m = 0.5 * (p + q)
JS(p, q) = 0.5 ( KL(p, m) + KL(q, m) )

Unlike Kullback-Leibler, the Jensen-Shannon distance is symmetric. The value is between 0 and ln(2).

func Kendall

func Kendall(x, y, weights []float64) float64

Kendall returns the weighted Tau-a Kendall correlation between the samples of x and y. The Kendall correlation measures the quantity of concordant and discordant pairs of numbers. If weights are specified then each pair is weighted by weights[i] * weights[j] and the final sum is normalized to stay between -1 and 1. The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{8, -3, 7, 8, -4}
	y := []float64{10, 5, 6, 3, -1}
	w := []float64{2, 1.5, 3, 3, 2}

	fmt.Println("Kendall correlation computes the number of ordered pairs")
	fmt.Println("between two datasets.")

	c := Kendall(x, y, w)
	fmt.Printf("Kendall correlation is %.5f\n", c)

	// Output:
	// Kendall correlation computes the number of ordered pairs
	// between two datasets.
	// Kendall correlation is 0.25000
}

Output:

Kendall correlation computes the number of ordered pairs
between two datasets.
Kendall correlation is 0.25000

func KolmogorovSmirnov

func KolmogorovSmirnov(x, xWeights, y, yWeights []float64) float64

KolmogorovSmirnov computes the largest distance between two empirical CDFs. Each dataset x and y consists of sample locations and counts, xWeights and yWeights, respectively.

x and y may have different lengths, though len(x) must equal len(xWeights), and len(y) must equal len(yWeights). Both x and y must be sorted.

Special cases are:

= 0 if len(x) == len(y) == 0
= 1 if len(x) == 0, len(y) != 0 or len(x) != 0 and len(y) == 0

func KullbackLeibler

func KullbackLeibler(p, q []float64) float64

KullbackLeibler computes the Kullback-Leibler distance between the distributions p and q. The natural logarithm is used.

sum_i(p_i * log(p_i / q_i))

Note that the Kullback-Leibler distance is not symmetric; KullbackLeibler(p,q) != KullbackLeibler(q,p)

Example

Code:

{

	p := []float64{0.05, 0.1, 0.9, 0.05}
	q := []float64{0.2, 0.4, 0.25, 0.15}
	s := []float64{0, 0, 1, 0}

	klPQ := KullbackLeibler(p, q)
	klPS := KullbackLeibler(p, s)
	klPP := KullbackLeibler(p, p)

	fmt.Println("Kullback-Leibler is one measure of the difference between two distributions")
	fmt.Printf("The K-L distance between p and q is %.4f\n", klPQ)
	fmt.Println("It is impossible for s and p to be the same distribution, because")
	fmt.Println("the first bucket has zero probability in s and non-zero in p. Thus,")
	fmt.Printf("the K-L distance between them is %.4f\n", klPS)
	fmt.Printf("The K-L distance between identical distributions is %.4f\n", klPP)

	// Kullback-Leibler is one measure of the difference between two distributions
	// The K-L distance between p and q is 0.8900
	// It is impossible for s and p to be the same distribution, because
	// the first bucket has zero probability in s and non-zero in p. Thus,
	// the K-L distance between them is +Inf
	// The K-L distance between identical distributions is 0.0000
}

func LinearRegression

func LinearRegression(x, y, weights []float64, origin bool) (alpha, beta float64)

LinearRegression computes the best-fit line

y = alpha + beta*x

to the data in x and y with the given weights. If origin is true, the regression is forced to pass through the origin.

Specifically, LinearRegression computes the values of alpha and beta such that the total residual

\sum_i w[i]*(y[i] - alpha - beta*x[i])^2

is minimized. If origin is true, then alpha is forced to be zero.

The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:play 

package main

import (
	"fmt"

	"golang.org/x/exp/rand"

	"gonum.org/v1/gonum/stat"
)

func main() {
	var (
		xs      = make([]float64, 100)
		ys      = make([]float64, 100)
		weights []float64
	)

	line := func(x float64) float64 {
		return 1 + 3*x
	}

	for i := range xs {
		xs[i] = float64(i)
		ys[i] = line(xs[i]) + 0.1*rand.NormFloat64()
	}

	// Do not force the regression line to pass through the origin.
	origin := false

	alpha, beta := stat.LinearRegression(xs, ys, weights, origin)
	r2 := stat.RSquared(xs, ys, weights, alpha, beta)

	fmt.Printf("Estimated offset is: %.6f\n", alpha)
	fmt.Printf("Estimated slope is:  %.6f\n", beta)
	fmt.Printf("R^2: %.6f\n", r2)

}

Output:

Estimated offset is: 0.988572
Estimated slope is:  3.000154
R^2: 0.999999

func Mahalanobis

func Mahalanobis(x, y mat.Vector, chol *mat.Cholesky) float64

Mahalanobis computes the Mahalanobis distance

D = sqrt((x-y)ᵀ * Σ^-1 * (x-y))

between the column vectors x and y given the cholesky decomposition of Σ. Mahalanobis returns NaN if the linear solve fails.

See https://en.wikipedia.org/wiki/Mahalanobis_distance for more information.

func Mean

func Mean(x, weights []float64) float64

Mean computes the weighted mean of the data set.

sum_i {w_i * x_i} / sum_i {w_i}

If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{8.2, -6, 5, 7}
	mean := Mean(x, nil)
	fmt.Printf("The mean of the samples is %.4f\n", mean)
	w := []float64{2, 6, 3, 5}
	weightedMean := Mean(x, w)
	fmt.Printf("The weighted mean of the samples is %.4f\n", weightedMean)
	x2 := []float64{8.2, 8.2, -6, -6, -6, -6, -6, -6, 5, 5, 5, 7, 7, 7, 7, 7}
	mean2 := Mean(x2, nil)
	fmt.Printf("The mean of x2 is %.4f\n", mean2)
	fmt.Println("The weights act as if there were more samples of that number")
	// Output:
	// The mean of the samples is 3.5500
	// The weighted mean of the samples is 1.9000
	// The mean of x2 is 1.9000
	// The weights act as if there were more samples of that number
}

Output:

The mean of the samples is 3.5500
The weighted mean of the samples is 1.9000
The mean of x2 is 1.9000
The weights act as if there were more samples of that number

func MeanStdDev

func MeanStdDev(x, weights []float64) (mean, std float64)

MeanStdDev returns the sample mean and unbiased standard deviation When weights sum to 1 or less, a biased variance estimator should be used.

func MeanVariance

func MeanVariance(x, weights []float64) (mean, variance float64)

MeanVariance computes the sample mean and unbiased variance, where the mean and variance are

\sum_i w_i * x_i / (sum_i w_i)
\sum_i w_i (x_i - mean)^2 / (sum_i w_i - 1)

respectively. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights). When weights sum to 1 or less, a biased variance estimator should be used.

func Mode

func Mode(x, weights []float64) (val float64, count float64)

Mode returns the most common value in the dataset specified by x and the given weights. Strict float64 equality is used when comparing values, so users should take caution. If several values are the mode, any of them may be returned.

func Moment

func Moment(moment float64, x, weights []float64) float64

Moment computes the weighted n^th moment of the samples,

E[(x - μ)^N]

No degrees of freedom correction is done. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

func MomentAbout

func MomentAbout(moment float64, x []float64, mean float64, weights []float64) float64

MomentAbout computes the weighted n^th weighted moment of the samples about the given mean \mu,

E[(x - μ)^N]

No degrees of freedom correction is done. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

func PopMeanStdDev

func PopMeanStdDev(x, weights []float64) (mean, std float64)

PopMeanStdDev returns the sample mean and biased standard deviation (also known as "population standard deviation").

func PopMeanVariance

func PopMeanVariance(x, weights []float64) (mean, variance float64)

PopMeanVariance computes the sample mean and biased variance (also known as "population variance"), where the mean and variance are

\sum_i w_i * x_i / (sum_i w_i)
\sum_i w_i (x_i - mean)^2 / (sum_i w_i)

respectively. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

func PopStdDev

func PopStdDev(x, weights []float64) float64

PopStdDev returns the population standard deviation, i.e., a square root of the biased variance estimate.

Example

Code:

{
	x := []float64{8, 2, -9, 15, 4}
	stdev := PopStdDev(x, nil)
	fmt.Printf("The standard deviation of the population is %.4f\n", stdev)

	weights := []float64{2, 2, 6, 7, 1}
	weightedStdev := PopStdDev(x, weights)
	fmt.Printf("The weighted standard deviation of the population is %.4f\n", weightedStdev)
	// Output:
	// The standard deviation of the population is 7.8740
	// The weighted standard deviation of the population is 10.2754
}

Output:

The standard deviation of the population is 7.8740
The weighted standard deviation of the population is 10.2754

func PopVariance

func PopVariance(x, weights []float64) float64

PopVariance computes the unbiased weighted sample variance:

\sum_i w_i (x_i - mean)^2 / (sum_i w_i)

If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

Example

Code:

{
	x := []float64{8, 2, -9, 15, 4}
	variance := PopVariance(x, nil)
	fmt.Printf("The biased variance of the samples is %.4f\n", variance)

	weights := []float64{2, 2, 6, 7, 1}
	weightedVariance := PopVariance(x, weights)
	fmt.Printf("The weighted biased variance of the samples is %.4f\n", weightedVariance)
	// Output:
	// The biased variance of the samples is 62.0000
	// The weighted biased variance of the samples is 105.5833
}

Output:

The biased variance of the samples is 62.0000
The weighted biased variance of the samples is 105.5833

func Quantile

func Quantile(p float64, c CumulantKind, x, weights []float64) float64

Quantile returns the sample of x such that x is greater than or equal to the fraction p of samples. The exact behavior is determined by the CumulantKind, and p should be a number between 0 and 1. Quantile is theoretically the inverse of the CDF function, though it may not be the actual inverse for all values p and CumulantKinds.

The x data must be sorted in increasing order. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights). Quantile will panic if the length of x is zero.

CumulantKind behaviors:

func RNoughtSquared

func RNoughtSquared(x, y, weights []float64, beta float64) float64

RNoughtSquared returns the coefficient of determination defined as

R₀^2 = \sum_i w[i]*(beta*x[i])^2 / \sum_i w[i]*y[i]^2

for the line

y = beta*x

and the data in x and y with the given weights. RNoughtSquared should only be used for best-fit lines regressed through the origin.

The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

func ROC

func ROC(cutoffs, y []float64, classes []bool, weights []float64) (tpr, fpr, thresh []float64)

ROC returns paired false positive rate (FPR) and true positive rate (TPR) values corresponding to cutoff points on the receiver operator characteristic (ROC) curve obtained when y is treated as a binary classifier for classes with weights. The cutoff thresholds used to calculate the ROC are returned in thresh such that tpr[i] and fpr[i] are the true and false positive rates for y >= thresh[i].

The input y and cutoffs must be sorted, and values in y must correspond to values in classes and weights. SortWeightedLabeled can be used to sort y together with classes and weights.

For a given cutoff value, observations corresponding to entries in y greater than the cutoff value are classified as true, while those less than or equal to the cutoff value are classified as false. These assigned class labels are compared with the true values in the classes slice and used to calculate the FPR and TPR.

If weights is nil, all weights are treated as 1. If weights is not nil it must have the same length as y and classes, otherwise ROC will panic.

If cutoffs is nil or empty, all possible cutoffs are calculated, resulting in fpr and tpr having length one greater than the number of unique values in y. Otherwise fpr and tpr will be returned with the same length as cutoffs. floats.Span can be used to generate equally spaced cutoffs.

More details about ROC curves are available at https://en.wikipedia.org/wiki/Receiver_operating_characteristic

Example (AUC_unweighted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/integrate"
	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{0.1, 0.35, 0.4, 0.8}
	classes := []bool{true, false, true, false}

	tpr, fpr, _ := stat.ROC(nil, y, classes, nil)

	// Compute Area Under Curve.
	auc := integrate.Trapezoidal(fpr, tpr)
	fmt.Printf("true  positive rate: %v\n", tpr)
	fmt.Printf("false positive rate: %v\n", fpr)
	fmt.Printf("auc: %v\n", auc)

}

Output:

true  positive rate: [0 0 0.5 0.5 1]
false positive rate: [0 0.5 0.5 1 1]
auc: 0.25
Example (AUC_weighted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/integrate"
	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{0.1, 0.35, 0.4, 0.8}
	classes := []bool{true, false, true, false}
	weights := []float64{1, 2, 2, 1}

	tpr, fpr, _ := stat.ROC(nil, y, classes, weights)

	// Compute Area Under Curve.
	auc := integrate.Trapezoidal(fpr, tpr)
	fmt.Printf("auc: %f\n", auc)

}

Output:

auc: 0.444444
Example (EquallySpacedCutoffs)

Code:play 

package main

import (
	"fmt"
	"math"

	"gonum.org/v1/gonum/floats"
	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{8, 7.5, 6, 5, 3, 0}
	classes := []bool{true, true, true, false, true, true}
	weights := []float64{2, 2, 3, 6, 1, 4}
	n := 9

	stat.SortWeightedLabeled(y, classes, weights)
	cutoffs := make([]float64, n)
	floats.Span(cutoffs, math.Nextafter(y[0], y[0]-1), y[len(y)-1])

	tpr, fpr, _ := stat.ROC(cutoffs, y, classes, weights)
	fmt.Printf("true  positive rate: %.3v\n", tpr)
	fmt.Printf("false positive rate: %.3v\n", fpr)

}

Output:

true  positive rate: [0.167 0.333 0.583 0.583 0.583 0.667 0.667 0.667 1]
false positive rate: [0 0 0 1 1 1 1 1 1]
Example (KnownCutoffs)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{8, 7.5, 6, 5, 3, 0}
	classes := []bool{true, true, true, false, true, false}
	weights := []float64{2, 2, 3, 6, 1, 4}
	cutoffs := []float64{-1, 3, 4}

	stat.SortWeightedLabeled(y, classes, weights)

	tpr, fpr, _ := stat.ROC(cutoffs, y, classes, weights)
	fmt.Printf("true  positive rate: %v\n", tpr)
	fmt.Printf("false positive rate: %v\n", fpr)

}

Output:

true  positive rate: [0.875 1 1]
false positive rate: [0.6 0.6 1]
Example (Threshold)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{0.1, 0.4, 0.35, 0.8}
	classes := []bool{false, false, true, true}
	stat.SortWeightedLabeled(y, classes, nil)

	tpr, fpr, thresh := stat.ROC(nil, y, classes, nil)
	fmt.Printf("true  positive rate: %v\n", tpr)
	fmt.Printf("false positive rate: %v\n", fpr)
	fmt.Printf("cutoff thresholds: %v\n", thresh)

}

Output:

true  positive rate: [0 0.5 0.5 1 1]
false positive rate: [0 0 0.5 0.5 1]
cutoff thresholds: [+Inf 0.8 0.4 0.35 0.1]
Example (Unsorted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{8, 7.5, 6, 5, 3, 0}
	classes := []bool{true, true, true, false, true, false}
	weights := []float64{2, 2, 3, 6, 1, 4}

	stat.SortWeightedLabeled(y, classes, weights)

	tpr, fpr, _ := stat.ROC(nil, y, classes, weights)
	fmt.Printf("true  positive rate: %v\n", tpr)
	fmt.Printf("false positive rate: %v\n", fpr)

}

Output:

true  positive rate: [0 0.25 0.5 0.875 0.875 1 1]
false positive rate: [0 0 0 0 0.6 0.6 1]
Example (Unweighted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{0, 3, 5, 6, 7.5, 8}
	classes := []bool{false, true, false, true, true, true}

	tpr, fpr, _ := stat.ROC(nil, y, classes, nil)
	fmt.Printf("true  positive rate: %v\n", tpr)
	fmt.Printf("false positive rate: %v\n", fpr)

}

Output:

true  positive rate: [0 0.25 0.5 0.75 0.75 1 1]
false positive rate: [0 0 0 0 0.5 0.5 1]
Example (Weighted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{0, 3, 5, 6, 7.5, 8}
	classes := []bool{false, true, false, true, true, true}
	weights := []float64{4, 1, 6, 3, 2, 2}

	tpr, fpr, _ := stat.ROC(nil, y, classes, weights)
	fmt.Printf("true  positive rate: %v\n", tpr)
	fmt.Printf("false positive rate: %v\n", fpr)

}

Output:

true  positive rate: [0 0.25 0.5 0.875 0.875 1 1]
false positive rate: [0 0 0 0 0.6 0.6 1]

func RSquared

func RSquared(x, y, weights []float64, alpha, beta float64) float64

RSquared returns the coefficient of determination defined as

R^2 = 1 - \sum_i w[i]*(y[i] - alpha - beta*x[i])^2 / \sum_i w[i]*(y[i] - mean(y))^2

for the line

y = alpha + beta*x

and the data in x and y with the given weights.

The lengths of x and y must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights).

func RSquaredFrom

func RSquaredFrom(estimates, values, weights []float64) float64

RSquaredFrom returns the coefficient of determination defined as

R^2 = 1 - \sum_i w[i]*(estimate[i] - value[i])^2 / \sum_i w[i]*(value[i] - mean(values))^2

and the data in estimates and values with the given weights.

The lengths of estimates and values must be equal. If weights is nil then all of the weights are 1. If weights is not nil, then len(values) must equal len(weights).

func Skew

func Skew(x, weights []float64) float64

Skew computes the skewness of the sample data. If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights). When weights sum to 1 or less, a biased variance estimator should be used.

func SortWeighted

func SortWeighted(x, weights []float64)

SortWeighted rearranges the data in x along with their corresponding weights so that the x data are sorted. The data is sorted in place. Weights may be nil, but if weights is non-nil then it must have the same length as x.

func SortWeightedLabeled

func SortWeightedLabeled(x []float64, labels []bool, weights []float64)

SortWeightedLabeled rearranges the data in x along with their corresponding weights and boolean labels so that the x data are sorted. The data is sorted in place. Weights and labels may be nil, if either is non-nil it must have the same length as x.

func StdDev

func StdDev(x, weights []float64) float64

StdDev returns the sample standard deviation.

Example

Code:

{
	x := []float64{8, 2, -9, 15, 4}
	stdev := StdDev(x, nil)
	fmt.Printf("The standard deviation of the samples is %.4f\n", stdev)

	weights := []float64{2, 2, 6, 7, 1}
	weightedStdev := StdDev(x, weights)
	fmt.Printf("The weighted standard deviation of the samples is %.4f\n", weightedStdev)
	// Output:
	// The standard deviation of the samples is 8.8034
	// The weighted standard deviation of the samples is 10.5733
}

Output:

The standard deviation of the samples is 8.8034
The weighted standard deviation of the samples is 10.5733

func StdErr

func StdErr(std, sampleSize float64) float64

StdErr returns the standard error in the mean with the given values.

Example

Code:

{
	x := []float64{8, 2, -9, 15, 4}
	weights := []float64{2, 2, 6, 7, 1}
	mean := Mean(x, weights)
	stdev := StdDev(x, weights)
	nSamples := floats.Sum(weights)
	stdErr := StdErr(stdev, nSamples)
	fmt.Printf("The standard deviation is %.4f and there are %g samples, so the mean\nis likely %.4f ± %.4f.", stdev, nSamples, mean, stdErr)
	// Output:
	// The standard deviation is 10.5733 and there are 18 samples, so the mean
	// is likely 4.1667 ± 2.4921.
}

Output:

The standard deviation is 10.5733 and there are 18 samples, so the mean
is likely 4.1667 ± 2.4921.

func StdScore

func StdScore(x, mean, std float64) float64

StdScore returns the standard score (a.k.a. z-score, z-value) for the value x with the given mean and standard deviation, i.e.

(x - mean) / std

func TOC

func TOC(classes []bool, weights []float64) (min, ntp, max []float64)

TOC returns the Total Operating Characteristic for the classes provided and the minimum and maximum bounds for the TOC.

The input y values that correspond to classes and weights must be sorted in ascending order. classes[i] is the class of value y[i] and weights[i] is the weight of y[i]. SortWeightedLabeled can be used to sort classes together with weights by the rank variable, i+1.

The returned ntp values can be interpreted as the number of true positives where values above the given rank are assigned class true for each given rank from 1 to len(classes).

ntp_i = sum_{j ≥ len(ntp)-1 - i} [ classes_j ] * weights_j, where [x] = 1 if x else 0.

The values of min and max provide the minimum and maximum possible number of false values for the set of classes. The first element of ntp, min and max are always zero as this corresponds to assigning all data class false and the last elements are always weighted sum of classes as this corresponds to assigning every data class true. For len(classes) != 0, the lengths of min, ntp and max are len(classes)+1.

If weights is nil, all weights are treated as 1. When weights are not nil, the calculation of min and max allows for partial assignment of single data points. If weights is not nil it must have the same length as classes, otherwise TOC will panic.

More details about TOC curves are available at https://en.wikipedia.org/wiki/Total_operating_characteristic

Example

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	classes := []bool{
		false, false, false, false, false, false,
		false, false, false, false, false, false,
		false, false, true, true, true, true,
		true, true, true, false, false, true,
		false, true, false, false, true, false,
	}

	min, ntp, max := stat.TOC(classes, nil)
	fmt.Printf("minimum bound: %v\n", min)
	fmt.Printf("TOC:           %v\n", ntp)
	fmt.Printf("maximum bound: %v\n", max)

}

Output:

minimum bound: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10]
TOC:           [0 0 1 1 1 2 2 3 3 3 4 5 6 7 8 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
maximum bound: [0 1 2 3 4 5 6 7 8 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
Example (AUC_unweighted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/floats"
	"gonum.org/v1/gonum/integrate"
	"gonum.org/v1/gonum/stat"
)

func main() {
	classes := []bool{true, false, true, false}

	_, ntp, _ := stat.TOC(classes, nil)
	pos := ntp[len(ntp)-1]
	base := float64(len(classes)) - pos

	// Compute the area under ntp and under the
	// minimum bound.
	x := floats.Span(make([]float64, len(classes)+1), 0, float64(len(classes)))
	aucNTP := integrate.Trapezoidal(x, ntp)
	aucMin := pos * pos / 2

	// Calculate the area under the curve
	// within the bounding parallelogram.
	auc := aucNTP - aucMin

	// Calculate the area within the bounding
	// parallelogram.
	par := pos * base

	// The AUC is the ratio of the area under
	// the curve within the bounding parallelogram
	// and the total parallelogram bound.
	auc /= par

	fmt.Printf("number of true positives: %v\n", ntp)
	fmt.Printf("auc: %v\n", auc)

}

Output:

number of true positives: [0 0 1 1 2]
auc: 0.25
Example (AUC_weighted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/floats"
	"gonum.org/v1/gonum/integrate"
	"gonum.org/v1/gonum/stat"
)

func main() {
	classes := []bool{true, false, true, false}
	weights := []float64{1, 2, 2, 1}

	min, ntp, max := stat.TOC(classes, weights)

	// Compute the area under ntp and under the
	// minimum and maximum bounds.
	x := make([]float64, len(classes)+1)
	floats.CumSum(x[1:], weights)
	aucNTP := integrate.Trapezoidal(x, ntp)
	aucMin := integrate.Trapezoidal(x, min)
	aucMax := integrate.Trapezoidal(x, max)

	// Calculate the area under the curve
	// within the bounding parallelogram.
	auc := aucNTP - aucMin

	// Calculate the area within the bounding
	// parallelogram.
	par := aucMax - aucMin

	// The AUC is the ratio of the area under
	// the curve within the bounding parallelogram
	// and the total parallelogram bound.
	auc /= par

	fmt.Printf("number of true positives: %v\n", ntp)
	fmt.Printf("auc: %f\n", auc)

}

Output:

number of true positives: [0 0 2 2 3]
auc: 0.444444
Example (Unsorted)

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/stat"
)

func main() {
	y := []float64{8, 7.5, 6, 5, 3, 0}
	classes := []bool{true, false, true, false, false, false}
	weights := []float64{4, 1, 6, 3, 2, 2}

	stat.SortWeightedLabeled(y, classes, weights)

	min, ntp, max := stat.TOC(classes, weights)
	fmt.Printf("minimum bound: %v\n", min)
	fmt.Printf("TOC:           %v\n", ntp)
	fmt.Printf("maximum bound: %v\n", max)

}

Output:

minimum bound: [0 0 0 3 6 8 10]
TOC:           [0 4 4 10 10 10 10]
maximum bound: [0 4 5 10 10 10 10]

func Variance

func Variance(x, weights []float64) float64

Variance computes the unbiased weighted sample variance:

\sum_i w_i (x_i - mean)^2 / (sum_i w_i - 1)

If weights is nil then all of the weights are 1. If weights is not nil, then len(x) must equal len(weights). When weights sum to 1 or less, a biased variance estimator should be used.

Example

Code:

{
	x := []float64{8, 2, -9, 15, 4}
	variance := Variance(x, nil)
	fmt.Printf("The variance of the samples is %.4f\n", variance)

	weights := []float64{2, 2, 6, 7, 1}
	weightedVariance := Variance(x, weights)
	fmt.Printf("The weighted variance of the samples is %.4f\n", weightedVariance)
	// Output:
	// The variance of the samples is 77.5000
	// The weighted variance of the samples is 111.7941
}

Output:

The variance of the samples is 77.5000
The weighted variance of the samples is 111.7941

Types

type CC

type CC struct {
	// contains filtered or unexported fields
}

CC is a type for computing the canonical correlations of a pair of matrices. The results of the canonical correlation analysis are only valid if the call to CanonicalCorrelations was successful.

Example

Code:play 

package main

import (
	"fmt"
	"log"

	"gonum.org/v1/gonum/floats"
	"gonum.org/v1/gonum/mat"
	"gonum.org/v1/gonum/stat"
)

// symView is a helper for getting a View of a SymDense.
type symView struct {
	sym *mat.SymDense

	i, j, r, c int
}

func (s symView) Dims() (r, c int) { return s.r, s.c }

func (s symView) At(i, j int) float64 {
	if i < 0 || s.r <= i {
		panic("i out of bounds")
	}
	if j < 0 || s.c <= j {
		panic("j out of bounds")
	}
	return s.sym.At(s.i+i, s.j+j)
}

func (s symView) T() mat.Matrix { return mat.Transpose{Matrix: s} }

func main() {
	// This example is directly analogous to Example 3.5 on page 87 of
	// Koch, Inge. Analysis of multivariate and high-dimensional data.
	// Vol. 32. Cambridge University Press, 2013. ISBN: 9780521887939

	// bostonData is the Boston Housing Data of Harrison and Rubinfeld (1978)
	n, _ := bostonData.Dims()
	var xd, yd = 7, 4
	// The variables (columns) of bostonData can be partitioned into two sets:
	// those that deal with environmental/social variables (xdata), and those
	// that contain information regarding the individual (ydata). Because the
	// variables can be naturally partitioned in this way, these data are
	// appropriate for canonical correlation analysis. The columns (variables)
	// of xdata are, in order:
	//  per capita crime rate by town,
	//  proportion of non-retail business acres per town,
	//  nitric oxide concentration (parts per 10 million),
	//  weighted distances to Boston employment centres,
	//  index of accessibility to radial highways,
	//  pupil-teacher ratio by town, and
	//  proportion of blacks by town.
	xdata := bostonData.Slice(0, n, 0, xd)

	// The columns (variables) of ydata are, in order:
	//  average number of rooms per dwelling,
	//  proportion of owner-occupied units built prior to 1940,
	//  full-value property-tax rate per $10000, and
	//  median value of owner-occupied homes in $1000s.
	ydata := bostonData.Slice(0, n, xd, xd+yd)

	// For comparison, calculate the correlation matrix for the original data.
	var cor mat.SymDense
	stat.CorrelationMatrix(&cor, bostonData, nil)

	// Extract just those correlations that are between xdata and ydata.
	var corRaw = symView{sym: &cor, i: 0, j: xd, r: xd, c: yd}

	// Note that the strongest correlation between individual variables is 0.91
	// between the 5th variable of xdata (index of accessibility to radial
	// highways) and the 3rd variable of ydata (full-value property-tax rate per
	// $10000).
	fmt.Printf("corRaw = %.4f", mat.Formatted(corRaw, mat.Prefix("         ")))

	// Calculate the canonical correlations.
	var cc stat.CC
	err := cc.CanonicalCorrelations(xdata, ydata, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Unpack cc.
	var pVecs, qVecs, phiVs, psiVs mat.Dense
	ccors := cc.CorrsTo(nil)
	cc.LeftTo(&pVecs, true)
	cc.RightTo(&qVecs, true)
	cc.LeftTo(&phiVs, false)
	cc.RightTo(&psiVs, false)

	// Canonical Correlation Matrix, or the correlations between the sphered
	// data.
	var corSph mat.Dense
	corSph.CloneFrom(&pVecs)
	col := make([]float64, xd)
	for j := 0; j < yd; j++ {
		mat.Col(col, j, &corSph)
		floats.Scale(ccors[j], col)
		corSph.SetCol(j, col)
	}
	corSph.Product(&corSph, qVecs.T())
	fmt.Printf("\n\ncorSph = %.4f", mat.Formatted(&corSph, mat.Prefix("         ")))

	// Canonical Correlations. Note that the first canonical correlation is
	// 0.95, stronger than the greatest correlation in the original data, and
	// much stronger than the greatest correlation in the sphered data.
	fmt.Printf("\n\nccors = %.4f", ccors)

	// Left and right eigenvectors of the canonical correlation matrix.
	fmt.Printf("\n\npVecs = %.4f", mat.Formatted(&pVecs, mat.Prefix("        ")))
	fmt.Printf("\n\nqVecs = %.4f", mat.Formatted(&qVecs, mat.Prefix("        ")))

	// Canonical Correlation Transforms. These can be useful as they represent
	// the canonical variables as linear combinations of the original variables.
	fmt.Printf("\n\nphiVs = %.4f", mat.Formatted(&phiVs, mat.Prefix("        ")))
	fmt.Printf("\n\npsiVs = %.4f", mat.Formatted(&psiVs, mat.Prefix("        ")))

}

Output:

corRaw = ⎡-0.2192   0.3527   0.5828  -0.3883⎤
         ⎢-0.3917   0.6448   0.7208  -0.4837⎥
         ⎢-0.3022   0.7315   0.6680  -0.4273⎥
         ⎢ 0.2052  -0.7479  -0.5344   0.2499⎥
         ⎢-0.2098   0.4560   0.9102  -0.3816⎥
         ⎢-0.3555   0.2615   0.4609  -0.5078⎥
         ⎣ 0.1281  -0.2735  -0.4418   0.3335⎦

corSph = ⎡ 0.0118   0.0525   0.2300  -0.1363⎤
         ⎢-0.1810   0.3213   0.3814  -0.1412⎥
         ⎢ 0.0166   0.2241   0.0104  -0.2235⎥
         ⎢ 0.0346  -0.5481  -0.0034  -0.1994⎥
         ⎢ 0.0303  -0.0956   0.7152   0.2039⎥
         ⎢-0.0298  -0.0022   0.0739  -0.3703⎥
         ⎣-0.1226  -0.0746  -0.3899   0.1541⎦

ccors = [0.9451 0.6787 0.5714 0.2010]

pVecs = ⎡-0.2574  -0.0158  -0.2122  -0.0946⎤
        ⎢-0.4837  -0.3837  -0.1474   0.6597⎥
        ⎢-0.0801  -0.3494  -0.3287  -0.2862⎥
        ⎢ 0.1278   0.7337  -0.4851   0.2248⎥
        ⎢-0.6969   0.4342   0.3603   0.0291⎥
        ⎢-0.0991  -0.0503  -0.6384   0.1022⎥
        ⎣ 0.4260  -0.0323   0.2290   0.6419⎦

qVecs = ⎡ 0.0182   0.1583   0.0067  -0.9872⎤
        ⎢-0.2348  -0.9483   0.1462  -0.1554⎥
        ⎢-0.9701   0.2406   0.0252   0.0209⎥
        ⎣ 0.0593   0.1330   0.9889   0.0291⎦

phiVs = ⎡-0.0027  -0.0093  -0.0490  -0.0155⎤
        ⎢-0.0429   0.0242  -0.0361   0.1839⎥
        ⎢-1.2248  -5.6031  -5.8094  -4.7927⎥
        ⎢-0.0044   0.3424  -0.4470   0.1150⎥
        ⎢-0.0742   0.1193   0.1116   0.0022⎥
        ⎢-0.0233  -0.1046  -0.3853  -0.0161⎥
        ⎣ 0.0001  -0.0005   0.0030   0.0082⎦

psiVs = ⎡ 0.0302   0.3002  -0.0878  -1.9583⎤
        ⎢-0.0065  -0.0392   0.0118  -0.0061⎥
        ⎢-0.0052   0.0046   0.0023   0.0008⎥
        ⎣ 0.0020  -0.0037   0.1293   0.1038⎦

func (*CC) CanonicalCorrelations

func (c *CC) CanonicalCorrelations(x, y mat.Matrix, weights []float64) error

CanonicalCorrelations performs a canonical correlation analysis of the input data x and y, columns of which should be interpretable as two sets of measurements on the same observations (rows). These observations are optionally weighted by weights. The result of the analysis is stored in the receiver if the analysis is successful.

Canonical correlation analysis finds associations between two sets of variables on the same observations by finding linear combinations of the two sphered datasets that maximize the correlation between them.

Some notation: let Xc and Yc denote the centered input data matrices x and y (column means subtracted from each column), let Sx and Sy denote the sample covariance matrices within x and y respectively, and let Sxy denote the covariance matrix between x and y. The sphered data can then be expressed as Xc * Sx^{-1/2} and Yc * Sy^{-1/2} respectively, and the correlation matrix between the sphered data is called the canonical correlation matrix, Sx^{-1/2} * Sxy * Sy^{-1/2}. In cases where S^{-1/2} is ambiguous for some covariance matrix S, S^{-1/2} is taken to be E * D^{-1/2} * Eᵀ where S can be eigendecomposed as S = E * D * Eᵀ.

The canonical correlations are the correlations between the corresponding pairs of canonical variables and can be obtained with c.Corrs(). Canonical variables can be obtained by projecting the sphered data into the left and right eigenvectors of the canonical correlation matrix, and these eigenvectors can be obtained with c.Left(m, true) and c.Right(m, true) respectively. The canonical variables can also be obtained directly from the centered raw data by using the back-transformed eigenvectors which can be obtained with c.Left(m, false) and c.Right(m, false) respectively.

The first pair of left and right eigenvectors of the canonical correlation matrix can be interpreted as directions into which the respective sphered data can be projected such that the correlation between the two projections is maximized. The second pair and onwards solve the same optimization but under the constraint that they are uncorrelated (orthogonal in sphered space) to previous projections.

CanonicalCorrelations will panic if the inputs x and y do not have the same number of rows.

The slice weights is used to weight the observations. If weights is nil, each weight is considered to have a value of one, otherwise the length of weights must match the number of observations (rows of both x and y) or CanonicalCorrelations will panic.

More details can be found at https://en.wikipedia.org/wiki/Canonical_correlation or in Chapter 3 of Koch, Inge. Analysis of multivariate and high-dimensional data. Vol. 32. Cambridge University Press, 2013. ISBN: 9780521887939

func (*CC) CorrsTo

func (c *CC) CorrsTo(dst []float64) []float64

CorrsTo returns the canonical correlations, using dst if it is not nil. If dst is not nil and len(dst) does not match the number of columns in the y input matrix, Corrs will panic.

func (*CC) LeftTo

func (c *CC) LeftTo(dst *mat.Dense, spheredSpace bool)

LeftTo returns the left eigenvectors of the canonical correlation matrix if spheredSpace is true. If spheredSpace is false it returns these eigenvectors back-transformed to the original data space.

If dst is empty, LeftTo will resize dst to be xd×yd. When dst is non-empty, LeftTo will panic if dst is not xd×yd. LeftTo will also panic if the receiver does not contain a successful CC.

func (*CC) RightTo

func (c *CC) RightTo(dst *mat.Dense, spheredSpace bool)

RightTo returns the right eigenvectors of the canonical correlation matrix if spheredSpace is true. If spheredSpace is false it returns these eigenvectors back-transformed to the original data space.

If dst is empty, RightTo will resize dst to be yd×yd. When dst is non-empty, RightTo will panic if dst is not yd×yd. RightTo will also panic if the receiver does not contain a successful CC.

type CumulantKind

type CumulantKind int

CumulantKind specifies the behavior for calculating the empirical CDF or Quantile

const (
	// Empirical treats the distribution as the actual empirical distribution.
	Empirical CumulantKind = 1
	// LinInterp linearly interpolates the empirical distribution between sample values, with a flat extrapolation.
	LinInterp CumulantKind = 4
)

List of supported CumulantKind values for the Quantile function. Constant values should match the R nomenclature. See https://en.wikipedia.org/wiki/Quantile#Estimating_the_quantiles_of_a_population

type PC

type PC struct {
	// contains filtered or unexported fields
}

PC is a type for computing and extracting the principal components of a matrix. The results of the principal components analysis are only valid if the call to PrincipalComponents was successful.

Example

Code:play 

package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
	"gonum.org/v1/gonum/stat"
)

func main() {
	// iris is a truncated sample of the Fisher's Iris dataset.
	n := 10
	d := 4
	iris := mat.NewDense(n, d, []float64{
		5.1, 3.5, 1.4, 0.2,
		4.9, 3.0, 1.4, 0.2,
		4.7, 3.2, 1.3, 0.2,
		4.6, 3.1, 1.5, 0.2,
		5.0, 3.6, 1.4, 0.2,
		5.4, 3.9, 1.7, 0.4,
		4.6, 3.4, 1.4, 0.3,
		5.0, 3.4, 1.5, 0.2,
		4.4, 2.9, 1.4, 0.2,
		4.9, 3.1, 1.5, 0.1,
	})

	// Calculate the principal component direction vectors
	// and variances.
	var pc stat.PC
	ok := pc.PrincipalComponents(iris, nil)
	if !ok {
		return
	}
	fmt.Printf("variances = %.4f\n\n", pc.VarsTo(nil))

	// Project the data onto the first 2 principal components.
	k := 2
	var proj mat.Dense
	var vec mat.Dense
	pc.VectorsTo(&vec)
	proj.Mul(iris, vec.Slice(0, d, 0, k))

	fmt.Printf("proj = %.4f", mat.Formatted(&proj, mat.Prefix("       ")))

}

Output:

variances = [0.1666 0.0207 0.0079 0.0019]

proj = ⎡-6.1686   1.4659⎤
       ⎢-5.6767   1.6459⎥
       ⎢-5.6699   1.3642⎥
       ⎢-5.5643   1.3816⎥
       ⎢-6.1734   1.3309⎥
       ⎢-6.7278   1.4021⎥
       ⎢-5.7743   1.1498⎥
       ⎢-6.0466   1.4714⎥
       ⎢-5.2709   1.3570⎥
       ⎣-5.7533   1.6207⎦

func (*PC) PrincipalComponents

func (c *PC) PrincipalComponents(a mat.Matrix, weights []float64) (ok bool)

PrincipalComponents performs a weighted principal components analysis on the matrix of the input data which is represented as an n×d matrix a where each row is an observation and each column is a variable.

PrincipalComponents centers the variables but does not scale the variance.

The weights slice is used to weight the observations. If weights is nil, each weight is considered to have a value of one, otherwise the length of weights must match the number of observations or PrincipalComponents will panic.

PrincipalComponents returns whether the analysis was successful.

func (*PC) VarsTo

func (c *PC) VarsTo(dst []float64) []float64

VarsTo returns the column variances of the principal component scores, b * vecs, where b is a matrix with centered columns. Variances are returned in descending order. If dst is not nil it is used to store the variances and returned. Vars will panic if the receiver has not successfully performed a principal components analysis or dst is not nil and the length of dst is not min(n, d).

func (*PC) VectorsTo

func (c *PC) VectorsTo(dst *mat.Dense)

VectorsTo returns the component direction vectors of a principal components analysis. The vectors are returned in the columns of a d×min(n, d) matrix.

If dst is empty, VectorsTo will resize dst to be d×min(n, d). When dst is non-empty, VectorsTo will panic if dst is not d×min(n, d). VectorsTo will also panic if the receiver does not contain a successful PC.

Source Files

doc.go pca_cca.go roc.go stat.go statmat.go

Directories

PathSynopsis
stat/cardPackage card provides cardinality estimation functions.
stat/combinPackage combin implements routines involving combinatorics (permutations, combinations, etc.).
stat/distmatPackage distmat provides probability distributions over matrices.
stat/distmvPackage distmv provides multivariate random distribution types.
stat/distuvPackage distuv provides univariate random distribution types.
stat/mdsPackage mds provides multidimensional scaling functions.
stat/samplemvPackage samplemv implements advanced sampling routines from explicit and implicit probability distributions.
stat/sampleuvPackage sampleuv implements advanced sampling routines from explicit and implicit probability distributions.
stat/spatialPackage spatial provides spatial statistical functions.
Version
v0.15.1 (latest)
Published
Aug 16, 2024
Platform
linux/amd64
Imports
6 packages
Last checked
11 hours ago

Tools for package owners.