Basic Statistics
Basic SPIs Overview
In this section, we detail the SPIs that we have categorised as 'basic statistics'. These include SPIs that are foundational to statistical analysis, often because they are widely applicable, relatively straightforward to understand and compute, and can form the basis for more complex methods.
Covariance
Keywords: undirected, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier: cov
The covariance matrix is estimated for a wide variety of statistical procedures. Due to the z-scoring of the time series, the correlation and covariance matrices are equivalent and thus the covariance statistic is within [−1, 1]. We use v0.24.1 of scikit-learn to compute the covariance matrix via a number of estimators:
Standard maximum likelihood estimate (MLE), denoted by the modifier EmpiricalCovariance.
Elliptic envelope (EllipticEnvelope), a method for outlier removal.
Minimum covariance determinant (MinCovDet), a method for outlier removal.
Graphical lasso, which uses l1-regularisation to sparsify the inverse covariance matrix (GraphicalLasso).
Graphical lasso with the regularisation parameter chosen through cross-validation with five splits (GraphicalLassoCV).
Basic shrinkage covariance estimator with a fixed shrinkage coefficient of 0.1 (ShrunkCovariance).
The Ledoit-Wolf method for optimising the shrinkage coefficient (LedoitWolf).
Oracle approximating shrinkage, an improved method for optimising the shrinkage coefficient if the data are Gaussian (OAS).
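The estimators listed above can be tried side by side with a short sketch. This is an illustration rather than the package's own code: the synthetic data, the variable names, and the observations-by-processes layout are assumptions made for the example, and only four of the estimators are shown.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.covariance import EmpiricalCovariance, LedoitWolf, OAS, ShrunkCovariance

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations (rows) of 3 processes (columns).
data = rng.standard_normal((200, 3))
data[:, 1] += 0.5 * data[:, 0]  # introduce some dependence between processes 0 and 1

# z-score each process so that the covariance matrix equals the correlation matrix.
z = zscore(data, axis=0)

estimators = {
    "EmpiricalCovariance": EmpiricalCovariance(),
    "ShrunkCovariance": ShrunkCovariance(shrinkage=0.1),
    "LedoitWolf": LedoitWolf(),
    "OAS": OAS(),
}

# Fit every estimator on the same z-scored data and collect the covariance matrices.
cov_matrices = {name: est.fit(z).covariance_ for name, est in estimators.items()}

for name, cov in cov_matrices.items():
    print(name, "entry (0, 1):", round(float(cov[0, 1]), 3))
```

With a fixed shrinkage coefficient of 0.1 and unit variances, the off-diagonal entries of the ShrunkCovariance estimate are the empirical ones scaled by 0.9.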
Cross Correlation
Keywords: undirected, linear, signed/unsigned, bivariate, time-dependent.
Base Identifier: xcorr
The cross-correlation function is defined as the Pearson correlation between two time series for all lags, giving values in [−1, 1] for each lag. To estimate the cross-correlation function, we use SciPy, which outputs a correlogram, i.e., the correlation from the MLE of the cross-covariance at a given lag, normalised by the auto-covariance. The cross-correlation is computed with fewer observations at larger lags and so it is common to truncate the function at a given level, which we do by only using the first T/4 lags, where T is the number of observations.
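A truncated correlogram of this kind can be sketched as follows. The function name `cross_correlogram` is hypothetical, the choice of `scipy.signal.correlate` is an assumption (the text only says that SciPy is used), and keeping lags with absolute value up to T/4 is one possible reading of "the first T/4 lags".

```python
import numpy as np
from scipy import signal
from scipy.stats import zscore


def cross_correlogram(x, y):
    """Return (lags, values) of the cross-correlation, truncated at |lag| <= T // 4."""
    x = zscore(np.asarray(x, dtype=float))
    y = zscore(np.asarray(y, dtype=float))
    T = len(x)
    # Dividing by T gives the biased (maximum likelihood) cross-covariance estimate;
    # after z-scoring, the lag-0 auto-covariance is 1, so no further normalisation is needed.
    full = signal.correlate(x, y, mode="full") / T
    lags = np.arange(-(T - 1), T)  # lag associated with each entry of the "full" output
    keep = np.abs(lags) <= T // 4
    return lags[keep], full[keep]


rng = np.random.default_rng(1)
x = rng.standard_normal(400)
y = 0.8 * x + 0.6 * rng.standard_normal(400)  # hypothetical pair of related series
lags, xc = cross_correlogram(x, y)
print("lag-0 value:", round(float(xc[lags == 0][0]), 3))
```

At lag 0 the value coincides with the Pearson correlation of the two series, and every value lies in [−1, 1].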
Kendall's Rank Correlation Coefficient
Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.
Base Identifier: kendalltau
Key Reference: [1]
Kendall’s τ assesses the association between ordinal variables. It is similar to Spearman’s ρ but differs in some respects, such as being more mathematically tractable in the event of ties. The method is implemented via the function kendalltau in SciPy, and has a value in [−1, 1].
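As a minimal sketch of the SciPy call, the toy arrays below are invented for illustration. Because τ depends only on the ordering of the values, any strictly increasing transformation gives τ = 1 and any strictly decreasing one gives τ = −1.

```python
import numpy as np
from scipy.stats import kendalltau

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_increasing = x ** 3   # strictly increasing in x, but nonlinear
y_decreasing = -x ** 3  # strictly decreasing in x

# kendalltau returns the statistic together with a p-value.
tau_up, p_up = kendalltau(x, y_increasing)
tau_down, p_down = kendalltau(x, y_decreasing)

print(tau_up, tau_down)
```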
Precision
Keywords: undirected, linear, signed, multivariate, contemporaneous.
Base Identifier: prec
The precision matrix is the matrix inverse of the covariance matrix, and can be used to quantify the association between each pair of time series while controlling for concomitant effects of all other time series. For normalised time-series data, the precision matrix is equivalent to the partial correlation between each pair of time series, conditioned on all other time series, and is within [−1, 1]. The precision matrix is computed via the same scikit-learn module as the covariance matrix, and uses the same estimators.
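The relationship between the two matrices can be checked with a short sketch. The synthetic data and the observations-by-processes layout are assumptions for illustration; `precision_` is the attribute that scikit-learn's covariance estimators expose after fitting.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.covariance import EmpiricalCovariance

rng = np.random.default_rng(2)

# Hypothetical data: 300 observations (rows) of 3 processes (columns).
data = rng.standard_normal((300, 3))
data[:, 2] += 0.7 * data[:, 0]  # make process 2 depend on process 0
z = zscore(data, axis=0)

est = EmpiricalCovariance().fit(z)
precision = est.precision_  # inverse of est.covariance_

# The product of the precision and covariance matrices should be close to the identity.
print(np.round(precision @ est.covariance_, 6))
```

Any of the other estimators can be substituted for EmpiricalCovariance here, since each fitted estimator stores both `covariance_` and `precision_`.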
Spearman's Rank-Correlation Coefficient
Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.
Base Identifier: spearmanr
Spearman’s ρ is a nonparametric measure of rank correlation between variables. The use of ordinal (ranked) variables allows the statistic to capture non-linear (but monotonic) relationships between random variables. The method is implemented via the function spearmanr in SciPy, and has a value in [−1, 1].
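The following sketch illustrates the point about monotonic but non-linear relationships. The exponential toy data are an assumption chosen for the example, and the Pearson correlation is included only for comparison.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(0.1, 5.0, 50)
y = np.exp(x)  # monotonic in x, but strongly non-linear

rho, rho_p = spearmanr(x, y)  # rank-based: captures the monotonic relationship fully
r, r_p = pearsonr(x, y)       # linear: penalised by the curvature

print("Spearman:", rho, "Pearson:", round(float(r), 3))
```

Because the ranks of x and y agree exactly, ρ equals 1 here, while the Pearson correlation is strictly smaller.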