Basic Statistics

Basic SPIs Overview

In this section, we detail the SPIs that we have categorised as 'basic statistics'. These include SPIs that are foundational to statistical analysis, often because they are widely applicable, relatively straightforward to understand and compute, and can form the basis for more complex methods.

Covariance
Keywords: undirected, linear, signed, bivariate, contemporaneous.
Base Identifier: cov

The covariance matrix is estimated for a wide variety of statistical procedures. Due to the z-scoring of the time series, the correlation and covariance matrices are equivalent and thus the covariance statistic is within [−1, 1]. We use scikit-learn to compute the covariance matrix via a number of estimators:

Standard maximum likelihood estimate (MLE) (denoted by the modifier EmpiricalCovariance).
Elliptic envelope (EllipticEnvelope).
Minimum covariance determinant (MinCovDet) methods for outlier removal.
Lasso technique, which uses an l1-regularisation to sparsify the covariance matrix (GraphicalLasso).
A variant of the Lasso technique with the regularisation parameter chosen through cross-validation with five splits (GraphicalLassoCV).
Basic shrinkage covariance estimator with a fixed shrinkage coefficient of 0.1 (ShrunkCovariance).
The Ledoit-Wolf method for optimising the shrinkage coefficient (LedoitWolf).
Oracle approximating shrinkage, an improved method for optimising the shrinkage coefficient if the data are Gaussian (OAS).
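As a rough illustration of the estimators listed above, the following sketch fits a few of them to z-scored data with scikit-learn. The synthetic data, the variable names, and the choice of four estimators are assumptions made for this example; it is not necessarily how the package itself calls these classes.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf, OAS, ShrunkCovariance

# Hypothetical data: T observations (rows) of three processes (columns).
rng = np.random.default_rng(0)
T = 200
shared = rng.standard_normal(T)
X = np.column_stack([
    shared + 0.5 * rng.standard_normal(T),  # process 0, driven by `shared`
    shared + 0.5 * rng.standard_normal(T),  # process 1, driven by `shared`
    rng.standard_normal(T),                 # process 2, independent noise
])

# z-score each series so that covariance and correlation coincide.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

estimators = {
    "EmpiricalCovariance": EmpiricalCovariance(),
    "ShrunkCovariance": ShrunkCovariance(shrinkage=0.1),
    "LedoitWolf": LedoitWolf(),
    "OAS": OAS(),
}

# Each fitted estimator exposes the estimated matrix as `covariance_`.
cov = {name: est.fit(Z).covariance_ for name, est in estimators.items()}

for name, matrix in cov.items():
    print(name, round(float(matrix[0, 1]), 3))
```

The remaining estimators (EllipticEnvelope, MinCovDet, GraphicalLasso, GraphicalLassoCV) follow the same fit-then-read-`covariance_` pattern.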

Covariance Estimators

cov_EmpiricalCovariance
cov_EllipticEnvelope
cov_MinCovDet
cov_GraphicalLasso
cov_GraphicalLassoCV
cov_ShrunkCovariance
cov_LedoitWolf
cov_OAS

Cross Correlation
Keywords: undirected, linear, signed/unsigned, bivariate, time-dependent.
Base Identifier: xcorr

The cross-correlation function is defined as the Pearson correlation between two time series for all lags, giving values in [−1, 1] for each lag. To estimate the cross-correlation function, we use SciPy, which outputs a correlogram, i.e., the correlation from the MLE of the cross-covariance at a given lag, normalised by the auto-covariance. The cross-correlation is computed with fewer observations at larger lags and so it is common to truncate the function at a given level, which we do by only using the first T/4 lags, where T is the number of observations.

A correlation below $1.96/\sqrt{T}$ is considered statistically insignificant, so we optionally discard lags whose correlation falls below this level (the modifier sig-True means we use only the statistically significant values, and sig-False means we use all values). We take two summary statistics of the correlogram: the maximum over the considered lags (denoted by the modifier max) and the average over the considered lags (mean).
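The truncation, significance cut-off, and summary steps can be sketched in a few lines of NumPy. The function name `xcorr_summary` is hypothetical. The example assumes that only non-negative lags are kept, that the cross-covariance is normalised by T, and that the threshold is applied to the absolute correlation; the package's own implementation may differ on each of these points.

```python
import numpy as np

def xcorr_summary(x, y, sig=True):
    """Return (max, mean) of a truncated correlogram. Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(x)

    # z-score both series so that the lag-0 value is the Pearson correlation.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()

    # Full cross-correlation over lags -(T-1)..(T-1); index T-1 is lag 0.
    r = np.correlate(zx, zy, mode="full") / T

    # Assumption: keep the first T/4 non-negative lags (0, 1, ..., T//4 - 1).
    r = r[T - 1 : T - 1 + T // 4]

    if sig:
        # Assumption: drop lags whose absolute correlation is below 1.96/sqrt(T).
        r = r[np.abs(r) >= 1.96 / np.sqrt(T)]

    if r.size == 0:
        return float("nan"), float("nan")
    return float(r.max()), float(r.mean())

# Hypothetical usage: y is a noisy copy of x.
rng = np.random.default_rng(1)
x = rng.standard_normal(400)
y = x + 0.5 * rng.standard_normal(400)
print(xcorr_summary(x, y, sig=True))
print(xcorr_summary(x, y, sig=False))
```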

Cross Correlation Estimators

xcorr_max_sig-True
xcorr_mean_sig-True
xcorr_mean_sig-False

Squared Cross Correlation Estimators

xcorr-sq_max_sig-True
xcorr-sq_mean_sig-True
xcorr-sq_mean_sig-False

Kendall's Rank Correlation Coefficient
Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.
Base Identifier: kendalltau
Key Reference: [1]

Kendall’s τ assesses the association of ordinal variables, similar to Spearman’s ρ, but differs in some respects, such as being more mathematically tractable in the presence of ties. The method is implemented via the kendalltau function in SciPy, and takes values in [−1, 1].
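A minimal example of calling the SciPy function on two short sequences is given below. The data are made up for illustration and contain no ties.

```python
from scipy.stats import kendalltau

# Hypothetical data without ties: of the 10 pairs, 8 are concordant
# and 2 are discordant (positions 2-3 and 4-5 swap order).
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

# kendalltau returns the statistic and a p-value; unpacking works across SciPy versions.
tau, p_value = kendalltau(x, y)
print(tau, p_value)
```

With no ties, τ reduces to (concordant − discordant) / (number of pairs), which is (8 − 2) / 10 for these data.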

Kendall's Rank Correlation Coefficient Estimator

kendalltau

Precision
Keywords: undirected, linear, signed, multivariate, contemporaneous.
Base Identifier: prec

The precision matrix is the matrix inverse of the covariance matrix, and can be used to quantify the association between each pair of time series while controlling for the concomitant effects of all other time series. For normalised time-series data, the precision matrix is equivalent to the partial correlation between each pair of time series, conditioned on all other time series, and is within [−1, 1]. The precision matrix is computed via the same module as the covariance matrix (in scikit-learn), and has the same estimators.
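The sketch below reads the precision matrix from a fitted scikit-learn estimator and rescales it to partial correlations. The rescaling is the textbook relation between precision entries and partial correlations, shown here as an assumption about how the two are linked, not as a description of the package's internals. The chain-structured synthetic data are likewise invented for the example.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

# Hypothetical chain a -> b -> c: a and c are related only through b.
rng = np.random.default_rng(0)
T = 5000
a = rng.standard_normal(T)
b = a + rng.standard_normal(T)
c = b + rng.standard_normal(T)
Z = np.column_stack([a, b, c])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # z-score each series

model = EmpiricalCovariance().fit(Z)
precision = model.precision_  # inverse of model.covariance_

# Textbook rescaling: rho_ij = -P_ij / sqrt(P_ii * P_jj), with ones on the diagonal.
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(model.covariance_, 2))  # a and c are clearly correlated
print(np.round(partial_corr, 2))       # a and c are nearly unrelated given b
```

The contrast between the two printed matrices shows what "controlling for all other time series" means in practice: the a-c entry is sizeable in the covariance matrix but close to zero once b is conditioned on.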

Precision Estimators

prec_EmpiricalCovariance
prec_EllipticEnvelope
prec_MinCovDet
prec_GraphicalLasso
prec_GraphicalLassoCV
prec_ShrunkCovariance
prec_LedoitWolf
prec_OAS

Spearman's Rank-Correlation Coefficient
Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.
Base Identifier: spearmanr

Spearman’s ρ is a nonparametric measure of rank correlation between variables. The use of ordinal (ranked) variables allows the statistic to capture non-linear (but monotonic) relationships between random variables. The method is implemented via function spearmanr in SciPy, and has a value in [−1, 1].
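To illustrate the point about monotonic but non-linear relationships, the following example compares Spearman's ρ with the Pearson correlation on an exponential relationship. The data are synthetic and chosen only for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: y is a strictly increasing, non-linear function of x.
x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)

# spearmanr returns the statistic and a p-value.
rho, p_value = spearmanr(x, y)

# Pearson correlation for comparison (off-diagonal of the 2x2 matrix).
pearson = np.corrcoef(x, y)[0, 1]

print(rho, pearson)
```

Because the ranks of x and y agree exactly, ρ attains its maximum, while the Pearson correlation is lower because the relationship is not linear.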

Spearman's Rank-Correlation Coefficient Estimator

spearmanr

