pyspi: Statistics for Pairwise Interactions

Basic Statistics



Basic SPIs Overview

In this section, we detail the SPIs that we have categorised as 'basic statistics'. These include SPIs that are foundational to statistical analysis, often because they are widely applicable, relatively straightforward to understand and compute, and can form the basis for more complex methods.


Covariance

Keywords: undirected, linear, signed, multivariate, contemporaneous.

Base Identifier: cov

The covariance matrix is estimated for a wide variety of statistical procedures. Due to the z-scoring of the time series, the correlation and covariance matrices are equivalent and thus the covariance statistic is within [−1, 1]. We use scikit-learn to compute the covariance matrix via a number of estimators:

  • Standard maximum likelihood estimate (MLE) (denoted by the modifier EmpiricalCovariance).

  • Elliptic envelope (EllipticEnvelope) and minimum covariance determinant (MinCovDet) methods for outlier removal.

  • Graphical lasso, which uses l1-regularisation to sparsify the inverse covariance (precision) matrix (GraphicalLasso).

    • A variant with the regularisation parameter chosen through cross-validation with five splits (GraphicalLassoCV).

  • Basic shrinkage covariance estimator with a fixed shrinkage coefficient of 0.1 (ShrunkCovariance).

  • The Ledoit-Wolf method for optimising the shrinkage coefficient (LedoitWolf).

  • Oracle approximating shrinkage, an improved method for optimising the shrinkage coefficient if the data are Gaussian (OAS).

Covariance Estimators
  • cov_EmpiricalCovariance

  • cov_EllipticEnvelope

  • cov_MinCovDet

  • cov_GraphicalLasso

  • cov_GraphicalLassoCV

  • cov_ShrunkCovariance

  • cov_LedoitWolf

  • cov_OAS
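
As a rough illustration of the estimators listed above, the following sketch fits a few of scikit-learn's covariance estimators to z-scored toy data. It is not pyspi's internal code: the synthetic data, the variable names, and the (observations × processes) array layout are assumptions made for this example.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.covariance import EmpiricalCovariance, LedoitWolf, OAS, ShrunkCovariance

rng = np.random.default_rng(0)

# Toy data (an assumption for this sketch): 200 observations of 3 processes,
# with the second process partly driven by the first.
x = rng.standard_normal((200, 3))
x[:, 1] += 0.8 * x[:, 0]

# z-score each process (column) so that its variance is 1.
z = zscore(x, axis=0)

# A subset of the estimators named in this section.
estimators = {
    "EmpiricalCovariance": EmpiricalCovariance(),
    "ShrunkCovariance": ShrunkCovariance(shrinkage=0.1),
    "LedoitWolf": LedoitWolf(),
    "OAS": OAS(),
}

# Each fitted estimator exposes the estimated matrix as `covariance_`.
cov = {name: est.fit(z).covariance_ for name, est in estimators.items()}

for name, matrix in cov.items():
    print(name)
    print(np.round(matrix, 3))
```

With z-scored inputs, the EmpiricalCovariance result coincides with the Pearson correlation matrix, and the fixed-coefficient ShrunkCovariance estimate scales its off-diagonal entries by 0.9.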


Cross Correlation

Keywords: undirected, linear, signed/unsigned, bivariate, time-dependent.

Base Identifier: xcorr

The cross-correlation function is defined as the Pearson correlation between two time series for all lags, giving values in [−1, 1] for each lag. To estimate the cross-correlation function, we use SciPy, which outputs a correlogram, i.e., the correlation from the MLE of the cross-covariance at a given lag, normalised by the auto-covariance. The cross-correlation is computed with fewer observations at larger lags and so it is common to truncate the function at a given level, which we do by only using the first T/4 lags, where T is the number of observations.

A correlation with magnitude below $1.96/\sqrt{T}$ is considered statistically insignificant, so we optionally cut off the lags at this level (the modifier sig-True means we only use the statistically significant values, and sig-False means we use all values). We take two summary statistics of the correlogram: the maximum over the considered lags (denoted by the modifier max), and the average over the considered lags (mean).

Cross Correlation Estimators
  • xcorr_max_sig-True

  • xcorr_mean_sig-True

  • xcorr_mean_sig-False

Squared Cross Correlation Estimators
  • xcorr-sq_max_sig-True

  • xcorr-sq_mean_sig-True

  • xcorr-sq_mean_sig-False


Kendall's Rank Correlation Coefficient

Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.

Base Identifier: kendalltau

Key Reference: [1]

Kendall's τ, like Spearman's ρ, assesses the association between ordinal variables, but differs in some respects, such as being more mathematically tractable in the event of ties. The method is implemented via the kendalltau function in SciPy, and has a value in [−1, 1].

Kendall's Rank Correlation Coefficient Estimator
  • kendalltau


Precision

Keywords: undirected, linear, signed, multivariate, contemporaneous.

Base Identifier: prec

The precision matrix is the matrix inverse of the covariance matrix, and can be used to quantify the association between each pair of time series while controlling for concomitant effects of all other time series. For normalised time-series data, the precision matrix is equivalent to the partial correlation between each pairwise time series, conditioned on all other time series, and is within [−1, 1]. The precision matrix is computed via the same module as the covariance matrix (in scikit-learn), and has the same estimators.

Precision Estimators
  1. prec_EmpiricalCovariance

  2. prec_EllipticEnvelope

  3. prec_MinCovDet

  4. prec_GraphicalLasso

  5. prec_GraphicalLassoCV

  6. prec_ShrunkCovariance

  7. prec_LedoitWolf

  8. prec_OAS


Spearman's Rank-Correlation Coefficient

Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.

Base Identifier: spearmanr

Spearman's ρ is a nonparametric measure of rank correlation between variables. The use of ordinal (ranked) variables allows the statistic to capture nonlinear (but monotonic) relationships between random variables. The method is implemented via the spearmanr function in SciPy, and has a value in [−1, 1].

Spearman's Rank-Correlation Coefficient Estimator
  • spearmanr
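
The two rank correlations described on this page, Kendall's τ and Spearman's ρ, can be tried out directly with the SciPy functions of the same names. The sketch below is only an illustration, not pyspi's own code path: the toy data and variable names are assumptions. It shows why both are classed as nonlinear: a monotonic but nonlinear relationship yields a perfect rank correlation while the Pearson correlation falls short of 1.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(1)

# Toy data (an assumption for this sketch): y is a strictly increasing,
# nonlinear function of x, so the ranks of x and y agree exactly.
x = rng.standard_normal(100)
y = np.exp(x)

# Both functions return a (statistic, p-value) pair.
tau, tau_pvalue = kendalltau(x, y)
rho, rho_pvalue = spearmanr(x, y)

# Pearson correlation for comparison; it only measures linear association.
pearson = np.corrcoef(x, y)[0, 1]

print(f"Kendall tau:  {tau:.3f}")
print(f"Spearman rho: {rho:.3f}")
print(f"Pearson r:    {pearson:.3f}")
```

Both statistics lie in [−1, 1]; they depend on the data only through ranks, so any strictly increasing transformation of either series leaves them unchanged.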