pyspi: Statistics for Pairwise Interactions



Basic Statistics

Basic SPIs Overview

In this section, we detail the SPIs that we have categorised as 'basic statistics'. These include SPIs that are foundational to statistical analysis, often because they are widely applicable, relatively straightforward to understand and compute, and can form the basis for more complex methods.


Covariance

Keywords: undirected, linear, signed, bivariate, contemporaneous.

Base Identifier: cov

The covariance matrix is estimated for a wide variety of statistical procedures. Due to the z-scoring of the time series, the correlation and covariance matrices are equivalent and thus the covariance statistic is within [−1, 1]. We use scikit-learn to compute the covariance matrix via a number of estimators:

  • Standard maximum likelihood estimate (MLE) (denoted by the modifier EmpiricalCovariance).

  • Elliptic envelope (EllipticEnvelope).

  • Minimum covariance determinant (MinCovDet) methods for outlier removal.

  • Lasso technique, which uses an l1-regularisation to sparsify the covariance matrix (GraphicalLasso).

    • A method with the regularisation parameter chosen through cross-validation with five splits (GraphicalLassoCV).

  • Basic shrinkage covariance estimator with a fixed shrinkage coefficient of 0.1 (ShrunkCovariance).

  • The Ledoit-Wolf method for optimising the shrinkage coefficient (LedoitWolf).

  • Oracle approximating shrinkage, an improved method for optimising the shrinkage coefficient if the data are Gaussian (OAS).

Covariance Estimators
  • cov_EmpiricalCovariance

  • cov_EllipticEnvelope

  • cov_MinCovDet

  • cov_GraphicalLasso

  • cov_GraphicalLassoCV

  • cov_ShrunkCovariance

  • cov_LedoitWolf

  • cov_OAS
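
The relationship between z-scoring, the empirical estimator, and the shrinkage estimators can be illustrated with a short sketch that calls scikit-learn directly. This is not pyspi's internal code: the toy data, the variable names, and the choice of three of the eight estimators are assumptions made for illustration only.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf, ShrunkCovariance

rng = np.random.default_rng(0)

# Toy data (hypothetical): 200 observations of 3 processes, one row per time point.
x = rng.standard_normal((200, 3))
x[:, 1] += 0.8 * x[:, 0]  # make processes 0 and 1 correlated

# z-score each series using the population standard deviation (ddof=0).
z = (x - x.mean(axis=0)) / x.std(axis=0)

# Maximum likelihood estimate: on z-scored data this is the correlation matrix.
emp = EmpiricalCovariance().fit(z).covariance_

# Fixed shrinkage coefficient of 0.1, as described for ShrunkCovariance above.
shrunk = ShrunkCovariance(shrinkage=0.1).fit(z).covariance_

# Ledoit-Wolf chooses the shrinkage coefficient from the data.
lw = LedoitWolf().fit(z).covariance_

print(np.round(emp, 3))
print(np.round(shrunk, 3))
print(np.round(lw, 3))
```

On z-scored data the diagonal of the empirical estimate is 1, so shrinkage leaves the diagonal unchanged and pulls the off-diagonal entries towards zero.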


Cross Correlation

Keywords: undirected, linear, signed/unsigned, bivariate, time-dependent.

Base Identifier: xcorr

The cross-correlation function is defined as the Pearson correlation between two time series for all lags, giving values in [−1, 1] for each lag. To estimate the cross-correlation function, we use SciPy, which outputs a correlogram, i.e., the correlation from the MLE of the cross-covariance at a given lag, normalised by the auto-covariance. The cross-correlation is computed with fewer observations at larger lags and so it is common to truncate the function at a given level, which we do by only using the first T/4 lags, where T is the number of observations.

A correlation below $1.96/\sqrt{T}$ is considered statistically insignificant, thus we optionally cut off the lags at this level (the modifier sig-True means we only use the statistically significant values, and sig-False means we use all values). We take the two summary statistics of the correlogram: the maximum over the considered lags (denoted by modifier max), and the average over the considered lags (mean).

Cross Correlation Estimators
  • xcorr_max_sig-True

  • xcorr_mean_sig-True

  • xcorr_mean_sig-False

Squared Cross Correlation Estimators
  • xcorr-sq_max_sig-True

  • xcorr-sq_mean_sig-True

  • xcorr-sq_mean_sig-False
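
The steps described above (correlogram, truncation to the first T/4 lags, optional significance cut-off, then max and mean) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not pyspi's implementation: the function name `xcorr_summary` is hypothetical, only non-negative lags are used, and the cut-off is applied to the absolute value of the correlation.

```python
import numpy as np

def xcorr_summary(x, y, sig=True):
    """Return (max, mean) of the truncated correlogram between x and y."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    T = len(x)

    # Full cross-correlation normalised by T; index T - 1 corresponds to lag 0.
    r = np.correlate(x, y, mode="full") / T

    # Keep only the first T/4 non-negative lags.
    r = r[T - 1 : T - 1 + T // 4]

    # Optionally discard lags below the 1.96 / sqrt(T) significance level.
    if sig:
        r = r[np.abs(r) > 1.96 / np.sqrt(T)]

    if r.size == 0:
        return np.nan, np.nan
    return r.max(), r.mean()

rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 400)
x = np.sin(t) + 0.1 * rng.standard_normal(t.size)

# A series correlated with itself has correlation 1 at lag 0.
xc_max, xc_mean = xcorr_summary(x, x, sig=True)
print(xc_max, xc_mean)
```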


Kendall's Rank Correlation Coefficient

Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.

Base Identifier: kendalltau

Key Reference: [1]

Kendall’s τ measures the association between ordinal variables. It is similar to Spearman’s ρ but differs in some respects; for example, it is more mathematically tractable when ties are present. The method is implemented via the function kendalltau in SciPy, and takes values in [−1, 1].

Kendall's Rank Correlation Coefficient Estimator
  • kendalltau
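
As a small illustration of why a rank-based statistic is labelled nonlinear, the sketch below compares Kendall's τ with the Pearson correlation on a monotonic but nonlinear relationship. The toy data are an assumption chosen for illustration; the call to `scipy.stats.kendalltau` is the same SciPy function named above.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

# Toy data (hypothetical): y increases monotonically but nonlinearly with x.
x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)

tau, _ = kendalltau(x, y)  # depends only on the ordering of the values
r, _ = pearsonr(x, y)      # depends on how close the relationship is to linear

print(f"Kendall tau = {tau:.3f}, Pearson r = {r:.3f}")
```

Every pair of observations is concordant here, so τ reaches its maximum, whereas the Pearson correlation falls short of 1 because the relationship is not linear.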


Precision

Keywords: undirected, linear, signed, multivariate, contemporaneous.

Base Identifier: prec

The precision matrix is the matrix inverse of the covariance matrix, and can be used to quantify the association between each pair of time series while controlling for concomitant effects of all other time series. For normalised time-series data, the precision matrix is equivalent to the partial correlation between each pairwise time series, conditioned on all other time series, and is within [−1, 1]. The precision matrix is computed via the same module as the covariance matrix (in scikit-learn), and has the same estimators.

Precision Estimators
  • prec_EmpiricalCovariance

  • prec_EllipticEnvelope

  • prec_MinCovDet

  • prec_GraphicalLasso

  • prec_GraphicalLassoCV

  • prec_ShrunkCovariance

  • prec_LedoitWolf

  • prec_OAS
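
To illustrate how the precision matrix controls for the other time series, the sketch below inverts an empirical covariance matrix with NumPy and rescales it to partial correlations using the standard relation (the negated off-diagonal precision entries divided by the square roots of the corresponding diagonal entries). The chain-structured toy data and all variable names are assumptions, and this is not pyspi's code. The scikit-learn estimators listed above expose the inverse directly through their `precision_` attribute.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Toy chain (hypothetical): x0 drives x1, and x1 drives x2.
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)

data = np.column_stack([x0, x1, x2])
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Empirical (maximum likelihood) covariance and its inverse, the precision matrix.
cov = np.cov(z, rowvar=False, bias=True)
prec = np.linalg.inv(cov)

# Rescale the precision matrix to partial correlations.
d = np.sqrt(np.diag(prec))
partial_corr = -prec / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(cov, 3))           # x0 and x2 are clearly correlated
print(np.round(partial_corr, 3))  # but nearly independent once x1 is accounted for
```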


Spearman's Rank-Correlation Coefficient

Keywords: undirected, nonlinear, signed, bivariate, contemporaneous.

Base Identifier: spearmanr

Spearman’s ρ is a nonparametric measure of rank correlation between variables. The use of ordinal (ranked) variables allows the statistic to capture non-linear (but monotonic) relationships between random variables. The method is implemented via function spearmanr in SciPy, and has a value in [−1, 1].

Spearman's Rank-Correlation Coefficient Estimator
  • spearmanr
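
Because Spearman's ρ is a correlation between ranks, it can be reproduced by ranking each variable and computing the Pearson correlation of the ranks. The sketch below checks this against `scipy.stats.spearmanr`; the toy data and variable names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(1)

# Toy data (hypothetical): a noisy, monotonic, nonlinear relationship.
x = rng.standard_normal(100)
y = x**3 + 0.5 * rng.standard_normal(100)

rho, _ = spearmanr(x, y)

# Pearson correlation computed on the ranks of x and y.
rho_from_ranks = np.corrcoef(rankdata(x), rankdata(y))[0, 1]

print(rho, rho_from_ranks)
```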

