pyspi: Statistics for Pairwise Interactions
Causal Inference

Causal Inference SPIs Overview

These statistics aim to establish directed dependence from bivariate observations, typically by making assumptions about the underlying model. We use two packages:

  • For convergent cross-mapping, we use the Empirical Dynamic Modeling (pyEDM) package.

  • For all other SPIs, we use v0.5.23 of the Causal Discovery Toolbox (cdt).


Additive Noise Model

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: anm

Key References: [1]

Additive noise models are used to test for directed nonlinear dependence (or causality) of x → y under the assumption that the effect variable, y, is a function of a cause variable, x, plus a noise term that is independent of the cause. In this framework we use the statistic from cdt as our SPI, which is computed by first predicting y from x via a Gaussian process (with a radial basis function kernel), and then computing the normalized HSIC test statistic on the residuals.

Additive Noise Estimator
  • anm (ANM)
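As a rough illustration of the procedure described above, the sketch below regresses y on x and then measures the dependence between x and the residuals with a (biased) HSIC estimator. It uses RBF kernel ridge regression as a simple stand-in for the Gaussian process regression in cdt, with fixed kernel bandwidths; the function names are illustrative and not part of the cdt or pyspi API.

```python
import numpy as np

def rbf_gram(v, sigma=1.0):
    """Pairwise RBF (Gaussian) kernel matrix for a 1-D sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(a, b, sigma=1.0):
    """Biased HSIC estimator between two 1-D samples."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = rbf_gram(a, sigma), rbf_gram(b, sigma)
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

def anm_score(x, y, sigma=1.0, ridge=1e-3):
    """Regress y on x with RBF kernel ridge regression (a simple
    stand-in for the Gaussian process used by cdt), then test the
    residuals for dependence on x via HSIC."""
    n = len(x)
    K = rbf_gram(x, sigma)
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    residuals = y - K @ alpha
    return hsic(x, residuals, sigma)
```

A smaller score in the x → y direction than in the reverse direction is consistent with the additive-noise assumption holding for x → y.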


Conditional Distribution Similarity Fit

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: cds

Key References: [1]

The conditional distribution similarity fit is the standard deviation of the conditional probability distribution of y given x, where the distributions are estimated by discretizing the values.

Conditional Distribution Similarity Fit Estimator
  • cds
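The idea can be sketched in plain numpy as follows: discretize both variables, estimate the conditional distribution of y within each x bin, and summarize the spread (standard deviation) of those conditional distributions. This is an assumption-laden simplification for illustration, not the exact cdt implementation.

```python
import numpy as np

def cds_score(x, y, n_bins=8):
    """Illustrative conditional-distribution-similarity score:
    bin both variables, estimate P(y | x) within each x bin, and
    average the standard deviations of those distributions."""
    x_bins = np.digitize(x, np.histogram_bin_edges(x, bins=n_bins)[1:-1])
    y_bins = np.digitize(y, np.histogram_bin_edges(y, bins=n_bins)[1:-1])
    stds = []
    for b in np.unique(x_bins):
        cond = y_bins[x_bins == b]
        if len(cond) > 1:
            # normalized histogram of y within this x bin
            p = np.bincount(cond, minlength=n_bins) / len(cond)
            stds.append(p.std())
    return float(np.mean(stds))
```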


Convergent Cross-Mapping

Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.

Base Identifier: ccm

Key References: [1]

The idea behind convergent cross-mapping (CCM) is that there is a causal influence from time series x → y if the Takens time-delay embedding of y can be used to predict the observations of x. The algorithm quantifies the prediction quality (in terms of Pearson’s ρ between predicted and observed values) of time series x from the delay embedding of time series y for increasing library sizes (i.e., the number of time points used in the predictions). If, as the library size increases, the correlation converges and is higher in one direction than the other, there is an inferred causal link. The results of CCM are typically represented as two curves, one for each causal direction (x → y and y → x), with the library size on the horizontal axis and the prediction quality (correlation) on the vertical axis. We use the pyEDM package to compute CCM, which requires an embedding dimension to be set for the delay embedding of each time series.

We use both fixed embedding dimensions (with dimension 1 and 10, indicated by modifiers E-1 and E-10, respectively) and an inferred embedding dimension from univariate phase-space reconstruction methods (modifier E-None). Following the Supplementary Materials of the original CCM paper and the documentation in the pyEDM package, we infer the embedding dimension to be the maximum of the two univariate delay embeddings that best predicted each time series. Given a fixed or inferred embedding dimension, there are lower and upper bounds on the library sizes that can be used for computing CCM. In this work we use 21 uniformly sampled library sizes between this minimum and maximum to generate the CCM curves. Once the curve (prediction quality as a function of library size) is obtained, we take summary statistics of the mean (modifier mean), maximum (max), and difference (diff) across the curves. We do not explicitly measure convergence of the algorithm as a function of library size, consistent with common practice in the literature, but note that this differs from the original theory; no automatic algorithm or heuristic was originally proposed for quantifying convergence.

Convergent Cross-Mapping Estimators
  • ccm_E-1_mean

  • ccm_E-1_max

  • ccm_E-1_diff

  • ccm_E-10_mean

  • ccm_E-10_max

  • ccm_E-10_diff

  • ccm_E-None_mean

  • ccm_E-None_max

  • ccm_E-None_diff
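pyspi computes CCM via pyEDM; the self-contained numpy sketch below illustrates the core idea only (simplex cross-mapping over a range of library sizes, reduced to mean/max/diff summaries). The choice of E, the library-size grid, and the reading of diff as the mean gap between the two directional curves are all illustrative assumptions, not the pyEDM implementation.

```python
import numpy as np

def delay_embed(ts, E, tau=1):
    """Takens delay embedding: row t is [ts[t], ts[t+tau], ...]."""
    n = len(ts) - (E - 1) * tau
    return np.column_stack([ts[i * tau:i * tau + n] for i in range(E)])

def cross_map_skill(x, y, E, lib_size):
    """Predict x from the delay embedding of y (simplex projection
    over the first `lib_size` library points) and return Pearson's
    rho between predicted and observed values of x."""
    My = delay_embed(y, E)
    xt = x[(E - 1):]                # x aligned with embedding rows
    lib = My[:lib_size]
    preds = np.empty(len(My))
    for i, q in enumerate(My):
        d = np.linalg.norm(lib - q, axis=1)
        if i < lib_size:
            d[i] = np.inf           # exclude the query point itself
        nn = np.argsort(d)[:E + 1]  # simplex uses E + 1 neighbours
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12))
        preds[i] = np.dot(w, xt[nn]) / w.sum()
    return float(np.corrcoef(preds, xt)[0, 1])

# Build the two directional CCM curves over a range of library
# sizes, then reduce each to mean / max / diff summary statistics.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 30, 300)) + 0.05 * rng.normal(size=300)
y = np.roll(x, 3) + 0.05 * rng.normal(size=300)
libs = np.linspace(20, 280, 9, dtype=int)
curve_xy = np.array([cross_map_skill(x, y, 2, L) for L in libs])
curve_yx = np.array([cross_map_skill(y, x, 2, L) for L in libs])
summaries = {"mean": curve_xy.mean(),
             "max": curve_xy.max(),
             "diff": (curve_xy - curve_yx).mean()}
```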


Information-Geometric Conditional Independence

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: igci

Key References: [1]

Information-geometric conditional independence is a method for inferring causal influence from x → y for deterministic systems with invertible functions. The statistic is computed using cdt as the difference in differential entropies where the probability density is computed via nearest-neighbor estimators.

Information-Geometric Causal Inference Estimator
  • igci
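The entropy-difference form of IGCI can be sketched compactly. The code below uses the simple 1-spacing differential entropy estimator rather than the nearest-neighbor estimator used by cdt, so it is a simplified stand-in; under the original method's convention, a negative score suggests x → y.

```python
import numpy as np

def spacing_entropy(v):
    """Differential entropy via the 1-spacing estimator
    (up to an additive constant shared by both variables)."""
    v = np.sort(v)
    gaps = np.diff(v)
    gaps = gaps[gaps > 0]
    return float(np.mean(np.log(gaps * len(v))))

def igci_score(x, y):
    """Entropy-based IGCI sketch: after rescaling each variable to
    [0, 1], the score is H(y) - H(x); a negative value suggests
    x -> y. Simplified stand-in for the cdt estimator."""
    def rescale(v):
        return (v - v.min()) / (v.max() - v.min())
    return spacing_entropy(rescale(y)) - spacing_entropy(rescale(x))
```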


Regression Error-Based Causal Inference

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: reci

Key References: [1]

The regression error-based causal inference method estimates the causal effect of x → y by quantifying the error in a regression of y on x with a monomial (power product) model. In the bivariate case, this statistic is the mean squared error (MSE) of a linear regression of y on the cube of x (plus a constant).

Regression Error-Based Causal Inference Estimator
  • reci
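The description above can be sketched directly in numpy: rescale both variables, fit y on a constant plus x cubed by least squares, and return the residual MSE. This is an illustrative simplification, not the cdt implementation.

```python
import numpy as np

def reci_score(x, y):
    """Sketch of a RECI-style statistic: MSE of a least-squares fit
    of y on [1, x**3], after rescaling both variables to [0, 1]."""
    def rescale(v):
        return (v - v.min()) / (v.max() - v.min())
    x, y = rescale(x), rescale(y)
    A = np.column_stack([np.ones_like(x), x ** 3])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return float(np.mean(residuals ** 2))
```

Comparing the score in both directions (x → y versus y → x) and preferring the direction with the smaller regression error is the basic RECI decision rule.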



Last updated 10 months ago
