pyspi: Statistics for Pairwise Interactions
Causal Inference



Causal Inference SPIs Overview

These statistics aim to establish directed dependence from bivariate observations, typically by making assumptions about the underlying model. We use two packages:

  • For convergent cross-mapping, we use the pyEDM (Empirical Dynamic Modeling) package.

  • For all other SPIs, we use v0.5.23 of the Causal Discovery Toolbox (cdt).


Additive Noise Model

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: anm

Additive noise models are used to test for directed nonlinear dependence (or causality) of x → y under the assumption that the effect variable, y, is a function of the cause variable, x, plus a noise term that is independent of the cause. In this framework we use the statistic from cdt as our SPI, which is computed by first predicting y from x via a Gaussian process (with a radial basis function kernel) and then computing the normalized HSIC test statistic on the residuals.

Additive Noise Estimator
  • anm (ANM)
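To illustrate the mechanics described above, here is a minimal numpy sketch that swaps the Gaussian process for kernel ridge regression (a simplification; cdt fits a full GP) and uses the biased HSIC statistic with RBF kernels and median-heuristic bandwidths. The helper names (`_rbf_gram`, `hsic`, `anm_score`) are illustrative, not part of pyspi or cdt:

```python
import numpy as np

def _rbf_gram(v):
    # RBF Gram matrix with the bandwidth set by the median heuristic.
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(u, v):
    # Biased HSIC statistic: trace(K H L H) / n**2 with RBF kernels.
    n = len(u)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(_rbf_gram(u) @ H @ _rbf_gram(v) @ H) / n**2)

def anm_score(x, y, ridge=0.1):
    # Fit y = f(x) + e with kernel ridge regression (standing in for the
    # Gaussian process used by cdt), then score how dependent the
    # residuals e remain on the putative cause x.
    K = _rbf_gram(x)
    alpha = np.linalg.solve(K + ridge * np.eye(len(x)), y)
    residuals = y - K @ alpha
    return hsic(x, residuals)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.tanh(2 * x) + 0.2 * rng.normal(size=200)
forward, backward = anm_score(x, y), anm_score(y, x)
```

For an identifiable nonlinear model like this, the forward score (x → y) is typically smaller than the backward one, since the residuals are closer to independent of the cause in the causal direction.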


Conditional Distribution Similarity Fit

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: cds

The conditional distribution similarity fit quantifies the standard deviation of the conditional distributions of y given x, where the distributions are estimated by discretizing the values of both variables.

Conditional Distribution Similarity Fit Estimator
  • cds
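One plausible reading of this procedure can be sketched in numpy as follows; the quantile binning scheme, bin count, and `cds_score` helper are illustrative assumptions, not the cdt implementation:

```python
import numpy as np

def cds_score(x, y, n_bins=8):
    # Discretize both variables into quantile bins, estimate the
    # conditional distribution of y within each x-bin as a normalized
    # histogram, and summarize how much these conditionals vary.
    x_edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    y_edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    xb, yb = np.digitize(x, x_edges), np.digitize(y, y_edges)
    conditionals = []
    for b in np.unique(xb):
        hist = np.bincount(yb[xb == b], minlength=n_bins).astype(float)
        conditionals.append(hist / hist.sum())
    # Standard deviation of the conditional distributions across x-bins.
    return float(np.std(np.asarray(conditionals), axis=0).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=500)
dependent = x + 0.1 * rng.normal(size=500)
independent = rng.normal(size=500)
```

For strongly dependent pairs the conditional distributions vary sharply across bins, giving a larger score than for independent pairs, whose conditionals all resemble the marginal.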


Convergent Cross-Mapping

Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.

Base Identifier: ccm

The idea behind convergent cross-mapping (CCM) is that there is a causal influence from time series x → y if the Takens time-delay embedding of y can be used to predict the observations of x. The algorithm quantifies the prediction quality (in terms of Pearson's ρ) of time series x from the delay embedding of time series y for increasing library sizes (i.e., the length of time series used in the predictions). If, as the library size increases, the correlation converges and is higher in one direction than the other, a causal link is inferred. The results of CCM are typically represented as two curves, one for each causal direction (x → y and y → x), with library size on the horizontal axis and prediction quality (correlation) on the vertical axis. We use the pyEDM package to compute CCM, which requires an embedding dimension to be set for the delay embedding of each time series.

We use both fixed embedding dimensions (1 and 10, indicated by the modifiers E-1 and E-10, respectively) and an embedding dimension inferred from univariate phase-space reconstruction methods (modifier E-None). Following the Supplementary Materials of the original CCM paper and the documentation of the pyEDM package, we infer the embedding dimension as the maximum of the two univariate delay-embedding dimensions that best predicted each time series. Given a fixed or inferred embedding dimension, we obtain lower and upper bounds on the library sizes that can be used for computing CCM. In this work we use 21 uniformly sampled library sizes between this minimum and maximum to generate the CCM curves. Once the curve (prediction quality as a function of library size) is obtained, we take summary statistics of the mean (modifier mean), maximum (max), and difference (diff) across the curves. We do not explicitly measure convergence of the algorithm as a function of library size, consistent with common practice in the literature, but note that this differs from the original theory; no automatic algorithm or heuristic was originally proposed for quantifying convergence.

Convergent Cross-Mapping Estimators
  • ccm_E-1_mean
  • ccm_E-1_max
  • ccm_E-1_diff
  • ccm_E-10_mean
  • ccm_E-10_max
  • ccm_E-10_diff
  • ccm_E-None_mean
  • ccm_E-None_max
  • ccm_E-None_diff


Information-Geometric Conditional Independence

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: igci

Information-geometric conditional independence is a method for inferring the causal influence x → y in deterministic systems with invertible functions. The statistic is computed using cdt as the difference in differential entropies, where the probability densities are estimated via nearest-neighbor estimators.

Information-Geometric Causal Inference Estimator
  • igci


Regression Error-Based Causal Inference

Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.

Base Identifier: reci

The regression error-based causal inference method estimates the causal effect x → y by quantifying the error of a regression of y on x with a monomial (power-product) model. In the bivariate case, the statistic is the mean squared error of a linear regression of y on a cubic polynomial (plus constant) of x.

Regression Error-Based Causal Inference Estimator
  • reci

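The bivariate regression error statistic just described can be sketched in a few lines of numpy; `reci_score` is an illustrative helper, not the cdt implementation:

```python
import numpy as np

def reci_score(x, y):
    # Standardize both variables, regress y on a cubic polynomial
    # (plus constant) of x by least squares, and return the MSE.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    A = np.column_stack([np.ones_like(x), x, x**2, x**3])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ coef) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = x**3 + 0.5 * rng.normal(size=300)
forward, backward = reci_score(x, y), reci_score(y, x)
```

A lower regression error in one direction is taken as evidence for that causal direction; here the forward (x → y) error is essentially the noise level, while the backward fit must approximate a cube root with a cubic and does worse.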