Causal Inference
Causal Inference SPIs Overview
These statistics aim to establish directed dependence from bivariate observations, typically by making assumptions about the underlying model. We use two packages:
For convergent cross-mapping, we use the Empirical Dynamic Modeling (pyEDM) package.
For all other SPIs, we use v0.5.23 of the Causal Discovery Toolbox (cdt).
Additive Noise Model
Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier:
anm
Key References: [1]
Additive noise models are used for hypothesis testing of directed nonlinear dependence (or causality) of x → y, under the assumption that the effect variable, y, is a function of the cause variable, x, plus a noise term that is independent of the cause. In this framework, we use the statistic from cdt as our SPI, which is computed by first predicting y from x via a Gaussian process (with a radial basis function kernel) and then computing the normalized HSIC test statistic on the residuals.
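To illustrate the idea (this is a sketch, not cdt's implementation), the following pure-Python example replaces the Gaussian-process regressor with a simple binned-mean (regressogram) fit and uses a biased, un-normalized HSIC estimator with std-based kernel bandwidths; the function names and the bandwidth heuristic are illustrative choices.

```python
import math
import random
import statistics

def rbf_gram(v, sigma):
    """Gaussian (RBF) kernel Gram matrix of a 1-D sample."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in v] for a in v]

def centred(K):
    """Double-centre a Gram matrix (K -> HKH)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(u, v):
    """Biased HSIC estimate (un-normalised, unlike the statistic cdt uses)."""
    n = len(u)
    su = max(statistics.pstdev(u), 1e-12)
    sv = max(statistics.pstdev(v), 1e-12)
    Kc, Lc = centred(rbf_gram(u, su)), centred(rbf_gram(v, sv))
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2

def regressogram_residuals(x, y, nbins=10):
    """Binned-mean regression of y on x: a crude stand-in for the
    Gaussian-process fit used by cdt."""
    lo, hi = min(x), max(x)
    w = (hi - lo) / nbins or 1.0
    bins = [min(int((a - lo) / w), nbins - 1) for a in x]
    groups = {}
    for b, yy in zip(bins, y):
        groups.setdefault(b, []).append(yy)
    means = {b: sum(v) / len(v) for b, v in groups.items()}
    return [yy - means[b] for b, yy in zip(bins, y)]

def anm_score(x, y):
    """Dependence between the cause candidate x and the residuals of y ~ f(x);
    lower values are more consistent with an additive noise model x -> y."""
    return hsic(x, regressogram_residuals(x, y))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [math.tanh(a) + random.gauss(0, 0.1) for a in x]
print(anm_score(x, y), anm_score(y, x))
```

Comparing the two scores in both directions is what turns this residual-independence test into a directed statistic.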
Conditional Distribution Similarity Fit
Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier:
cds
Key References: [1]
The conditional distribution similarity fit is the standard deviation of the conditional probability distribution of y given x, where the distributions are estimated by discretizing the values.
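One simple reading of this construction can be sketched in pure Python; the equal-width discretization, the bin count, and the averaging over y-levels below are illustrative choices rather than cdt's exact implementation.

```python
import random
import statistics
from collections import defaultdict

def discretize(v, nbins):
    """Equal-width binning over the observed range of v."""
    lo, hi = min(v), max(v)
    w = (hi - lo) / nbins or 1.0
    return [min(int((a - lo) / w), nbins - 1) for a in v]

def cds_score(x, y, nbins=8):
    """Average spread, across x-bins, of the conditional distributions P(y | x).
    Higher values mean the distribution of y changes more as x changes."""
    xb, yb = discretize(x, nbins), discretize(y, nbins)
    hists = defaultdict(lambda: [0] * nbins)
    for a, b in zip(xb, yb):
        hists[a][b] += 1
    probs = [[c / sum(h) for c in h] for h in hists.values()]
    spreads = [statistics.pstdev([p[j] for p in probs]) for j in range(nbins)]
    return sum(spreads) / nbins

random.seed(0)
x = [random.random() for _ in range(1000)]
noise = [random.random() for _ in range(1000)]
print(cds_score(x, x), cds_score(x, noise))  # dependent pair scores higher
```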
Convergent Cross-Mapping
Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier:
ccm
Key References: [1]
The idea behind convergent cross-mapping (CCM) is that there is a causal influence from time series x → y if the Takens time-delay embedding of y can be used to predict the observations of x. The algorithm quantifies the prediction skill (in terms of Pearson's ρ) of time series x from the delay embedding of time series y for increasing library sizes (i.e., the length of time series used in the predictions). If, as the library size increases, the correlation converges and is higher in one direction than the other, a causal link is inferred. The results of CCM are typically represented as two curves, one for each causal direction (x → y and y → x), with the library size on the horizontal axis and the prediction skill (correlation) on the vertical axis. We use the pyEDM package to compute CCM, which requires an embedding dimension to be set for the delay embedding of each time series.
We use both fixed embedding dimensions (of 1 and 10, indicated by the modifiers E-1 and E-10, respectively) and an embedding dimension inferred from univariate phase-space reconstruction (modifier E-None). Following the Supplementary Materials of the original CCM paper and the documentation of the pyEDM package, we infer the embedding dimension as the maximum of the two univariate delay-embedding dimensions that best predicted each time series. Given a fixed or inferred embedding dimension, we obtain lower and upper bounds on the library sizes that can be used for computing CCM. In this work we use 21 uniformly sampled library sizes between this minimum and maximum to generate the CCM curves. Once the curves (prediction skill as a function of library size) are obtained, we take summary statistics: the mean (modifier mean), maximum (max), and difference (diff) across the curves. We do not explicitly measure convergence of the algorithm as a function of library size, consistent with common practice in the literature, but note that this differs from the original theory; no automatic algorithm or heuristic was originally proposed for quantifying convergence.
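The mechanics can be illustrated with a self-contained, simplex-style cross-mapping sketch in pure Python; the fixed delay of one, the neighbour weighting, the three library sizes, and the definition of the diff statistic here are simplifications relative to pyEDM.

```python
import math

def embed(ts, E, tau=1):
    """Takens delay embedding: vectors (ts[t], ts[t-tau], ..., ts[t-(E-1)*tau])."""
    return [tuple(ts[t - k * tau] for k in range(E))
            for t in range((E - 1) * tau, len(ts))]

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den else 0.0

def cross_map(source, target, E, lib):
    """Predict `target` from the delay embedding of `source` using E+1
    exponentially weighted nearest neighbours; returns Pearson's rho."""
    emb = embed(source, E)[:lib]
    obs = target[E - 1:E - 1 + lib]
    preds = []
    for i, v in enumerate(emb):
        nbrs = sorted((math.dist(v, u), j)
                      for j, u in enumerate(emb) if j != i)[:E + 1]
        d0 = nbrs[0][0] or 1e-12
        w = [math.exp(-d / d0) for d, _ in nbrs]
        preds.append(sum(wj * obs[j] for wj, (_, j) in zip(w, nbrs)) / sum(w))
    return pearson(preds, obs)

def ccm_summaries(x, y, E=2, lib_sizes=(50, 100, 200)):
    """Cross-mapping skill curve for x -> y, plus mean/max/diff summaries."""
    xy = [cross_map(y, x, E, L) for L in lib_sizes]  # y's embedding predicts x
    yx = [cross_map(x, y, E, L) for L in lib_sizes]
    return {"mean": sum(xy) / len(xy), "max": max(xy),
            "diff": sum(xy) / len(xy) - sum(yx) / len(yx)}

# Coupled logistic maps in which x drives y more strongly than y drives x.
x, y = [0.4], [0.2]
for _ in range(300):
    x.append(x[-1] * (3.8 - 3.8 * x[-1] - 0.02 * y[-1]))
    y.append(y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1]))
print(ccm_summaries(x, y))
```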
Information-Geometric Conditional Independence
Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier:
igci
Key References: [1]
Information-geometric conditional independence is a method for inferring causal influence from x → y in deterministic systems with invertible functions. The statistic is computed using cdt as the difference between the differential entropies of the two variables, where the probability densities are estimated via nearest-neighbor estimators.
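The entropy-difference form can be sketched with a simple nearest-neighbour (spacing) entropy estimate; the min-max scaling and the sign convention below (negative values favouring x → y) are assumptions about the uniform-reference variant, not a statement of cdt's exact defaults. Additive constants of the spacing estimator cancel in the difference.

```python
import math
import random

def minmax(v):
    """Scale a sample to [0, 1] (uniform reference measure)."""
    lo, hi = min(v), max(v)
    return [(a - lo) / (hi - lo) for a in v]

def spacing_entropy(v):
    """1-D differential entropy from nearest-neighbour spacings, up to an
    additive constant that cancels when two such estimates are subtracted."""
    s = sorted(v)
    n = len(s)
    return sum(math.log((b - a) * (n - 1))
               for a, b in zip(s, s[1:]) if b > a) / (n - 1)

def igci_score(x, y):
    """Entropy difference after scaling both variables to [0, 1]; negative
    values favour x -> y under the convention assumed here."""
    return spacing_entropy(minmax(y)) - spacing_entropy(minmax(x))

random.seed(0)
x = [random.random() for _ in range(500)]
y = [a ** 3 for a in x]  # deterministic, invertible mechanism
print(igci_score(x, y))
```

Note that rescaling the data by a constant shifts the spacing estimate by exactly the log of that constant, which is why shared constants drop out of the difference.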
Regression Error-Based Causal Inference
Keywords: directed, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier:
reci
Key References: [1]
The regression error-based causal inference method estimates the causal effect of x → y by quantifying the error in a regression of y on x with a monomial (power-product) model. In the bivariate case, this statistic is the mean squared error (MSE) of a linear regression of y on the cube of x (plus a constant term).
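This regression has a closed form, so the statistic can be sketched directly; any normalization cdt applies to the variables beforehand is omitted here, and `reci_direction` is an illustrative helper, not part of cdt's API.

```python
import random

def reci_mse(x, y):
    """MSE of the least-squares fit y ≈ a + b * x**3 (cubic monomial model).
    Any scaling applied by cdt beforehand is omitted in this sketch."""
    z = [v ** 3 for v in x]
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    b = (sum((u - w) * (c - my) for u, c, w in zip(z, y, [mz] * n))
         / sum((u - mz) ** 2 for u in z))
    a = my - b * mz
    return sum((c - (a + b * u)) ** 2 for u, c in zip(z, y)) / n

def reci_direction(x, y):
    """The direction with the smaller regression error is inferred as causal."""
    return "x->y" if reci_mse(x, y) < reci_mse(y, x) else "y->x"

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(300)]
y = [2 * v ** 3 + 1 + random.gauss(0, 0.05) for v in x]
print(reci_mse(x, y), reci_direction(x, y))
```

In the forward direction the cubic model matches the generating mechanism, so the MSE is close to the noise variance; in the reverse direction the model is misspecified and the error is much larger.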