Information Theory
Information Theoretic SPIs Overview
The pairwise measures that we employ from information theory are intended to operate on either serially independent observations (e.g., joint entropy and mutual information) or bivariate time series (e.g., transfer entropy and stochastic interaction). We primarily use v1.6.1 of the Java Information Dynamics Toolkit in this section, which allows us to compute differential entropy, mutual information, and transfer entropy, from which many information-theoretic measures can be constructed. Computing information-theoretic measures requires a density estimate, and in this work we use four different estimators:
The Gaussian-distribution model (denoted by the modifier gaussian) assumes a linear-Gaussian multivariate distribution, where the measure is derived from the cross-covariance matrix.
Kernel estimation (kernel) uses a box-kernel method with a specific kernel width (default width of 0.5 standard deviations, denoted by the modifier W-0.5).
The Kozachenko–Leonenko technique (kozachenko) is a nearest-neighbour approach that is suitable when measures can only be constructed using entropy or joint entropy (not mutual information).
The Kraskov–Stögbauer–Grassberger (KSG) technique (ksg) combines nearest-neighbour estimators for mutual-information-based measures (default of four nearest neighbours, indicated by the modifier NN-4). The KSG estimator is effectively a combination of multiple Kozachenko–Leonenko estimators that includes techniques to remove a bias introduced by taking the difference between two differential entropy estimates.
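As a rough illustration of the gaussian estimator, the sketch below computes differential entropy from a sample covariance matrix and builds mutual information from entropies. It uses NumPy rather than the Java toolkit, and the names `gaussian_entropy` and `gaussian_mi` are hypothetical helpers, not part of any library.

```python
import numpy as np

def gaussian_entropy(data):
    """Differential entropy (nats) under a multivariate Gaussian model.

    data: array of shape (n_samples,) or (n_samples, n_dims).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n_dims = data.shape[1]
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n_dims * np.log(2 * np.pi * np.e) + logdet)

def gaussian_mi(x, y):
    """Mutual information as H(x) + H(y) - H(x, y), all Gaussian estimates."""
    joint = np.column_stack([x, y])
    return gaussian_entropy(x) + gaussian_entropy(y) - gaussian_entropy(joint)

# Illustrative data (an assumption): y is a noisy linear function of x.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = 0.8 * x + 0.6 * rng.standard_normal(5000)
print(gaussian_mi(x, y))
```

For two scalar series this estimate reduces to -0.5 log(1 - r^2), where r is the sample correlation, which is why the Gaussian model captures only linear dependence.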
An alternative approach to using the continuous estimators above would be to discretise each observation (e.g., through binning) and use a discrete estimator; however, discrete estimators are known to be heavily dependent on the discretisation size.
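The dependence on discretisation size can be seen with a simple plug-in (histogram) estimator. The sketch below is only an illustration under assumed settings (500 independent Gaussian samples, equal-width bins), and `binned_mi` is a hypothetical helper rather than an estimator used in this work.

```python
import numpy as np

def binned_mi(x, y, bins):
    """Plug-in mutual information (nats) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Independent series, so the true mutual information is zero.
rng = np.random.default_rng(0)
x_ind = rng.standard_normal(500)
y_ind = rng.standard_normal(500)
for bins in (4, 10, 50):
    print(bins, binned_mi(x_ind, y_ind, bins))
```

Although the two series are independent, the estimate grows as the bins become finer, because more and more cells are populated by only a handful of samples.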
Density estimates for mutual information (and related measures) are biased by the autocorrelation present in the individual signals. A common solution to reduce bias in nonlinear estimators (like the KSG technique) is to use dynamic correlation exclusion (also known as a Theiler window), which excludes any data points within a given time window from the density estimate and avoids oversampling. The window size should be large enough to render observations included in the density estimate uncorrelated (sometimes called the “autocorrelation time”); here, we set the window as the product of the autocorrelation functions of both time series (a heuristic for the autocorrelation time based on Bartlett’s formula). The DCE modifier indicates that the Theiler window is used for the KSG estimator.
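To make the role of the Theiler window concrete, here is a brute-force sketch of the first KSG algorithm with an optional exclusion window. It is not the toolkit's implementation: the function name `ksg_mi`, the `theiler` parameter, and the choice to leave the ψ(N) term unchanged when a window is applied are all assumptions made for illustration.

```python
import numpy as np

def ksg_mi(x, y, k=4, theiler=0):
    """KSG (algorithm 1) mutual information estimate in nats, by brute force.

    Samples with |i - j| <= theiler are never treated as neighbours of
    sample i (theiler=0 excludes only the sample itself).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    # Dynamic correlation exclusion: mask temporally close pairs.
    idx = np.arange(n)
    excluded = np.abs(idx[:, None] - idx[None, :]) <= theiler
    dx[excluded] = np.inf
    dy[excluded] = np.inf
    dz = np.maximum(dx, dy)                 # max-norm distance in joint space
    eps = np.sort(dz, axis=1)[:, k - 1]     # distance to the k-th neighbour
    nx = np.sum(dx < eps[:, None], axis=1)  # points strictly closer in x
    ny = np.sum(dy < eps[:, None], axis=1)  # points strictly closer in y
    # Digamma at positive integers: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j.
    psi = np.empty(n + 1)
    psi[0] = np.nan                         # psi(0) is never used
    psi[1:] = -np.euler_gamma + np.concatenate(
        ([0.0], np.cumsum(1.0 / np.arange(1, n)))
    )
    return psi[k] + psi[n] - np.mean(psi[nx + 1] + psi[ny + 1])

# Illustrative data (an assumption): linearly coupled Gaussian samples.
rng = np.random.default_rng(1)
x = rng.standard_normal(800)
y = 0.8 * x + 0.6 * rng.standard_normal(800)
print(ksg_mi(x, y), ksg_mi(x, y, theiler=10))
```

The pairwise distance matrices make this quadratic in the number of samples, so it is suitable only for short series; production estimators use tree-based neighbour searches instead.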
Causally Conditioned Entropy
Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier: cce
Causally conditioned entropy aims to measure the uncertainty remaining in time series y in the context of the entire (causal) past of both time series x and y. The measure is computed as a sum of conditional entropies (of y given both the past of x and the past of y) with increasing history lengths. The standard assumption is that we consider the entire past of both time series (i.e., there are T − 1 conditional entropies in the sum, from 1 to T − 1); however, for computational reasons, we restrict the history length to 10. This corresponds to the assumption that the joint process is, at maximum, a 10th-order Markov chain.
Conditional Entropy
Keywords: undirected, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier: ce
Key References: [1]
Conditional entropy quantifies the uncertainty over the observations in y in the context of simultaneously observing x.
Directed Information
Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier: di
Directed information is a measure of information flow from a source time series x to a target time series y that is related to transfer entropy, but has no time lag between source and target. The directed information can be computed as a difference between the conditional entropy of y given its own past, and causally conditioned entropy. For the same reasons as for causally conditioned entropy, we restrict its computation up to a history length of 10.
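For orientation, the relationship described above can be written in the standard notation of Massey and Kramer. This is the textbook form rather than a transcription of the implementation, which truncates the history to 10 samples and may index the sum differently. Here Y^{t-1} = (Y_1, ..., Y_{t-1}), and X^{t} runs up to and including the current sample, which is the sense in which there is no time lag between source and target.

```latex
H(Y^T \,\|\, X^T) = \sum_{t=1}^{T} H\!\left(Y_t \mid Y^{t-1}, X^{t}\right),
\qquad
I(X^T \to Y^T) = \sum_{t=1}^{T} H\!\left(Y_t \mid Y^{t-1}\right) - H(Y^T \,\|\, X^T)
```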
Granger Causality
Keywords: directed, linear, unsigned, bivariate, time-dependent.
Base Identifier: gc
Key References: [1]
Granger causality assesses the directed dependence x → y as the improvement in predictive power of a bivariate autoregressive model (comprising x and y) over the univariate autoregressive model (with y only). The statistic included in our framework is the log-ratio of residual variance for the two models, which is a variant of Granger causality popularised by Geweke and later found to be equivalent to transfer entropy with a Gaussian estimator. We compute the time delay and embedding dimension in the same way as for transfer entropy, allowing for both fixed and optimised embedding lengths (k and l) and time delays (kt and lt), where the optimisation procedure is identical to that of transfer entropy.
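The log-ratio statistic can be sketched with ordinary least squares. The example below is a minimal illustration, assuming the same model order for both series and a time delay of one; the helper names (`lagged`, `residual_variance`, `granger_causality`) and the simulated data are hypothetical and do not reflect the framework's own code.

```python
import numpy as np

def lagged(series, order):
    """Columns series[t-1], ..., series[t-order] for t = order, ..., T-1."""
    T = len(series)
    return np.column_stack([series[order - j: T - j] for j in range(1, order + 1)])

def residual_variance(target, regressors):
    """Variance of OLS residuals, with an intercept added to the regressors."""
    design = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def granger_causality(x, y, order=1):
    """Log-ratio of residual variances for x -> y."""
    target = y[order:]
    own_past = lagged(y, order)
    both_pasts = np.column_stack([own_past, lagged(x, order)])
    return np.log(residual_variance(target, own_past)
                  / residual_variance(target, both_pasts))

# Simulated example (an assumption): x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

gc_xy = granger_causality(x, y)
gc_yx = granger_causality(y, x)
print(gc_xy, gc_yx)
```

Because the univariate model is nested within the bivariate one, the statistic is non-negative; in this simulation it should be clearly larger for x → y than for y → x.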
Integrated Information
Keywords: undirected, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier: phi
Integrated information was proposed to capture some aspects of consciousness as part of Integrated Information Theory (IIT). We implement two proxy measures for IIT 2.0 from the PhiToolbox:
Both measures are optionally normalised (divided) by entropy (denoted by the modifier norm).
Joint Entropy
Keywords: undirected, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier: je
Key References: [1]
The joint entropy quantifies the uncertainty over the paired observations.
Mutual Information
Keywords: undirected, nonlinear, unsigned, bivariate, contemporaneous.
Base Identifier: mi
Key References: [1]
Mutual information is an undirected measure of the (potentially nonlinear) dependence between paired observations of x and y.
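The contemporaneous measures (joint entropy, conditional entropy, and mutual information) are linked by standard identities, given here for reference with H denoting (differential) entropy:

```latex
H(Y \mid X) = H(X, Y) - H(X),
\qquad
I(X; Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y \mid X)
```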
Stochastic Interaction
Keywords: undirected, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier: si
Stochastic interaction is a measure of integrated information between two processes in the context of their own past. It is quantified by the difference between the joint entropy of the bivariate process and the individual entropies of each univariate process. Both entropies are measured in the context of (conditioned on) the history of the processes; we restrict this history to one step, assuming a first-order Markov process.
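A minimal sketch of this quantity with a one-step history is given below. It assumes a Gaussian entropy estimate and the usual sign convention, in which the sum of the individual conditional entropies minus the joint conditional entropy is non-negative; all function names and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_entropy(data):
    """Differential entropy (nats) under a multivariate Gaussian model."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (data.shape[1] * np.log(2 * np.pi * np.e) + logdet)

def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return gaussian_entropy(np.column_stack([target, given])) - gaussian_entropy(given)

def stochastic_interaction(x, y):
    """Individual minus joint conditional entropy, with one-step histories."""
    x_now, x_past = x[1:], x[:-1]
    y_now, y_past = y[1:], y[:-1]
    joint_now = np.column_stack([x_now, y_now])
    joint_past = np.column_stack([x_past, y_past])
    return (conditional_entropy(x_now, x_past)
            + conditional_entropy(y_now, y_past)
            - conditional_entropy(joint_now, joint_past))

# Simulated example (an assumption): y depends on the previous value of x.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()
independent = rng.standard_normal(n)

si_coupled = stochastic_interaction(x, y)
si_independent = stochastic_interaction(x, independent)
print(si_coupled, si_independent)
```

The coupled pair should give a clearly positive value, while the independent pair should stay close to zero.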
Transfer Entropy
Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier: te
Key References: [1]
Transfer entropy is a measure of information transfer from a source time series x to a target time series y, based on the Takens time-delay embedding. Delay embeddings capture the relevant history of a time series that can be used as a predictor of its future and are constructed from an embedding length and a time delay. The embedding lengths are denoted by the modifiers l and k for the source, x, and target, y, respectively; the time delays are denoted by lt and kt for time series x and y. The embedding parameters can be obtained in a number of ways, so long as their product is (significantly) less than the number of observations. We compute transfer entropy for both fixed and optimised embedding parameters. The fixed parameters are minimal values, i.e., typically with a fixed target embedding length of 1 (denoted by k-1) or 2 (denoted by k-2).
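To illustrate how an embedding length and a time delay define the history vectors, the following sketch builds a delay embedding with NumPy. The function `delay_embed` and its argument names are hypothetical, chosen for this example only.

```python
import numpy as np

def delay_embed(series, length, delay):
    """Rows are [s[t], s[t - delay], ..., s[t - (length - 1) * delay]]."""
    series = np.asarray(series)
    span = (length - 1) * delay  # samples of history consumed by one vector
    if span >= len(series):
        raise ValueError("embedding span must be smaller than the series length")
    return np.column_stack(
        [series[span - j * delay: len(series) - j * delay] for j in range(length)]
    )

# Embedding length 3 and time delay 2 on a toy series 0, 1, ..., 9.
embedded = delay_embed(np.arange(10), length=3, delay=2)
print(embedded.shape)
print(embedded[0])
```

Here the first embedding vector is [4, 2, 0] and six vectors are produced from ten observations, which shows why the product of embedding length and delay needs to stay well below the series length.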
The optimised parameters are inferred by choosing the embedding parameters (up to a maximum embedding length of 10, denoted by k-max-10, and a maximum time delay of 2, denoted by tau-max-2) that maximise a univariate information-theoretic measure known as active information storage. Note that the gaussian estimator is not included for transfer entropy, since this is equivalent to Granger causality. We also include a symbolic estimator for transfer entropy (denoted by the modifier symbolic).
Time-lagged Mutual Information
Keywords: directed, nonlinear, unsigned, bivariate, time-dependent.
Base Identifier: tlmi
Key References: [1]
Time-lagged mutual information is a directed measure of the dependence between time series x and a time-lagged instance of the series y. We include statistics for only lag-one mutual information, giving the following estimators: