Getting Started: A Simple Demonstration
Now that you have installed pyspi, let's get started with a very simple demonstration of how we would apply it to a dataset of interest. Here we will generate a generic dataset from a multivariate Gaussian distribution:
import numpy as np

np.random.seed(42) # seed NumPy's random number generator for reproducibility

M = 5 # 5 independent processes
T = 500 # 500 samples per process
dataset = np.random.randn(M, T) # generate our multivariate time series
Trial run with a reduced set:
As good practice, we always recommend doing a trial run of pyspi with a smaller subset of SPIs first (e.g., the fabfour, sonnet, or fast sets). In this way, we can run pyspi on our data quickly, identifying and resolving any potential issues before proceeding with more computationally intensive calculations. Once you are satisfied with the analysis pipeline, you can always scale up to the full pyspi library of over 250 SPIs. Let's do a trial run with the 'fast' subset by instantiating the Calculator object in the following way:
from pyspi.calculator import Calculator
calc = Calculator(dataset=dataset, subset='fast') # instantiate the calculator object
Note that by default, your dataset will be normalised before computing SPIs. If you would like to disable this pre-processing step, pass the normalise=False flag to the Calculator object when instantiating it. For more information, see the API docs.
calc = Calculator(dataset=dataset, subset='fast', normalise=False)
After successfully initialising the Calculator (here, with normalisation left enabled), you should see a summary of the pre-processing steps applied to the data, which you can use for verification:
216 SPI(s) were successfully initialised.
[1/2] Skipping detrending of the dataset...
[2/2] Normalising (z-scoring) the dataset...
Now that we have passed our dataset into the Calculator object, verified the pre-processing steps we require, and specified which SPIs we would like to compute via the subset parameter, we can compute all of our SPIs by simply calling the compute method:
calc.compute()
Once the calculator has computed each of the statistics, you can access all SPI outputs using the table property:
print(calc.table)
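For example, one quick check is to list which SPIs appear in the output table. This is a minimal sketch, assuming the table property returns a pandas DataFrame whose columns are multi-indexed with the SPI identifier at the top level:

# list the unique SPI identifiers present in the output table
spi_names = calc.table.columns.get_level_values(0).unique()
print(f"{len(spi_names)} SPIs in the table")
print(list(spi_names)[:10]) # peek at the first ten identifiers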
Alternatively, if we would like to examine a specific method's outputs, we can extract its corresponding matrix of pairwise interactions (MPI) by specifying its unique identifier. For instance, the following code will extract the covariance matrix computed with the maximum likelihood estimator:
print(calc.table['cov_EmpiricalCovariance'])
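If you prefer to work with a plain NumPy array (e.g., for plotting or further analysis), the extracted MPI can be converted directly. A minimal sketch, assuming the extracted MPI behaves like a pandas DataFrame:

# convert the covariance MPI to a plain (M x M) NumPy array
cov_mpi = calc.table['cov_EmpiricalCovariance'].to_numpy()
print(cov_mpi.shape) # expect (5, 5) for our 5-process dataset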
Using the full SPI set:
By default, pyspi will instantiate a Calculator with the full pyspi set (all SPIs). To access the full library of SPIs, simply instantiate the Calculator either without specifying a subset or by passing subset='all':
calc = Calculator(dataset=dataset) # use all SPIs
# alternatively, one can specify the subset 'all'
calc = Calculator(dataset=dataset, subset='all')
# compute the SPIs as usual
calc.compute()
While we have tried to make the calculator as efficient as possible, computing all statistics can take a while (depending on the size of your dataset). To get a sense of how long the full pyspi set takes to run, see the FAQ section.
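If you would like a rough runtime estimate on your own hardware, one option is to time the compute call yourself; here is a minimal sketch using Python's standard time module:

import time

start = time.time()
calc.compute() # compute all SPIs in the selected subset
print(f"pyspi computation took {time.time() - start:.1f} seconds")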
Bonus: Construct a custom SPI subset
In this simple tutorial, we have demonstrated how you can use reduced subsets of SPIs in your analysis. Subsets such as 'fabfour', 'sonnet', or 'fast' offer a streamlined way to ensure your pipeline is working as expected, without overwhelming computational demands. However, there may be scenarios where these predefined subsets don't align with your specific needs or objectives. In such cases, creating a custom subset of SPIs can be highly advantageous.
If you would like to construct your own reduced subset of SPIs, follow the guide in the Advanced Usage section to get started. We also recommend checking the table of SPIs and the detailed SPI descriptions to help guide your selection of SPIs when creating your own custom subsets.
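As a rough illustration of what this looks like once your custom configuration file is ready, the sketch below assumes the Calculator accepts a configfile argument (as described in the Advanced Usage section) and uses a hypothetical file name:

from pyspi.calculator import Calculator

# 'my_custom_spis.yaml' is a hypothetical config file listing only the SPIs you want,
# written in the format described in the Advanced Usage section
calc = Calculator(dataset=dataset, configfile='my_custom_spis.yaml')
calc.compute()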