pyspi: Statistics for Pairwise Interactions

Getting Started: A Simple Demonstration

Simple Demonstration

Now that you have installed pyspi, let's get started with a very simple demonstration of how we would apply it to a dataset of interest. Here we will generate a generic dataset from a multivariate Gaussian distribution:

import numpy as np

np.random.seed(42) # seed NumPy's generator, since we sample with np.random

M = 5 # 5 independent processes
T = 500 # 500 samples per process

dataset = np.random.randn(M, T) # generate our multivariate time series

Trial run with a reduced set:

As good practice, we recommend always doing a trial run of pyspi with a smaller subset of SPIs first (e.g., the 'fabfour', 'sonnet', or 'fast' sets). This way, we can run pyspi on our data quickly, identifying and resolving any potential issues before proceeding with the more computationally intensive calculations. Once you are satisfied with the analysis pipeline, you can always scale up to the full pyspi library of over 250 SPIs. Let's do a trial run with the 'fast' subset by instantiating the Calculator object in the following way:

from pyspi.calculator import Calculator

calc = Calculator(dataset=dataset, subset='fast') # instantiate the calculator object

Note that, by default, your dataset will be normalised before computing SPIs. If you would like to disable this pre-processing step, pass the normalise=False flag to the Calculator object when instantiating it. For more information, see the API docs.

calc = Calculator(dataset=dataset, subset='fast', normalise=False)

After successfully initialising the Calculator, you should see a summary of the pre-processing steps applied to the data for verification:

216 SPI(s) were successfully initialised.

[1/2] Skipping detrending of the dataset...
[2/2] Normalising (z-scoring) the dataset...

Now that we have passed our dataset into the Calculator object, verified the pre-processing steps we require, and specified which SPIs we would like to compute via the subset parameter, we can compute all of our SPIs by simply calling the compute method:

calc.compute()

While we have tried to make the calculator as efficient as possible, computing all statistics can take a while (depending on the size of your dataset). To give users a sense of how long the full pyspi set takes to run, see the FAQ section.
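If you want to gauge runtime on your own dataset before scaling up, a minimal sketch is to wrap the compute call in a timer (using only the standard library):

import time

start = time.perf_counter()
calc.compute()
elapsed = time.perf_counter() - start
print(f"'fast' subset took {elapsed:.1f} s on a {M} x {T} dataset")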

Once the calculator has computed each of the statistics, you can access all SPI outputs using the table property:

print(calc.table)

Alternatively, if we would like to examine a specific method's outputs, we can extract its corresponding matrix of pairwise interactions (MPI) by specifying its unique identifier. For instance, the following code will extract the covariance matrix computed with the maximum likelihood estimator:

print(calc.table['cov_EmpiricalCovariance'])
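Each MPI is an M x M matrix indexed by process. As a quick sanity check, you could visualise one of these matrices as a heatmap. The sketch below assumes only that the selection above behaves like a pandas DataFrame (as printed above), and uses matplotlib (not a pyspi requirement) for plotting:

import matplotlib.pyplot as plt

# Extract one MPI as a plain M x M numpy array and plot it.
mpi = calc.table['cov_EmpiricalCovariance'].to_numpy(dtype=float)

plt.imshow(mpi, cmap='coolwarm')
plt.colorbar(label='empirical covariance')
plt.title('cov_EmpiricalCovariance MPI')
plt.show()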

Using the full SPI set:

By default, pyspi will instantiate a Calculator with the full pyspi set (all SPIs). To access the full library of SPIs, simply instantiate the Calculator either without specifying a subset or by passing subset='all':

calc = Calculator(dataset=dataset) # use all SPIs
# alternatively, one can specify the subset 'all'
calc = Calculator(dataset=dataset, subset='all')

# compute the SPIs as usual
calc.compute()

Distributed Computing

Given the intensive nature of calculations with the full SPI set, you might want to explore options for distributed computing, particularly if you are working with large datasets. For further details on how to get started with running pyspi on an HPC cluster, refer to the Advanced Usage section.
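To illustrate the general pattern (independently of any particular job scheduler), here is a minimal, hypothetical per-dataset worker script: each cluster job loads one saved dataset, runs a Calculator over it, and pickles the results table. The file names and the choice of the 'fast' subset are illustrative assumptions, not part of the pyspi API:

import sys
import numpy as np
from pyspi.calculator import Calculator

# Hypothetical usage: python worker.py dataset_00.npy results_00.pkl
infile, outfile = sys.argv[1], sys.argv[2]

dataset = np.load(infile) # one (M, T) array per job
calc = Calculator(dataset=dataset, subset='fast')
calc.compute()
calc.table.to_pickle(outfile) # persist the SPI results table (a pandas DataFrame)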
Bonus: Construct a custom SPI subset

In this simple tutorial, we have demonstrated how you can use reduced subsets of SPIs in your analysis. Subsets such as 'fabfour', 'sonnet' or 'fast' offer a streamlined way to ensure your pipeline is working as expected, without overwhelming computational demands. However, there might be scenarios where these predefined subsets don't align with your specific needs or objectives. In such cases, creating a custom subset of SPIs can be highly advantageous.

If you would like to construct your own reduced subset of SPIs, follow the guide in the Advanced Usage section to get started. We also recommend checking the table of SPIs and the detailed SPI descriptions to help guide your selection of SPIs when creating your own custom subsets.
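As a starting point, the API Reference lists a pyspi.utils.filter_spis helper for building such subsets. The sketch below assumes that it takes a list of keyword labels to match against each SPI's metadata together with a name for the filtered YAML config it writes, and that the resulting file can be passed to the Calculator via its configfile argument; check the API Reference for the exact signature:

from pyspi.calculator import Calculator
from pyspi.utils import filter_spis

# Assumed behaviour: keep only SPIs tagged with these (illustrative) labels
# and write the filtered configuration to a YAML file.
filter_spis(keywords=['linear'], output_name='my_linear_subset')

calc = Calculator(dataset=dataset, configfile='my_linear_subset.yaml')
calc.compute()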
