Getting Started: A Simple Demonstration
Now that you have installed pyspi, let's get started with a very simple demonstration of how to apply it to a dataset of interest. Here, we will generate a generic dataset from a multivariate Gaussian distribution:
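A minimal sketch of such a dataset using NumPy. We assume the processes × time points (M × T) orientation, which is our understanding of the layout pyspi expects by default; the sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator so the example is reproducible
rng = np.random.default_rng(42)

M = 5    # number of processes (e.g., channels or brain regions)
T = 500  # number of time points (samples) per process

# Rows are processes, columns are time points. Each entry is an independent
# standard-normal draw, i.e. a sample from a multivariate Gaussian with
# identity covariance.
dataset = rng.standard_normal((M, T))

print(dataset.shape)  # (5, 500)
```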
Trial run with a reduced set:
As good practice, we recommend first doing a trial run of pyspi with a smaller subset of SPIs (e.g., the 'fast', 'sonnet', or 'fabfour' sets). This way, we can run pyspi on our data quickly and identify and resolve any potential issues before proceeding with more computationally intensive calculations. Once you are satisfied with the analysis pipeline, you can always scale up to the full pyspi library of over 250 SPIs. Let's do a trial run with the 'fast' subset by instantiating the Calculator object as follows:
Note that, by default, your dataset will be normalised before SPIs are computed. To disable this pre-processing step, pass the normalise=False flag when instantiating the Calculator object. For more information, see the .
After successfully initialising the Calculator, you should see a summary of the pre-processing steps applied to the data, so that you can verify them.
Using the full SPI set:
By default, pyspi instantiates a Calculator with the full pyspi set (all SPIs). To access the full pyspi library of SPIs, simply instantiate the Calculator without specifying a subset, or pass subset='all' explicitly.
Now that we have passed our dataset into the object, verified the pre-processing steps we require, and specified which SPIs we would like to compute via the subset parameter, we can compute all of our SPIs by simply calling the compute method.
Once the Calculator has computed each of the statistics, you can access all SPI outputs using the table property.
Alternatively, if we would like to examine a specific method's outputs, we can extract its corresponding matrix of pairwise interactions (MPI) by specifying its unique identifier. For instance, the following code will extract the covariance matrix computed with the maximum likelihood estimator:
While we have tried to make the Calculator as efficient as possible, computing all statistics can take a while, depending on the size of your dataset. For a sense of how long the full pyspi set takes to run, see the section.
Given the computational cost of the full SPI set, you might want to explore options for distributed computing, particularly if you are working with large datasets. For further details on how to get started with running pyspi on an HPC cluster, refer to the .
In this simple tutorial, we have demonstrated how you can use reduced subsets of SPIs in your analysis. Subsets such as 'fast', 'sonnet' or 'fabfour' offer a streamlined way to ensure your pipeline is working as expected, without overwhelming computational demands. However, there may be scenarios where these predefined subsets don't align with your specific needs or objectives. In such cases, creating a custom subset of SPIs can be highly advantageous.
If you would like to construct your own reduced subset of SPIs, follow the guide in the to get started. We also recommend checking the and the to help guide your selection of SPIs when creating your own custom subsets.