Getting Started: A Simple Demonstration
Simple Demonstration
Now that you have installed pyspi, let's walk through a simple demonstration of how to apply it to a dataset of interest. Here we will generate a generic dataset by sampling from a multivariate Gaussian distribution.
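A minimal sketch of this step, assuming NumPy is available. The sizes M and T, the seed, and the choice of an identity covariance (independent standard-normal processes) are illustrative assumptions rather than values pyspi requires. The array is laid out as processes by observations.

```python
import numpy as np

# Fixed seed so the trial run is reproducible.
rng = np.random.default_rng(42)

M = 10    # number of processes (illustrative choice)
T = 1000  # number of observations per process (illustrative choice)

# Independent standard-normal samples: a multivariate Gaussian with zero mean
# and identity covariance, arranged as (processes, observations).
dataset = rng.standard_normal((M, T))

print(dataset.shape)  # (10, 1000)
```

Keeping M and T small at this stage keeps the trial run quick; both can be increased once the pipeline works end to end.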
Trial run with a reduced set:
As good practice, we always recommend doing a trial run of pyspi with a smaller subset of SPIs first (e.g., the fabfour, sonnet, or fast sets). This lets us run pyspi on our data quickly and identify and resolve any potential issues before proceeding with more computationally intensive calculations. Once you are satisfied with the analysis pipeline, you can always scale up to the full pyspi library of over 250 SPIs. Let's do a trial run with the 'fast' subset by instantiating the Calculator object with our dataset and the subset parameter set to 'fast'.
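The instantiation might look like the sketch below. The subset parameter comes from the text above; the import path pyspi.calculator, the dataset keyword, and the helper name make_trial_calculator are assumptions here, so check them against your installed version of pyspi.

```python
def make_trial_calculator(dataset, subset="fast"):
    """Return an uncomputed pyspi Calculator restricted to a reduced SPI subset.

    `dataset` is expected to be an array of shape (processes, observations).
    """
    # Imported inside the helper so it can be defined before pyspi is importable;
    # in a normal script this import would sit at the top of the file.
    # Assumed import path and keyword names: verify against your pyspi version.
    from pyspi.calculator import Calculator

    return Calculator(dataset=dataset, subset=subset)
```

Calling make_trial_calculator(dataset) would then give a Calculator that is configured for the 'fast' subset but has not yet computed anything.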
Now that we have passed our dataset into the Calculator object and specified which SPIs to compute via the subset parameter, we can compute all of them in one go by calling the Calculator's compute method.
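As a sketch, the call can be wrapped in a timer so that the trial run also gives a rough feel for the cost before scaling up. The helper timed_compute is hypothetical; the only pyspi call it makes is the compute method named above, and calc stands for an already instantiated Calculator.

```python
import time


def timed_compute(calc):
    """Call calc.compute() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    calc.compute()  # evaluates every SPI the calculator was configured with
    return time.perf_counter() - start
```

The time measured on a reduced subset is only a lower bound on what the full set will need, since the larger sets include more expensive statistics.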
Once the calculator has computed each of the statistics, you can access all SPI outputs through the Calculator's table property.
Alternatively, if we would like to examine a specific method's outputs, we can extract its corresponding matrix of pairwise interactions (MPI) by specifying its unique identifier. For instance, specifying the identifier of the covariance SPI extracts the covariance matrix computed with the maximum likelihood estimator.
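The two access patterns above can be sketched as follows. The table property comes from the text; indexing the table by identifier and the identifier string 'cov_EmpiricalCovariance' are assumptions about pyspi's conventions, so confirm the exact name in the table of SPIs. The helper get_mpi is hypothetical.

```python
def get_mpi(calc, spi_id="cov_EmpiricalCovariance"):
    """Return the matrix of pairwise interactions (MPI) for one SPI identifier.

    The default identifier is an assumed name for the covariance SPI with the
    maximum likelihood estimator; check it against the table of SPIs.
    """
    table = calc.table    # all SPI outputs, available once compute() has run
    return table[spi_id]  # select the entries belonging to this SPI
```

For a dataset with M processes, the returned MPI should have one row and one column per process.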
Using the full SPI set:
By default, pyspi instantiates a Calculator with the full pyspi set (all SPIs). To use the full library, instantiate the Calculator either without specifying a subset or by passing subset='all'.
While we tried to make the calculator as efficient as possible, computing all statistics can take a while, depending on the size of your dataset. For a sense of how long the full pyspi set takes to run, see the FAQ section.
Distributed Computing
Given the intensive nature of calculations with the full SPI set, you might want to explore options for distributed computing, particularly if you are working with large datasets. For further details on running pyspi on an HPC cluster, refer to the Advanced Usage section.
Bonus: Construct a custom SPI subset
In this simple tutorial, we have demonstrated how you can use reduced subsets of SPIs in your analysis. Subsets such as 'fabfour', 'sonnet', or 'fast' offer a streamlined way to check that your pipeline is working as expected without overwhelming computational demands. However, there might be scenarios where these predefined subsets don't align with your specific needs or objectives. In such cases, creating a custom subset of SPIs can be the better option.
If you would like to construct your own reduced subset of SPIs, follow the guide in the Advanced Usage section to get started. We also recommend checking the table of SPIs and the detailed SPI descriptions to help guide your selection of SPIs when creating your own custom subsets.