FAQ
These FAQs aim to cover the basic questions new users might have when using pyspi.
How many SPIs should I measure for my dataset?
When starting out, we recommend that users work with a smaller subset of available SPIs first, so they get a sense of computation times and working with the output in a lower-dimensional space. Users have the option to pass in a customised configuration .yaml file as described in the
Alternatively, we provide two pre-defined subsets of SPIs that can serve as good starting points: sonnet and fast. The sonnet subset includes 14 SPIs selected to represent the 14 modules identified through hierarchical clustering in the . To retain as many SPIs as possible while minimising computation time, we also offer a fast option that omits the most computationally expensive SPIs. Either SPI subset can be toggled by setting the corresponding flag in the () function call as follows:
What pre-processing steps are applied to my data?
There are two pre-processing steps that can be applied to your raw multivariate time series (MTS) dataset before computing SPIs:
(1) Detrend: Detrend each time series in the dataset individually along the time dimension using the with default settings. If enabled, detrending is always applied to the dataset before z-score normalisation.
(2) Z-score normalise: Normalise each time series in the dataset individually along the time dimension using the.
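Conceptually, these two steps amount to the following NumPy/SciPy sketch (for illustration only; this is not pyspi's internal implementation):

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import zscore

# Toy MTS: 3 processes (rows) x 200 time points (columns), with a linear trend
rng = np.random.default_rng(0)
t = np.arange(200)
data = rng.standard_normal((3, 200)) + 0.05 * t

# (1) Remove the linear trend from each time series along the time axis
data_dt = detrend(data, axis=1)

# (2) Z-score each (detrended) time series along the time axis
data_z = zscore(data_dt, axis=1)

print(data_z.mean(axis=1))  # approximately 0 for every process
print(data_z.std(axis=1))   # approximately 1 for every process
```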
By default, when instantiating a Calculator object with your dataset, pyspi will normalise each time series (each representing a process in the MTS dataset) individually along the time axis.
If you would like to specify which pre-processing steps to include or exclude, you can pass the corresponding flags for each operation when instantiating a Calculator. Here are some examples of how you can skip either or both operations:
After successfully instantiating a Calculator object, a summary of the pre-processing steps will be displayed for verification before computing SPIs. Here is an example of the output when explicitly setting the detrending step to False:
How long does pyspi take to run?
This depends on the size of your multivariate time series (MTS) data – both the number of processes and the number of time points (observations). In general, we recommend that users first run pyspi on a small representative sample of their dataset to assess time and computing requirements, and scale up accordingly. The amount of time also depends on the SPI set you are using – whether it is the full set of all SPIs or a reduced subset (like sonnet or fast, described above).
To give users a sense of how long pyspi takes to run, we ran a series of experiments on a high-performance computing cluster with 2 cores, 2 MPI processes, and 40 GB of memory. We ran pyspi on simulated NumPy arrays with either a fixed number of processes (2) or a fixed number of time points (100) to see how timing scales with the array size. Here are the results:
We note that computation times for the sonnet and fast subsets are roughly equivalent, while the full set of SPIs requires increasingly large amounts of time to compute as the time series length grows. With an increasing number of processes (right), the computation time for the full set of SPIs increases with a slope consistent with that of the sonnet and fast subsets.
Here are the timing values for each condition, which can help users estimate the computation time requirements for their dataset:
How can I contribute to pyspi?
Contributions play a vital role in the continual development and enhancement of pyspi, a project built and enriched through community collaboration. By participating in this project, you are contributing to the broader community and helping shape the future of this package.
Do I need to normalise my dataset before applying pyspi?
If you do not wish for pyspi to z-score your data, or you would like more control over how your data is pre-processed, you can pass the normalise=False flag to the Calculator when instantiating.
Can I distribute pyspi calculations across a cluster?
How can I cite pyspi in my work?
Can I run pyspi on my operating system?
pyspi is designed with cross-platform compatibility in mind and can be run on various operating systems, ensuring that a wide range of users have access to pyspi and all of its features. Specifically, pyspi currently supports:
macOS (Python >= 3.9)
Windows (Python >= 3.8)
Linux (Python >= 3.8)
Are there examples showcasing a complete pipeline using pyspi?
How can I save my results from pyspi?
Saving a calculator
Loading a calculator
Code is not the only way to contribute to pyspi. Reviewing pull requests, answering questions to help others with troubleshooting, organising and teaching tutorials, and improving documentation are all invaluable contributions to the project. For further details on how you can contribute to the project, as well as general guidelines for our contributors, please refer to .
When passing your dataset into the Calculator object, pyspi will automatically z-score (normalise) along the time axis by default (see the Data object documentation). This means that you can supply raw values to the Calculator object without having to normalise the dataset as a pre-processing step.
If you have access to a portable batch system (PBS) cluster and are processing MTS with many processes (or are analysing many MTS), then you may find the repository helpful. Each job contains one calculator object that is associated with one MTS. To get started with running pyspi jobs on a PBS-type cluster, follow our guide located .
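For orientation only, a single-job PBS submission script might look like the sketch below; the resource requests and the `run_pyspi.py` script name are placeholders for illustration, not part of pyspi-distribute:

```shell
#!/bin/bash
#PBS -N pyspi_job
#PBS -l ncpus=2
#PBS -l mem=40GB
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR"

# Run one Calculator on one MTS (placeholder script name)
python run_pyspi.py
```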
If you used pyspi in your work, it would be greatly appreciated if you . Feel free to star our repository if you find our package useful, as this also helps to increase awareness in the time-series analysis community.
In all cases, ensure that you have the required version of Python installed, as pyspi is a Python-based package. We actively monitor and work on compatibility issues that may arise with new updates to these operating systems. Users are encouraged to report any compatibility issues they encounter on our , helping us improve pyspi for all users.
Yes, we currently provide two notebooks with examples of complete pipelines using pyspi. These notebooks are available in the section. If you want to share a notebook with additional pipelines or specific use cases, please feel free to contact us.
Once you have computed the SPIs for your dataset, the results will be stored in the calculator object. We recommend saving the calculator object to file using the dill package. To get started, you will need to install dill: pip install dill.
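A minimal save/load round trip with dill might look like the following sketch; here `calc` is a plain dictionary standing in for your computed Calculator object (any picklable Python object works the same way):

```python
import dill

# 'calc' would be your computed Calculator; a dict stands in here
calc = {"example_spi": 0.42}

# Save the calculator to disk
with open("calc.pkl", "wb") as f:
    dill.dump(calc, f)

# Later: load it back and carry on with your analysis
with open("calc.pkl", "rb") as f:
    calc_loaded = dill.load(f)

print(calc_loaded == calc)  # True
```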