Neuroimaging: fMRI Time Series
In this tutorial, we will apply pyspi to data derived from the UCLA Consortium for Neuropsychiatric Phenomics LA5c Study, an open-access data source. We will focus on blood-oxygen-level-dependent (BOLD) time-series obtained from functional magnetic resonance imaging (fMRI).
The data used in this tutorial comprise four sets of four brain regions recorded from an anonymous participant. Each fMRI time-series in this dataset consists of 152 time points (i.e., T = 152).
The following cells are intended to be run sequentially, from top to bottom.
1. Preparing the environment
We will first load all the modules needed for this demonstration. You may need to pip install one or more packages first.
2. Preparing the fMRI multivariate time-series (MTS) data
Now that we have imported the necessary libraries, let's read in the four sets of fMRI data, each corresponding to a separate region of interest (ROI), into pandas DataFrames:
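The exact file names depend on your download, so the snippet below is a sketch: the commented `read_csv` call shows the intended pattern, while simulated data of the same shape (4 regions x 152 time points) keeps the example self-contained.

```python
import numpy as np
import pandas as pd

# In the tutorial you would read the four downloaded ROI files, e.g.:
#   set1 = pd.read_csv("pyspi_demo_set1.csv", header=None)  # filename is illustrative
# For a self-contained sketch, we simulate four 4-region x 152-timepoint sets.
rng = np.random.default_rng(0)
set1, set2, set3, set4 = (pd.DataFrame(rng.standard_normal((4, 152))) for _ in range(4))
print(set1.shape)  # (4, 152)
```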
Now, let's inspect the first set (set1) to see how the data is set up:
You should obtain the following DataFrame, which has 4 rows and 152 columns. Each row corresponds to one of the four brain regions, and each column corresponds to a measurement at a single time point:
This is the format that pyspi expects: processes (e.g., brain regions) as the rows and time-points as the columns. If your input data is transposed, you can easily transpose the pandas DataFrame using its transpose() function.
To continue our exploratory data analysis, we can view the raw time series values for the first set of regions:
Alternatively, we can visualise the time-series for this first set of brain regions as a heatmap:
You should obtain a heatmap like the following:
You can also use the same two plotting functions -- plot_data_lines() and plot_data_heatmap() -- to visualise the raw time-series data for the other three sets of regions.
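The two helpers are defined in the tutorial's accompanying notebook; if you do not have them, minimal matplotlib versions might look like this (the function names match the tutorial, but the styling here is an assumption):

```python
import matplotlib.pyplot as plt

def plot_data_lines(df, title=""):
    """Plot each row (brain region) of the MTS as a line over time."""
    fig, ax = plt.subplots(figsize=(10, 3))
    for i, (_, row) in enumerate(df.iterrows()):
        ax.plot(row.values, label=f"Region {i + 1}")
    ax.set_xlabel("Time point")
    ax.set_ylabel("BOLD signal")
    ax.set_title(title)
    ax.legend()
    return fig

def plot_data_heatmap(df, title=""):
    """Show the full MTS as a regions x time heatmap."""
    fig, ax = plt.subplots(figsize=(10, 2.5))
    im = ax.imshow(df.values, aspect="auto", cmap="viridis")
    fig.colorbar(im, ax=ax, label="BOLD signal")
    ax.set_xlabel("Time point")
    ax.set_ylabel("Region")
    ax.set_title(title)
    return fig
```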
3. Running pyspi on a single dataset
Our data is now downloaded and ready to process with pyspi. There are two main steps to run pyspi and extract all of our pairwise statistics:
1. Initialise the Calculator object; and
2. Compute the SPIs with the Calculator object.
In the first step, you have some flexibility to play around with which SPIs you want to compute from the input time-series data. By default, pyspi will compute all available SPIs. However, this can take a long time depending on how many observations you have. If you want to give pyspi a try with a reduced feature set that runs quickly, you can pass subset='fast' as an additional parameter when instantiating the Calculator:
OPTIONAL: If you want to use the pre-computed calc_set1 object, you can simply download the pyspi_calc_set1.pkl file and save it in the same location as your Jupyter notebook.
To load the .pkl file, do the following:
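A sketch of the loading step with the standard pickle module is below; to keep it runnable without the download, it first writes a placeholder file (in practice, pyspi_calc_set1.pkl is the file you downloaded, and depending on how it was saved you may need the dill package instead of pickle):

```python
import pickle

# Placeholder so the sketch runs end-to-end; skip this if you have
# downloaded the real pyspi_calc_set1.pkl.
with open("pyspi_calc_set1.pkl", "wb") as f:
    pickle.dump({"demo": "placeholder"}, f)

# Loading the .pkl file restores the saved object
with open("pyspi_calc_set1.pkl", "rb") as f:
    calc_set1 = pickle.load(f)
```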
We can then inspect the resulting statistics of pairwise interactions (SPIs) using the calc_set1.table object:
This output contains the resulting values from all of the SPIs concatenated together. We can view the list of SPIs that we calculated using the .keys() method:
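To illustrate without re-running the computation, the snippet below builds a mock table with the same structure as calc_set1.table -- a pandas DataFrame whose columns are a MultiIndex of (SPI identifier, process) -- and lists the unique SPI identifiers (the two SPI names shown are just examples):

```python
import numpy as np
import pandas as pd

# Mock of calc_set1.table: rows are processes, columns are a MultiIndex
# of (SPI identifier, target process)
spis = ["cov_EmpiricalCovariance", "spearmanr"]
procs = [f"proc-{i}" for i in range(4)]
cols = pd.MultiIndex.from_product([spis, procs])
table = pd.DataFrame(np.random.default_rng(0).standard_normal((4, 8)),
                     index=procs, columns=cols)

# .keys() on a DataFrame returns its columns; the unique top-level
# values are the SPI identifiers that were calculated
print(table.keys().get_level_values(0).unique().tolist())
# ['cov_EmpiricalCovariance', 'spearmanr']
```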
Using their corresponding key/identifier, we can isolate one of the SPIs and visualise the results across the brain regions. For example, let's look at the matrix of pairwise interactions (MPI) for the covariance (identifier cov_EmpiricalCovariance):
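Indexing the table by the SPI identifier returns that SPI's 4 x 4 MPI, which can then be shown as a heatmap. The sketch below uses a mock table with the same column structure as calc_set1.table; with the real object, replace `table` with `calc_set1.table`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Mock table with the same (SPI, process) column structure as calc_set1.table
procs = [f"proc-{i}" for i in range(4)]
cols = pd.MultiIndex.from_product([["cov_EmpiricalCovariance"], procs])
table = pd.DataFrame(np.random.default_rng(0).standard_normal((4, 4)),
                     index=procs, columns=cols)

# Selecting the SPI identifier yields its 4 x 4 matrix of pairwise interactions
mpi = table["cov_EmpiricalCovariance"]

# Visualise the MPI as a heatmap
fig, ax = plt.subplots()
im = ax.imshow(mpi.values, cmap="coolwarm")
fig.colorbar(im, ax=ax)
ax.set_xticks(range(4))
ax.set_xticklabels(procs, rotation=45)
ax.set_yticks(range(4))
ax.set_yticklabels(procs)
ax.set_title("cov_EmpiricalCovariance")
```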
You should obtain the following plot of the MPI:
4. Running pyspi for multiple datasets
In the previous section, we ran through the pyspi pipeline for one dataset. In practice, you may have several datasets that you wish to process with pyspi, so here we cover tips for iterative processing.
We begin by creating a dictionary containing our other three datasets to process:
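The pattern is a dictionary of datasets and one Calculator per entry. So that this sketch runs without pyspi's full compute step, `compute_spi_table` below is a labelled placeholder whose docstring shows the real calls; simulated DataFrames stand in for set2 through set4:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated stand-ins for set2--set4 (in the tutorial, use the loaded DataFrames)
datasets = {f"set{i}": pd.DataFrame(rng.standard_normal((4, 152))) for i in range(2, 5)}

def compute_spi_table(df):
    """Placeholder for the real pyspi calls, i.e.:
        calc = Calculator(dataset=df.values, subset='fast')
        calc.compute()
        return calc.table
    """
    return pd.DataFrame(index=range(df.shape[0]))  # dummy table

# One SPI table per dataset, collected in a single dictionary
spi_results = {name: compute_spi_table(df) for name, df in datasets.items()}
print(sorted(spi_results))  # ['set2', 'set3', 'set4']
```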
This combines the SPI tables from each set into a single dictionary. This step takes about 2 minutes to run, so if you want to skip this part, you can load the pre-computed results dictionary as follows:
We can then concatenate the dictionary of dataframes into one large dataframe as follows:
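pandas handles this directly: passing a dictionary of DataFrames to pd.concat stacks them, with the dictionary keys becoming an outer level of the row index. A sketch with mock per-set tables:

```python
import numpy as np
import pandas as pd

# Mock per-set SPI tables with identical columns, keyed by set name
tables = {f"set{i}": pd.DataFrame(np.ones((4, 2)), columns=["a", "b"])
          for i in range(1, 5)}

# Concatenate into one large DataFrame; dict keys become an outer row index
combined = pd.concat(tables)
print(combined.shape)  # (16, 2)
```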
5. Downstream Analysis
Exporting data to R
For users who use R for data wrangling and visualisation, we can save the pyspi output to a pickle file (.pkl) and write a custom function to load this data into R with the reticulate package. We can practice with the calc_set1 results by first saving calc_set1.table to its own .pkl file:
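Since the table is a pandas DataFrame, the simplest route is its to_pickle method; a stand-in table keeps this sketch self-contained (with the real object, call `calc_set1.table.to_pickle(...)`):

```python
import numpy as np
import pandas as pd

# Stand-in for calc_set1.table; in the tutorial, save the real table
table = pd.DataFrame(np.zeros((4, 4)))

# Serialise the SPI table so it can later be read from R via reticulate
table.to_pickle("pyspi_calc_table.pkl")
```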
We can then define a separate Python script containing a function to extract the SPI data from our pyspi_calc_table.pkl file -- found in this repo as pickle_reader_for_R.py. Here are the contents:
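The stripped cell held the script; a minimal sketch of such a function is below (the real pickle_reader_for_R.py in the repo may differ in details):

```python
# pickle_reader_for_R.py -- minimal sketch of a pickle-loading helper for R
import pandas as pd

def read_pickle_file(pickle_path):
    """Load a pickled pandas object so reticulate can convert it in R."""
    return pd.read_pickle(pickle_path)
```

From the R side, one would typically source this script with reticulate, e.g. `reticulate::source_python("pickle_reader_for_R.py")` and then call `read_pickle_file("pyspi_calc_table.pkl")` (usage shown here as an assumption about your R setup).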