Finance: Stock Price Time Series
1. Preparing the environment and data
First, we need to import some tools for downloading and minimally processing the data. You may need to install a few non-standard libraries first:
pip install yfinance
pip install scipy
pip install scikit-learn
Now we will download financial data from one of two sources, Yahoo Finance (yahoo) and the St. Louis Federal Reserve, by defining a function download() and passing in the tickers corresponding to the FAANG stocks: Facebook/Meta, Amazon, Apple, Netflix and Google:
import datetime, warnings
import yfinance as yf
import pandas as pd
import numpy as np
from scipy.stats import zscore
from scipy.signal import detrend
def download(symbols, start, days):
    end = start + datetime.timedelta(days=days)
    startstr = start.strftime('%Y-%m-%d')
    endstr = end.strftime('%Y-%m-%d')
    print(f'Obtaining {symbols} data from {startstr} to {endstr}...')
    close = yf.download(symbols, start=startstr, end=endstr, progress=True)['Close']
    # Match data up with weekdays
    weekdays = pd.date_range(start=startstr, end=endstr, freq='B')
    close = close.reindex(weekdays)
    # For any NaNs, propagate the last valid value forward, then drop the first two observations
    z = close.ffill().values.T[:, 2:]
    # Always detrend and normalise your data; otherwise most statistics will give spurious results.
    return detrend(zscore(z, ddof=1, nan_policy='omit', axis=1))
# The FAANG tickers (Facebook/Meta, Amazon, Apple, Netflix, Google)
stocks = ['META','AMZN','AAPL','NFLX','GOOGL']
# We'll download 140 days of data (corresponding to ~100 observations from business days)
ndays = 140
# Set a recent(ish) starting date for the period
start_datetime = datetime.datetime.strptime('2014-01-01', '%Y-%m-%d') # Earliest date we will sample
print('Begin data download.')
z = download(stocks,start_datetime,ndays)
print(f'Done. Obtained MTS of size {z.shape}')
2. Inspecting the MTS
Now that we've obtained our data, we can inspect it to make sure everything looks okay. Let's begin by visualising the MTS in a single heatmap:
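A minimal sketch of such a heatmap, assuming matplotlib is installed. The helper name plot_mts, the colour map, and the colour limits are illustrative choices rather than the tutorial's own settings, and z_demo is random stand-in data with the same (processes, observations) layout as z:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_mts(z, labels=None):
    """Plot an MTS of shape (processes, observations) as a heatmap."""
    z = np.asarray(z)
    fig, ax = plt.subplots(figsize=(10, 3))
    # One row per process, one column per time point
    im = ax.pcolormesh(z, cmap='coolwarm', vmin=-2, vmax=2)
    fig.colorbar(im, ax=ax, label='Normalised value')
    ax.set_xlabel('Time (business days)')
    ax.set_ylabel('Process')
    if labels is not None:
        # Centre each tick on its row
        ax.set_yticks(np.arange(len(labels)) + 0.5)
        ax.set_yticklabels(labels)
    return fig, ax

# Stand-in data (assumption): replace z_demo with the z returned by download()
rng = np.random.default_rng(0)
z_demo = rng.standard_normal((5, 98))
fig, ax = plot_mts(z_demo, labels=['A', 'B', 'C', 'D', 'E'])
```

In a Jupyter notebook the figure is displayed automatically; in a script, call plt.show() afterwards. When plotting the real data, check the column order returned by yfinance before assigning ticker labels, since it may differ from the order of the stocks list.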
You should obtain a plot similar to the following:

3. Applying pyspi
Next, we instantiate a Calculator object using our data and compute all pairwise interactions. Then, we can inspect the results table:
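A minimal sketch of this step, assuming pyspi is installed and that z has shape (processes, observations). The wrapper name compute_spis is hypothetical; the import is placed inside the function only so that the helper can be defined before pyspi is available:

```python
def compute_spis(z, subset='all'):
    """Compute pairwise interactions for one MTS and return the Calculator."""
    # Lazy import (assumption: pyspi is installed when the function is called)
    from pyspi.calculator import Calculator

    calc = Calculator(dataset=z, subset=subset)
    calc.compute()  # can take a long time for the full set of SPIs
    return calc
```

With the data from the previous step, calc = compute_spis(z) followed by print(calc.table) shows the results table, which holds one matrix of pairwise interactions per SPI.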
As an example, below we will examine how covariance, dynamic time warping, Granger causality, and convergent cross-mapping differ when computing the relationships between the FAANG stocks.
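One way to draw a single matrix of pairwise interactions is sketched below, assuming matplotlib is installed. The helper plot_mpi is hypothetical, and the demo uses a plain NumPy covariance matrix of random data as a stand-in for a real pyspi result:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_mpi(mpi, title, labels=None):
    """Plot a square matrix of pairwise interactions (MPI) as a heatmap."""
    mpi = np.array(mpi, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(mpi, np.nan)     # hide self-interactions
    fig, ax = plt.subplots()
    im = ax.imshow(mpi, cmap='coolwarm')
    fig.colorbar(im, ax=ax)
    ax.set_title(title)
    if labels is not None:
        ticks = np.arange(len(labels))
        ax.set_xticks(ticks)
        ax.set_xticklabels(labels, rotation=45)
        ax.set_yticks(ticks)
        ax.set_yticklabels(labels)
    return fig, ax

# Stand-in MPI (assumption): covariance of random data, not a pyspi output
rng = np.random.default_rng(0)
mpi_demo = np.cov(rng.standard_normal((5, 98)))
fig_mpi, ax_mpi = plot_mpi(mpi_demo, 'Covariance (stand-in data)')
```

With a computed Calculator, the same helper could be called once per SPI, for example plot_mpi(calc.table['cov_EmpiricalCovariance'], 'Covariance'), assuming that identifier is present in your results table.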
You should obtain four separate plots, each corresponding to one of the four methods. Here we show an example of the 5x5 matrix of pairwise interactions (MPI) for the convergent cross-mapping method (identifier ccm):
4. Classifying MTS using SPIs
Now that you know how to compute the hundreds of pairwise interactions from data, let's put them to the test!
This part of the tutorial illustrates how to use sklearn to classify between two types of time-series using a comprehensive representation of their pairwise interactions.
Here, we will try to distinguish the stock-market data used earlier from foreign exchange-rate (forex) data. We will start by downloading several instances of each data type for training and testing our classifier.
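One simple way to obtain several instances is to sample non-overlapping time windows. The sketch below only generates the window start dates; the helper window_starts, the number of windows, and the spacing are assumptions for illustration, not the tutorial's own sampling scheme:

```python
import datetime

def window_starts(first_start, n_windows, step_days):
    """Return n_windows start dates spaced step_days apart."""
    step = datetime.timedelta(days=step_days)
    return [first_start + i * step for i in range(n_windows)]

# Four consecutive 140-day windows beginning on the tutorial's start date
starts = window_starts(datetime.datetime(2014, 1, 1), n_windows=4, step_days=140)
for s in starts:
    print(s.strftime('%Y-%m-%d'))
```

Each start date could then be passed to the download() function defined earlier, for instance download(stocks, s, ndays), to build one stock-market MTS per window. The forex instances need an equivalent download from the second data source.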
Now, compute all pairwise interactions for all datasets (storing the results). For the purposes of demonstration, we will use the 'fast' subset to speed up calculations.
Note: This computation will take a few minutes to run. If you would like to proceed using a pre-computed results file, simply download the following .pkl file and save it in the same location as your Jupyter notebook:
Let's see if we can tell whether our data is Forex or stock-market data based on the covariance matrix. We will train a support vector machine (SVM) to perform the classification task.
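The classification step can be sketched as follows, assuming scikit-learn is installed. Because the real results are not reproduced here, the example builds synthetic stand-in data (one class with a shared factor, one without) and uses NumPy's covariance in place of the pyspi output, so the accuracy it prints will not match the figure quoted for the real data. The helpers mpi_to_features and fake_mts are hypothetical:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mpi_to_features(mpi):
    """Flatten the off-diagonal entries of a square MPI into a feature vector."""
    mpi = np.asarray(mpi, dtype=float)
    off_diag = ~np.eye(mpi.shape[0], dtype=bool)
    return mpi[off_diag]

rng = np.random.default_rng(0)

def fake_mts(correlated, n_proc=5, n_obs=100):
    """Synthetic stand-in for one downloaded MTS (assumption, not real data)."""
    x = rng.standard_normal((n_proc, n_obs))
    if correlated:
        x += 2.0 * rng.standard_normal(n_obs)  # shared factor across processes
    return x

# 20 instances per class; label 1 = correlated class, 0 = independent class
datasets = [fake_mts(True) for _ in range(20)] + [fake_mts(False) for _ in range(20)]
y = np.array([1] * 20 + [0] * 20)
X = np.array([mpi_to_features(np.cov(d)) for d in datasets])

clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f'Average accuracy: {scores.mean():.2f}')
```

With real results, each row of X would instead come from the covariance MPI of one dataset, and y would mark that dataset as forex or stock-market data.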
You should obtain an average accuracy of around 98%.
Now that we have tried classifying data using a single SPI, let's repeat this for every SPI, compare how they perform, and plot the results:
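A sketch of the per-SPI loop, assuming scikit-learn and matplotlib are installed. The helper score_spis and the SPI names in the demo ('cov', 'corr', 'broken') are hypothetical stand-ins built from synthetic data, not pyspi identifiers. SPIs whose features contain NaNs are skipped:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def score_spis(mpis_by_spi, y, cv=5):
    """Cross-validated SVM accuracy per SPI, sorted from best to worst.

    mpis_by_spi maps an SPI name to an array of shape (n_datasets, n_proc, n_proc).
    """
    scores = {}
    for spi, mpis in mpis_by_spi.items():
        mpis = np.asarray(mpis, dtype=float)
        off_diag = ~np.eye(mpis.shape[1], dtype=bool)
        X = mpis[:, off_diag]  # one row of off-diagonal entries per dataset
        if not np.isfinite(X).all():
            continue  # skip SPIs that failed or returned NaNs
        scores[spi] = cross_val_score(SVC(kernel='linear'), X, y, cv=cv).mean()
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Synthetic stand-in data (assumption): 20 correlated and 20 independent MTS
rng = np.random.default_rng(0)
data = [rng.standard_normal((5, 100)) + 2.0 * rng.standard_normal(100) for _ in range(20)]
data += [rng.standard_normal((5, 100)) for _ in range(20)]
y = np.array([1] * 20 + [0] * 20)

mpis_by_spi = {
    'cov': np.array([np.cov(d) for d in data]),
    'corr': np.array([np.corrcoef(d) for d in data]),
    'broken': np.full((40, 5, 5), np.nan),  # mimics an SPI that failed
}
spi_scores = score_spis(mpis_by_spi, y)
for spi, acc in spi_scores.items():
    print(f'{spi}: {acc:.2f}')

# Distribution of accuracies across SPIs
fig_hist, ax_hist = plt.subplots()
ax_hist.hist(list(spi_scores.values()), bins=10, range=(0, 1))
ax_hist.set_xlabel('Average accuracy')
ax_hist.set_ylabel('Number of SPIs')
```

Dropping the diagonal before classification also discards the self-interaction entries, so only genuinely pairwise values are used as features.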
You should obtain a histogram like the following:

We can also inspect the average accuracy for each SPI individually:
This should produce the following output, with 'cov_GraphicalLassoCV' being the top-performing SPI for distinguishing forex from stock-market data:
We can also inspect the MPIs to see what the distinguishing feature for our top-performing SPI was:
This should produce the following output (here we show a small snippet of the full output):