Python
A detailed usage guide for installing and using catch22 in Python.
The sections below cover Python-specific installation and usage information.
Python users can install pycatch22 via pip using the following command:
pip install pycatch22
pycatch22 can also be installed using alternative methods, such as building the package directly from source.
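For example, pip can install the package straight from its source repository (a minimal sketch; the repository URL is assumed here, so check the project page for the canonical address):
pip install git+https://github.com/DynamicsAndNeuralSystems/pycatch22.git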
Here we outline how you can jump straight into computing catch22 features on your time series data with a basic usage example.
In pycatch22, time series can be provided as tuples, lists, or numpy arrays:
import pycatch22
import numpy as np

tsData_asList = [1, 2, 3, 4] # (or more interesting data!)
tsData_as_numpy = np.array([1, 2, 3, 4])
All features are bundled in the function catch22_all, which can simply be called on your time series data as shown below:
# for time series data supplied as a list...
results = pycatch22.catch22_all(tsData_asList)
# alternatively, for time series data as a numpy array...
results_numpy = pycatch22.catch22_all(tsData_as_numpy)
The results of your calculation will be stored in a dictionary with two keys: names and values.
names: the feature names, as listed in the feature table, returned as a list of strings.
values: the feature outputs, returned as a list of float64 values.
Note that the feature names are given in the same order as the outputs, i.e., the first feature in names corresponds to the first (zeroth) index in values.
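Because the ordering matches, you can also combine the two lists into a single dictionary for convenient lookups by feature name (a small convenience sketch, not part of the pycatch22 API, continuing from the results computed above):
# map each feature name to its computed value
feature_dict = dict(zip(results['names'], results['values']))
print(feature_dict['CO_f1ecac']) # look up a single feature by name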
Continuing our example from above, we can view the features and their computed values with the following:
for feature, output in zip(results['names'], results['values']):
    print(f"{feature}: {output}")
Which gives the following example output:
DN_HistogramMode_5: -0.03625270501343825
DN_HistogramMode_10: 0.2711751186493062
CO_f1ecac: 0.6562407580313526
CO_FirstMin_ac: 2
...
FC_LocalSimple_mean3_stderr: 1.1654896899009854
And that's it! You now have all the knowledge you need to apply catch22 to your time series data. To access some of the additional functionality of catch22, keep reading for advanced usage tips and examples.
Ready to explore some of the additional functionality of pycatch22?
If the location and spread of the raw time series distribution are important for your application, you can enable catch24.
Catch24 is an extension of the original catch22 feature set that adds the mean and standard deviation, yielding a total of 24 time series features. To access catch24, pass an additional parameter to the catch22_all function, telling it to compute the catch24 feature set as follows:
import pycatch22

tsData = [1, 2, 3, 4] # your ts data
c24_results = pycatch22.catch22_all(tsData, catch24=True)
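The returned dictionary has the same names/values structure as before, now with 24 entries. Assuming the two additional distributional features are appended after the original 22 (worth verifying against your own output), you can inspect them like this:
# print the two additional features (assumed to be the last two entries)
for name, value in zip(c24_results['names'][-2:], c24_results['values'][-2:]):
    print(f"{name}: {value}")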
If you do not wish to compute the full catch22 feature set on your time series, you can alternatively compute each feature (including mean and std. deviation) individually.
In pycatch22, each feature function can be accessed individually and takes a tuple or list (not a numpy array) as input. To call a feature function, use its corresponding long name (as in the feature overview table), e.g., CO_trev_1_num to compute trev.
As an example, let's compute the feature trev for a single univariate time series:
import pycatch22
import numpy as np

x = np.linspace(0, 10*np.pi, 500)
y = 0.3 * np.sin(x + 0.2) + 0.1 * np.random.randn(500)
tsData = list(y) # your time series data as a list

trev_value = pycatch22.CO_trev_1_num(tsData)
print(f"trev feature output: {trev_value}")
For each feature, we also include a unique 'short name' for easier reference (as outlined in the feature overview table).
In pycatch22, short names can be included in the output of catch22_all() by setting short_names=True:
import pycatch22
import numpy as np

data = np.random.randn(100)
data = list(data)

# return short names
results = pycatch22.catch22_all(data, short_names=True)
The returned dictionary, results, now has three keys: names, short_names, and values.
As before, names is a list of the original long feature names (as strings), short_names is a list of short names (as strings), and values is a list of feature values.
Continuing from the example above, let's print the feature outputs along with their corresponding short names:
for (short_name, val) in zip(results['short_names'], results['values']):
    print(f"{short_name}: {val}")
Here is the output from our example. A lot more readable!
mode_5: -0.7500426173295759
mode_10: 1.2786397188378562
acf_timescale: 0.5693516006124166
...
forecast_error: 1.1774630418671619
When working with large time series datasets (containing many univariate time series instances), computing features can often become a significant computational bottleneck. This is where multi-threading comes into play, offering a powerful solution to accelerate feature computation through parallelisation.
Let's start by importing all of the libraries we will need:
import pycatch22 as catch22
import os
from joblib import Parallel, delayed
import numpy as np
and then create a large time series dataset consisting of 50,000 univariate time series instances, each of length 1000:
# generate list where each entry is a length 1000 time series
dataset = [np.random.randn(1000) for _ in range(50000)]
Next, we define a helper function that takes a single time series, applies catch22, and returns the feature values:
def compute_features(x):
    res = catch22.catch22_all(x)
    return res['values'] # just return the values
Next, check how many CPU cores are available for multi-threading by running os.cpu_count(). The value returned will differ depending on your hardware and setup:
print(f"Number of cores available: {os.cpu_count()}")
Now we can parallelise our function using the following:
threads_to_use = os.cpu_count()
results_list = Parallel(n_jobs=threads_to_use)(
delayed(compute_features)(data[i]) for i in range(len(data))
)
# print the results for the first time series
print(results_list[1]) # prints list of feature outputs for time series 1
You should notice that the time to compute all catch22 features on the entire dataset of 50,000 time series is much faster than sequentially looping over each!
For example, on 8 cores (Mac M1), it takes ~27 seconds to compute features on the entire dataset. Without multi-threading this would usually take ~2.5 minutes. That's almost a 5.5x speedup!
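If you want to use the outputs downstream (e.g., for classification or clustering), one common next step, not specific to pycatch22, is to stack the per-series feature lists into a single feature matrix (continuing from the variables defined above):
# stack the per-series feature lists into a (50,000 x 22) numpy array
feature_matrix = np.array(results_list)
feature_names = catch22.catch22_all(dataset[0])['names'] # column names, in matching order
print(feature_matrix.shape)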
Below, we address some commonly asked questions about pycatch22.