Python
A detailed usage guide for installing and using catch22 in Python.
The sections below cover Python-specific installation and usage information.
Python users can install pycatch22 via pip using the following command:
pip install pycatch22
pycatch22 can also be installed using alternative methods, such as building the package directly from source.
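For example, pip can install the package straight from its source repository (a minimal sketch; the repository URL is assumed here, so check the project page for the canonical address):
pip install git+https://github.com/DynamicsAndNeuralSystems/pycatch22.git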
Here we outline how you can jump straight into computing catch22 features on your time series data with a basic usage example.
In pycatch22, time series can be provided as tuples, lists, or numpy arrays:
import pycatch22
import numpy as np

tsData_asList = [1, 2, 3, 4] # (or more interesting data!)
tsData_as_numpy = np.array([1, 2, 3, 4])
All features are bundled in the function catch22_all, which can simply be called on your time series data as shown below:
# for time series data supplied as a list...
results = pycatch22.catch22_all(tsData_asList)
# alternatively, for time series data as a numpy array...
results_numpy = pycatch22.catch22_all(tsData_as_numpy)
The results of your calculation will be stored in a dictionary with two keys: names and values.
names: the feature names, as listed in the feature table, returned as a list of strings.
values: the feature outputs, returned as a list of float64 values.
Note that the feature names are given in the same order as the outputs, i.e., the first feature in names corresponds to the first (zeroth) index in values.
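Because the ordering matches, you can also combine the two lists into a single dictionary for convenient lookups by feature name (a small convenience sketch, not part of the pycatch22 API, continuing from the results computed above):
# map each feature name to its computed value
feature_dict = dict(zip(results['names'], results['values']))
print(feature_dict['CO_f1ecac']) # look up a single feature by name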
Continuing our example from above, we can view the features and their computed values with the following:
for feature, output in zip(results['names'], results['values']):
    print(f"{feature}: {output}")
Which gives the following example output:
DN_HistogramMode_5: -0.03625270501343825
DN_HistogramMode_10: 0.2711751186493062
CO_f1ecac: 0.6562407580313526
CO_FirstMin_ac: 2
...
FC_LocalSimple_mean3_stderr: 1.1654896899009854
And that's it! You now have all the knowledge you need to apply catch22 to your time series data. To access some of the additional functionality of catch22, keep reading for advanced usage tips and examples.
Ready to explore some of the additional functionality of pycatch22?
If the location and spread of the raw time series distribution are important for your application, you can enable catch24.
Catch24 is an extension of the original catch22 feature set that adds the mean and standard deviation, yielding a total of 24 time series features. To access catch24, pass an additional parameter to the catch22_all function, telling it to compute the catch24 feature set as follows:
import pycatch22

tsData = [1, 2, 3, 4] # your ts data
c24_results = pycatch22.catch22_all(tsData, catch24=True)
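The returned dictionary has the same names/values structure as before, now with 24 entries. Assuming the two additional distributional features are appended after the original 22 (worth verifying against your own output), you can inspect them like this:
# print the two additional features (assumed to be the last two entries)
for name, value in zip(c24_results['names'][-2:], c24_results['values'][-2:]):
    print(f"{name}: {value}")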
If you do not wish to compute the full catch22 feature set on your time series, you can alternatively compute each feature (including mean and std. deviation) individually.
In pycatch22, each feature function can be accessed individually and takes a tuple or list (not a numpy array) as input. To call a feature function, use its corresponding long name (as in the feature overview table), e.g., CO_trev_1_num to compute trev.
As an example, let's compute the feature trev for a single univariate time series:
import pycatch22
import numpy as np

x = np.linspace(0, 10*np.pi, 500)
y = 0.3 * np.sin(x + 0.2) + 0.1 * np.random.randn(500)
tsData = list(y) # your time series data as a list

trev_value = pycatch22.CO_trev_1_num(tsData)
print(f"trev feature output: {trev_value}")
For each feature, we also include a unique 'short name' for easier reference (as outlined in the feature overview table).
In pycatch22, short names can be included in the output of catch22_all() by setting short_names=True:
import pycatch22
import numpy as np

data = np.random.randn(100)
data = list(data)

# return short names
results = pycatch22.catch22_all(data, short_names=True)
The returned dictionary, results, now has three keys: names, short_names, and values.
As before, names is a list of the original long feature names (as strings), short_names is a list of short names (as strings), and values is a list of feature values.
Continuing from the example above, let's print the feature outputs along with their corresponding short names:
for (short_name, val) in zip(results['short_names'], results['values']):
    print(f"{short_name}: {val}")
Here is the output from our example. A lot more readable!
mode_5: -0.7500426173295759
mode_10: 1.2786397188378562
acf_timescale: 0.5693516006124166
...
forecast_error: 1.1774630418671619
When working with large time series datasets (containing many univariate time series instances), computing features can often become a significant computational bottleneck. This is where multi-threading comes into play, offering a powerful solution to accelerate feature computation through parallelisation.
Let's start by importing all of the libraries we will need:
import pycatch22 as catch22
import os
from joblib import Parallel, delayed
import numpy as np
and then create a large time series dataset consisting of 50,000 univariate time series instances, each of length 1000:
# generate list where each entry is a length 1000 time series
dataset = [np.random.randn(1000) for _ in range(50000)]
Next, we define a helper function that takes a single time series, applies catch22, and returns the feature values:
def compute_features(x):
    res = catch22.catch22_all(x)
    return res['values'] # just return the values
Next, check how many CPU cores are available for multi-threading by running os.cpu_count(). The value returned will differ depending on your hardware and setup:
print(f"Number of cores available: {os.cpu_count()}")
Now we can parallelise our function using the following:
threads_to_use = os.cpu_count()
results_list = Parallel(n_jobs=threads_to_use)(
delayed(compute_features)(data[i]) for i in range(len(data))
)
# print the results for the first time series
print(results_list[1]) # prints list of feature outputs for time series 1
You should notice that the time to compute all catch22 features on the entire dataset of 50,000 time series is much faster than sequentially looping over each!
For example, on 8 cores (Mac M1), it takes ~27 seconds to compute features on the entire dataset. Without multi-threading this would usually take ~2.5 minutes. That's almost a 5.5x speedup!
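If you want to use the outputs downstream (e.g., for classification or clustering), one common next step, not specific to pycatch22, is to stack the per-series feature lists into a single feature matrix (continuing from the variables defined above):
# stack the per-series feature lists into a (50,000 x 22) numpy array
feature_matrix = np.array(results_list)
feature_names = catch22.catch22_all(dataset[0])['names'] # column names, in matching order
print(feature_matrix.shape)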
Below, we address some commonly asked questions about pycatch22.