Python
A detailed usage guide for installing and using catch22 in Python.
Python users can install pycatch22 via pip by running the command pip install pycatch22.
pycatch22 can also be installed through alternative installation options.
Here we outline how you can jump straight into computing catch22 features on your time series data with a basic usage example.
In pycatch22, time series can be provided as tuples, lists, or numpy arrays.
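As a minimal sketch of the three accepted input forms, the same series is written below as a list, a tuple, and a numpy array. The values and variable names are made up purely for illustration.

```python
import numpy as np

# The same illustrative series in each of the three accepted container types.
ts_list = [1.0, 2.0, 4.0, 3.0, 5.0, 2.5]
ts_tuple = tuple(ts_list)
ts_array = np.array(ts_list)
```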
All features are bundled in the function catch22_all, which can be called directly on your time series data.
The results of your calculation will be stored in a dictionary with the keys names and values:
names: the feature names, as listed in the feature table, returned as a list of strings.
values: the feature outputs, returned as a list of float64 values.
Note that the feature names are ordered to match the outputs, i.e., the first feature in names has the output at the first (zeroth) index of values.
Continuing our example from above, we can view the features and their computed values with the following:
This prints each of the 22 feature names alongside its computed value.
And that's it! You now have all the knowledge you need to apply catch22 to your time series data. To access some of the additional functionality of catch22, keep reading for advanced usage tips and examples.
Ready to explore some of the additional functionality of pycatch22?
If the location and spread of the raw time series distribution may be important for your application, you can enable catch24.
Catch24 is an extension of the original catch22 feature set that adds the mean and standard deviation, yielding a total of 24 time series features. To access catch24, you can pass an additional parameter to the catch22_all function, telling it to compute the catch24 feature set.
If you do not wish to compute the full catch22 feature set on your time series, you can alternatively compute each feature (including mean and standard deviation) individually.
In pycatch22, each feature function can be accessed individually and takes tuples or lists (not numpy arrays) as input. To call a feature function, make sure to use its corresponding long name (as in the feature overview table), e.g., using CO_trev_1_num for computing trev.
As an example, let's compute the feature trev for a single univariate time series.
For each feature, we also include a unique 'short name' for easier reference (as outlined in the feature overview table).
In pycatch22, short names can be included in the output of catch22_all() by setting short_names=True.
The returned dictionary, results, now has three keys: names, short_names and values.
As before, names is a list of the original long feature names (as strings), short_names is a list of short names (as strings), and values is a list of feature values.
Continuing from the basic usage example, let's try printing the feature outputs along with their corresponding short names:
With short names in place of the long feature names, the printed output is a lot more readable!
When working with large time series datasets (containing many univariate time series instances), computing features can often become a significant computational bottleneck. This is where multi-threading comes into play, offering a powerful solution to accelerate feature computation through parallelisation.
Let's start by importing the libraries we will need and then create a large time series dataset consisting of 50 000 univariate time series instances, each of length 1000.
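The dataset step could be sketched as below, assuming Gaussian noise as the data source (the original data source is not stated). The helper make_dataset and its parameters are hypothetical names. Only numpy is imported here; later steps import what they use. The demonstration call builds a small dataset so it runs quickly, while calling make_dataset() with its defaults builds the full 50 000 x 1000 array (about 400 MB as float64).

```python
import numpy as np


def make_dataset(n_series=50_000, length=1000, seed=0):
    """Return an (n_series, length) array of Gaussian noise, one time series per row."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_series, length))


# Small demonstration call; use make_dataset() for the full-size dataset.
demo_dataset = make_dataset(n_series=100, length=1000)
print(demo_dataset.shape)
```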
Next, we will need to define a function which takes in a single time series, applies catch22, and then returns the feature values:
We also need to know how many CPU cores are available for parallel processing. You can check by running os.cpu_count(). The value returned (the number of CPU cores available) will differ depending on your hardware and setup.
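For instance (the variable name n_cores is illustrative):

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it cannot be determined.
n_cores = os.cpu_count()
print(f"CPU cores available: {n_cores}")
```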
Now we can parallelise our function using the following:
You should notice that computing all catch22 features on the entire dataset of 50 000 time series is much faster than sequentially looping over each one!
For example, on 8 cores (Mac M1), it takes ~27 seconds to compute features on the entire dataset. Without multi-threading this would usually take ~2.5 minutes. That's roughly a 5.5x speedup!