At its core, hctsa analysis involves computing a library of time-series analysis features (which we call operations) on a time-series dataset.
The basic sequence of a Matlab-based hctsa analysis is to:
1. Initialize an HCTSA.mat file, which contains all of the information about the set of time series and operations in your analysis, as well as the results of applying all operations to all time series, using TS_Init.
2. Compute these operations on your time-series data using TS_Compute. The results are stored in the local HCTSA.mat file, which contains matrices (that store the results of the computations) and tables (that store information about the time-series data and operations), as described here.
After the computation is complete, a range of processing, analysis, and plotting functions are provided to understand and interpret the results.
As a quick check of your operation library, you can compute the full default code library on a time-series data vector (a column vector of real numbers) as follows:
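A minimal sketch of such a quick check is given here. It assumes that hctsa is installed with its paths loaded, and that TS_CalculateFeatureVector is the library function for this purpose (the function name is not stated in the text above, so verify it against your hctsa version). The input is synthetic white noise, used purely for illustration.

```matlab
% Synthetic test input: a column vector of 500 real numbers (white noise).
x = randn(500,1);

% Assumption: TS_CalculateFeatureVector is the hctsa function for this check.
% The guard lets the snippet run (with a warning) when hctsa is not on the path.
if exist('TS_CalculateFeatureVector','file') == 2
    featVector = TS_CalculateFeatureVector(x);
    fprintf('Computed %u feature values.\n',numel(featVector));
else
    warning('hctsa is not on the Matlab path; run the startup script first.');
end
```

Running the full library on a single short vector like this is a quick way to spot operations that fail on your system before committing to a full dataset.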
Suppose you have a time-series dataset to analyze. You first generate a formatted INP_ts.mat input file containing your time-series data and associated name and keyword labels, as described here. You then initialize an hctsa calculation using the default library of features by running TS_Init on this input file.
This generates a local file, HCTSA.mat, containing the associated metadata for your time series, as well as information about the full time-series feature library (Operations) and the set of functions and code to call to evaluate them (MasterOperations), as described here.
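The initialization step could look something like the sketch below. The variable names stored in the input file (timeSeriesData, labels, keywords) are assumptions about the expected INP_ts.mat format, and the three time series are synthetic placeholders; consult the input-file documentation for the authoritative format.

```matlab
% Three short synthetic time series (illustrative data only).
timeSeriesData = {randn(200,1), sin((1:200)'/5), cumsum(randn(200,1))};
labels = {'noise_1','sine_1','randomWalk_1'};      % one name per time series
keywords = {'noise','periodic','randomWalk'};      % one keyword string per time series

% Assumed variable names for the input file; adjust if your version differs.
save('INP_ts.mat','timeSeriesData','labels','keywords');

% Initialize with the default feature library (requires hctsa on the path).
if exist('TS_Init','file') == 2
    TS_Init('INP_ts.mat');
else
    warning('hctsa is not on the Matlab path; run the startup script first.');
end
```

Note that this writes INP_ts.mat into the current folder, so run it in a scratch directory if you already have an input file of that name.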
Next you want to evaluate the code on all of the time series in your dataset. For this you can simply run TS_Compute, as described here, or, for larger datasets, use a script that regularly saves back to the local file (cf. sample_runscript_matlab).
Having run your calculations, you may then want to label your data using the keywords you provided (in the case that you have labeled groups of time series), and then normalize and filter the data using the default sigmoidal transformation.
A range of visualization scripts are then available to analyze the results, such as plotting the reordered data matrix, inspecting a low-dimensional representation of the data, or determining which features are best at classifying the labeled groups of time series in your dataset.
Each of these functions can be run with a range of input settings.
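The steps from computation through to analysis could be run in sequence as sketched below. Only TS_Compute is named in the text; the other function names (TS_LabelGroups, TS_Normalize, TS_PlotDataMatrix, TS_PlotLowDim, TS_TopFeatures) are assumptions about the corresponding hctsa functions, each called with its default settings, so check them against your installed version.

```matlab
% Functions this sketch relies on (all but TS_Compute are assumed names).
requiredFns = {'TS_Compute','TS_LabelGroups','TS_Normalize', ...
               'TS_PlotDataMatrix','TS_PlotLowDim','TS_TopFeatures'};
haveHctsa = all(cellfun(@(f) exist(f,'file') == 2,requiredFns));

% Only run if hctsa is available and a dataset has been initialized.
if haveHctsa && exist('HCTSA.mat','file') == 2
    TS_Compute;          % evaluate all operations on all time series
    TS_LabelGroups;      % label groups of time series using their keywords
    TS_Normalize;        % normalize and filter using the default transformation
    TS_PlotDataMatrix;   % plot the reordered data matrix
    TS_PlotLowDim;       % inspect a low-dimensional representation
    TS_TopFeatures;      % find features that best classify the labeled groups
else
    warning('hctsa functions or HCTSA.mat not found; nothing was run.');
end
```

For a large dataset, the single TS_Compute call would typically be replaced by a loop that computes in batches and saves regularly, along the lines of sample_runscript_matlab.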
Some external code packages require compiled binary code to be used. Compilation of the mex code is handled by compile_mex as part of the install script, but the TISEAN package binaries need to be compiled separately in the command line.
Many of the operations (especially external code packages) rely on mex functions (pieces of code written in C or Fortran) that need to be compiled to run natively on a given system architecture. To ensure that as many operations as possible run successfully on your data, you should compile these mex functions for your system. This requires working compilers (e.g., gcc, g++) to be installed on your system, which can be configured using mex -setup (cf. doc mex for more information).
Once mex is set up, the mex functions used in the time-series code repository can be compiled by navigating to the Toolboxes directory and then running compile_mex.
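The compilation steps above might be scripted as follows. This is a sketch that assumes the current folder is the root of the hctsa repository and that compile_mex leaves the working directory unchanged when it finishes; it first reports which C compiler mex is configured to use.

```matlab
% Report the C compiler currently selected for mex, if any.
cc = mex.getCompilerConfigurations('C','Selected');
if isempty(cc)
    warning('No C compiler is configured for mex; run "mex -setup" first.');
else
    fprintf('mex will use the C compiler: %s\n',cc.Name);
end

% Assumption: the current folder is the root of the hctsa repository.
toolboxDir = fullfile(pwd,'Toolboxes');
if ~isempty(cc) && exist(fullfile(toolboxDir,'compile_mex.m'),'file') == 2
    cd(toolboxDir);   % navigate to the Toolboxes directory
    compile_mex;      % compile the mex functions for this architecture
    cd('..');         % return to the repository root
end
```

If no compiler is reported, install one for your platform and rerun mex -setup before attempting the compilation.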
Some operations rely on the TISEAN package, which Matlab accesses via the terminal using system commands, so the TISEAN binaries cannot be installed from within Matlab, but instead must be installed from the command line. If you are running Linux or Mac, we will assume that you are familiar with the command line, while those running Windows will require an alternate method to install TISEAN, as explained below.
In the command line (not within Matlab), navigate to the Toolboxes/Tisean_3.0.1 directory of the repository, then run the following chain of commands in order: ./configure, make clean, make, make install.
This should install the TISEAN binaries in your ~/bin/ directory (you can instead install into a system-wide directory, /usr/bin, for example, by running ./configure --prefix=/usr). Additional information about the TISEAN installation process is provided in the TISEAN documentation.
If installation was successful, you should be able to access the newly compiled binaries from the command line; e.g., typing the command which poincare should return the path to the TISEAN function poincare. Otherwise, you should check that the install directory is in your system path, e.g., by adding a line such as export PATH=$PATH:$HOME/bin to your ~/.bash_profile (and running source ~/.bash_profile to update).
The path where TISEAN is installed will also have to be in Matlab’s environment path, which is added by startup.m, assuming that the binaries are stored in ~/bin. The startup.m code also adds the DYLD_LIBRARY_PATH, which is also required for TISEAN to function properly.
If you choose to use a custom location for the TISEAN binaries that is not in the default Matlab system path (getenv('PATH') in Matlab), then you will have to add this path manually. You can test that Matlab can see the TISEAN binaries by typing, for example, the following into Matlab:
!which poincare
If Matlab’s system paths are set up correctly, this command should return the path to your compiled TISEAN binary, poincare.
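One way to add a custom TISEAN location to the path seen by Matlab's system calls, and then repeat the check above programmatically, is sketched here. The directory ~/tisean/bin is a hypothetical install location chosen for illustration; the change made by setenv lasts only for the current Matlab session.

```matlab
% Hypothetical custom install location for the TISEAN binaries (an assumption).
tiseanDir = fullfile(getenv('HOME'),'tisean','bin');

% Append it to the PATH used by Matlab's system calls, if not already present.
currentPath = getenv('PATH');
if ~contains(currentPath,tiseanDir)
    setenv('PATH',[currentPath,pathsep,tiseanDir]);
end

% Check whether the shell launched by Matlab can now find a TISEAN binary.
[status,binaryPath] = system('which poincare');
if status == 0
    fprintf('Found poincare at: %s',binaryPath);
else
    warning('poincare was not found; check the TISEAN install directory.');
end
```

To make the change persistent, the same setenv call could be placed in a script that you run at the start of each session.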
If you are running Matlab from Windows, you will need a mechanism for Matlab to call system commands and find compiled TISEAN binaries. There are two options:
Sacrifice operations that rely on TISEAN. In total, TISEAN-based operations account for approximately 300 operations in the operation library. Although they provide important, well-tested implementations of nonlinear time-series analysis methods, it's not the end of the world if you decide it's too much trouble to install and are ok to miss out on these methods (see below on how to explicitly remove them from a computed library).
If you decide not to use functions from the TISEAN package, you should initialize your dataset with the TISEAN functions removed. You could do this by removing them from your INP_ops.txt file when initializing your dataset, or you could remove them from your initialized hctsa dataset by filtering on the 'tisean' keyword.
For example, to filter a local Matlab hctsa file (e.g., HCTSA.mat), you can use TS_LocalClearRemove('raw','ops',TS_GetIDs('tisean','raw','ops'),true); which will remove all operations with the 'tisean' keyword from the hctsa dataset in HCTSA.mat.
[If you are using a mySQL database to store the results of your hctsa calculations, TISEAN functions can be removed from the database as follows: SQL_ClearRemove('ops',SQL_GetIDs('ops',0,'tisean',{}),true)].
Install Cygwin on your machine. Cygwin provides a Linux distribution-like environment on Windows. Use this environment to compile and install TISEAN (as per the instructions above for Linux or Mac), which will require it to have working C and Fortran compilers installed. Matlab will then also need to be launched from Cygwin, using the command: matlab &. This instance of Matlab should then be able to call system commands through Cygwin, including the ability to access the TISEAN binaries.
The hctsa package can be used completely within Matlab, allowing users to analyse time-series datasets quickly and easily. Here we will focus on this Matlab-based use of the software, but note that, for larger datasets requiring distributed computing set-ups, or datasets that may grow with time, hctsa can also be linked to a mySQL database, as described in a dedicated chapter.
The simplest way to get the hctsa package up and running is to run the install script, which adds the required paths to dependent time-series packages (toolboxes), and compiles mex binaries to work on your system architecture. Once this one-off installation step is complete, you're ready to go! (NB: to include additional functions from the TISEAN nonlinear time-series analysis package, you'll also need to compile TISEAN routines).
After installation, future use of the package can begin by opening Matlab, navigating to the hctsa package, and then loading the paths required by the hctsa package by running the startup script.
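Starting a session after installation might look like the following sketch. The repository location ~/hctsa is a hypothetical path used for illustration; replace it with wherever you placed the package.

```matlab
% Hypothetical location of the hctsa repository (an assumption; edit to suit).
hctsaDir = fullfile(getenv('HOME'),'hctsa');

if exist(fullfile(hctsaDir,'startup.m'),'file') == 2
    cd(hctsaDir);   % navigate to the hctsa package
    startup;        % load the paths required by hctsa
else
    warning('No startup script found in %s.',hctsaDir);
end
```

On first use, the install script would be run from the same directory in place of startup.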
The hctsa framework consists of three basic objects containing relevant metadata:
Master Operations specify pieces of code (Matlab functions) and their inputs to be computed. Taking in a single time series, master operations can generate a large number of outputs as a Matlab structure, each of which can be identified with a single operation (or 'feature').
Operations (or 'features') each yield a single number summarizing some measure of structure in a time series. In hctsa, each operation links to an output from a piece of evaluated code (a master operation).
Time series are univariate and uniformly sampled time-ordered measurements.
These three different objects are summarized below:
In the example given in the summary table, a master operation specifies the code to run, CO_AutoCorr(x,1:5,'TimeDomain'), which outputs the autocorrelation of the input time series (x) at lags 1, 2, ..., 5. Each operation (or 'feature') is a single number that draws on this set of outputs; for example, the autocorrelation at lag 1, which is named AC_1.
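The relationship between a master operation and its operations can be illustrated with a self-contained sketch that does not call hctsa. The code below is a stand-in for CO_AutoCorr, not its actual implementation: it computes the sample autocorrelation at lags 1 to 5 (normalized by the lag-0 sum of squares) and stores each value in a field of a structure, from which a single operation such as AC_1 picks one number. The field naming is an assumption made for illustration.

```matlab
% Synthetic time series (a column vector), with a fixed seed for repeatability.
rng(0);
x = cumsum(randn(100,1));

% "Master operation" stand-in: one piece of code that yields several outputs.
lags = 1:5;
xc = x - mean(x);          % mean-subtracted series
N = length(xc);
masterOutput = struct();
for tau = lags
    % Sample autocorrelation at lag tau, normalized by the lag-0 sum of squares.
    acf = sum(xc(1:N-tau).*xc(1+tau:N))/sum(xc.^2);
    masterOutput.(sprintf('AC_%u',tau)) = acf;
end

% "Operation": a single number drawn from the master operation's outputs.
AC_1 = masterOutput.AC_1;
fprintf('Autocorrelation at lag 1: %.3f\n',AC_1);
```

Evaluating the code once and reading off several features from its output is what allows a single master operation to serve many operations.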
In the hctsa framework, master operations, operations, and time series are stored as tables that contain all of their associated keywords and metadata (and actual time-series data in the case of time series).
For a given hctsa analysis, the user must specify a set of code to evaluate (master operations), their associated individual outputs to measure (operations), and a set of time series to evaluate the features on (time series).
We provide a default library of over 7700 operations (derived from approximately 1000 unique master operations). This can be customized, and additional pieces of code can also be added to the repository.
Having specified a set of master operations, operations, and time series, the results of computing these functions on the time-series data are stored in three matrices:
TS_DataMat is an n x m data matrix containing the results of applying m operations to the n time series.
TS_Quality is an n x m matrix containing quality labels for each operation output (coding different outputs such as errors or NaNs). Quality labels are described in the section below.
TS_CalcTime is an n x m matrix containing calculation times for each operation output. Note that the calculation time stored is for the corresponding master operation.
Each HCTSA*.mat file includes the tables described above: TimeSeries (corresponding to the rows of the TS_ matrices), Operations (corresponding to the columns of the TS_ matrices), and MasterOperations (corresponding to the code evaluated to compute the operations). In addition, the results are stored as above: TS_DataMat, TS_Quality, and TS_CalcTime.
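The correspondence between tables and matrices can be checked directly on a local file, as in this sketch. It assumes that a file named HCTSA.mat exists in the current folder and that TimeSeries and Operations are stored as Matlab tables, as described above; it does nothing but warn if the file is absent.

```matlab
hctsaFile = 'HCTSA.mat';   % assumed file name in the current folder

if exist(hctsaFile,'file') == 2
    S = load(hctsaFile,'TimeSeries','Operations','TS_DataMat','TS_Quality');
    [n,m] = size(S.TS_DataMat);
    fprintf('%u time series (rows) x %u operations (columns)\n',n,m);

    % Rows should match the TimeSeries table; columns the Operations table.
    assert(height(S.TimeSeries) == n);
    assert(height(S.Operations) == m);

    % Fraction of entries with a nonzero quality label (NaN entries count as
    % not flagged in this comparison).
    fracSpecial = mean(S.TS_Quality(:) > 0);
    fprintf('%.1f%% of outputs have a nonzero quality label.\n',100*fracSpecial);
else
    warning('%s not found in the current folder.',hctsaFile);
end
```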
Quality labels are used to indicate when operations take non-real values, or when fatal errors were encountered. Quality labels are stored in the Quality column of the Results table in the mySQL database, and in local Matlab files as the TS_Quality matrix.
When the quality label is nonzero, this indicates that a special-valued output occurred. In this case, the output value of the operation is set to zero, as a convention, and the quality label codes the special output value:
| | Master Operation | Operation | Time Series |
|---|---|---|---|
| Summary: | Code and inputs to execute | Single feature | Univariate data |
| Example: | CO_AutoCorr(x,1:5,'TimeDomain') | AC_1 | [1.2, 33.7, -0.1, ...] |
| Quality label | Description |
|---|---|
| 0 | No problems with calculation. Output was a real number. |
| 1 | A fatal error was encountered. |
| 2 | Output of the code was NaN. |
| 3 | Output of the code was Inf. |
| 4 | Output of the code was -Inf. |
| 5 | Output had a non-zero imaginary component. |
| 6 | Output was empty (e.g., []). |
| 7 | Field specified for this operation did not exist in the master operation output structure. |
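The coding convention for quality labels can be illustrated with a short, self-contained sketch. This is not the hctsa implementation; it is a hypothetical re-creation of the mapping for labels 0 and 2 to 6 applied to a handful of example outputs, including the convention of storing zero whenever the label is nonzero. Labels 1 and 7 arise from runtime errors and missing structure fields, and are not covered here.

```matlab
% Example raw outputs: real number, NaN, Inf, -Inf, complex, and empty.
rawOutputs = {0.73, NaN, Inf, -Inf, 1+2i, []};

labels = zeros(size(rawOutputs));        % quality label for each output
storedValues = zeros(size(rawOutputs));  % value stored in the data matrix

for i = 1:numel(rawOutputs)
    v = rawOutputs{i};
    if isempty(v)
        labels(i) = 6;               % output was empty
    elseif imag(v) ~= 0
        labels(i) = 5;               % non-zero imaginary component
    elseif isnan(v)
        labels(i) = 2;               % NaN
    elseif v == Inf
        labels(i) = 3;               % Inf
    elseif v == -Inf
        labels(i) = 4;               % -Inf
    else
        labels(i) = 0;               % real number, no problems
        storedValues(i) = v;         % only good outputs keep their value
    end
end

disp(labels);
```

Only the first output keeps its value; the others are stored as zero, with the quality label recording what actually happened.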