Exploring classification accuracy
When performing a time-series classification task, a basic first exploration is to investigate how accurately a classifier can learn a mapping from time-series features to the class labels assigned in your dataset.
The first step is to assign group labels to time series in your dataset using TS_LabelGroups.
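For example, a call along the following lines labels time series by matching keywords in the TimeSeries table (the keywords here are hypothetical placeholders for your own):

```matlab
% Assign group labels by matching two (hypothetical) keywords
% in the TimeSeries table of HCTSA.mat:
TS_LabelGroups('raw',{'patient','control'});
```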
Depending on the classifier, you typically want to first normalize the features to put them all on a similar scale (using TS_Normalize).
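A minimal sketch of such a call ('mixedSigmoid' is one of the normalization methods TS_Normalize supports):

```matlab
% Normalize all features with a sigmoid-based transform;
% the result is saved to HCTSA_N.mat:
TS_Normalize('mixedSigmoid');
```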
Depending on the question asked of the data, you should also consider whether certain types of features should be removed. For example, you may wish to exclude length-dependent features (if time-series length differs between classes but is an uninteresting artefact of the measurement). This can be done using TS_Subset (and functions like TS_CompareFeatureSets, described below, allow you to test the sensitivity of your results to such choices).
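A sketch of how such a subset might be constructed, assuming length-dependent operations are tagged with a 'lengthdep' keyword in your Operations table (the keyword and output filename here are hypothetical, and the TS_GetIDs/TS_Subset argument conventions should be checked against the function headers in your version):

```matlab
% IDs of operations matching the (hypothetical) 'lengthdep' keyword:
lengthDepIDs = TS_GetIDs('lengthdep','norm','ops');

% Keep all time series ([]) but exclude the length-dependent operations:
load('HCTSA_N.mat','Operations');
keepOpIDs = setdiff(Operations.ID,lengthDepIDs);
TS_Subset('norm',[],keepOpIDs,true,'HCTSA_N_subset.mat');
```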
Classifying using all features (TS_Classify)

TS_Classify uses all of the features in a given hctsa data matrix to classify the assigned class labels.
You can set classification settings, from the number of cross-validation folds to the type of classifier, via the cfnParams structure. For the labeling defined in a given TimeSeries table, you can set defaults using cfnParams = GiveMeDefaultClassificationParams('norm') (which takes the TimeSeries labeling from HCTSA_N.mat). This automatically sets an appropriate number of cross-validation folds, and includes settings for taking class imbalance into account in classifier training and evaluation. It is best to alter the values inside this function to suit your needs, so that these settings can be applied consistently.
First let's run a simple classification of the groups labeled in HCTSA_N.mat, using default classification settings:
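A minimal sketch of such a call (assuming your normalized data and group labels are saved in HCTSA_N.mat):

```matlab
% Retrieve default classification settings for the labeling in HCTSA_N.mat:
cfnParams = GiveMeDefaultClassificationParams('norm');
% Classify the labeled groups using all features:
TS_Classify('norm',cfnParams);
```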
In large feature spaces like in hctsa, simpler classifiers (like 'svm_linear') tend to generalize well, but you can play with the settings in cfnParams to get a sense for how the performance varies.
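For example, a sketch of tweaking a couple of settings before re-running (the field names whatClassifier and numFolds, and the 'svm_rbf' option, are assumptions based on the structure returned by GiveMeDefaultClassificationParams; check your version):

```matlab
cfnParams = GiveMeDefaultClassificationParams('norm');
cfnParams.whatClassifier = 'svm_rbf'; % try a nonlinear kernel
cfnParams.numFolds = 10;              % use more cross-validation folds
TS_Classify('norm',cfnParams);
```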
As well as the classification results, the function also produces a confusion matrix, which is especially useful for evaluating where classification errors are occurring. Here's an example for a five-class problem:
In smaller datasets, high classification accuracies are more likely to arise by chance, so you may wonder how much confidence to place in a given result. For example, if you obtain a two-class classification accuracy of 60%, what is the probability of achieving such an accuracy by chance?
You can set numNulls in TS_Classify to repeat the classification, with the settings defined in cfnParams, but using shuffled class labels. This builds up a null distribution from which you can estimate a p-value, and thereby infer the significance of the classification accuracy obtained under the true data labeling.
You can also choose to run across multiple cores by switching on doParallel:
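A sketch of such a call (assuming numNulls is passed as a third argument and doParallel as a keyword option; check the TS_Classify header in your version):

```matlab
numNulls = 100; % number of shuffled-label repeats for the null distribution
TS_Classify('norm',cfnParams,numNulls,'doParallel',true);
```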
This gives you a p-value estimate (both via a direct permutation test, and by assuming a Gaussian null distribution), and plots the null distribution with the true result annotated:
Comparing feature sets (TS_CompareFeatureSets)

You might wonder whether the classification results are driven by simple types of features that aren't related to time-series dynamics at all (such as the mean of the data, or time-series length).
These can be filtered out from the initial computation (e.g., when performing TS_Init), or subsequently (e.g., using TS_Subset), but you can also test the effect such features are having on your dataset using TS_CompareFeatureSets.
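A minimal sketch of invoking it (assuming the same whatData/cfnParams convention as TS_Classify):

```matlab
% Compare classification performance across different feature subsets:
TS_CompareFeatureSets('norm',cfnParams);
```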
Here's an example output:
Here we see that length-dependent features are contributing to accurate classification (above 50% accuracy for this balanced two-class problem). We can take some relief from the fact that excluding these features ('notLengthDependent') does not significantly alter the classification accuracy, so they are not single-handedly driving the classification results. Nevertheless, assuming that differences in recording length are an uninteresting artefact rather than something we want biasing our classification results, it would be advisable to remove such features for peace of mind.
Depending on the task, strong classification results may require the full complexity of the time-series analysis literature, or may be achievable in a much lower-dimensional feature space. You can quickly assess how accurately a small number of reduced components (e.g., leading principal components) can classify your dataset using TS_Classify_LowDim:
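A sketch of such a call (the third argument, the number of leading PCs to consider, is an assumption based on the function's typical usage; check its header):

```matlab
% Classify using up to the ten leading PCs of the feature matrix:
TS_Classify_LowDim('norm',cfnParams,10);
```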
The classification accuracy is shown for all features (green, dashed), and as a function of the number of leading PCs included in the classifier (black circles). Note that this is cumulative: '5 PCs' means classification in the five-dimensional space of the five leading PCs.
Here we find that we can obtain decent classification accuracy with just four PCs (and more complex classifiers may perform even better in these low-dimensional spaces).
You can quickly interpret the types of features loading strongly onto each PC from the information printed to screen. For example, such output might demonstrate that, on this dataset, long-lag autocorrelations are the features most strongly correlated with PC5.