Highly comparative time-series analysis with hctsa
Investigating specific operations

Last updated 6 years ago

Sometimes it's useful to investigate the behavior of an individual operation (or feature) across a time-series dataset. What is the distribution of its outputs, and which types of time series receive low values and which receive high values?

These simple questions about specific features of interest can be investigated using the TS_FeatureSummary function. It takes an operation ID as input (and can also take inputs specifying a custom data source or custom annotation parameters) and plots the distribution of that operation's outputs across the dataset, with the option of annotating time series onto the plot.

For example, the following:

TS_FeatureSummary(100)

produces the following plot (where six points on the distribution have been clicked on to annotate them with short time-series segments):

The plot shows that time series with more autocorrelated patterns through time receive higher values from this operation. Because groups have not been assigned to this dataset, the time series are colored at random.
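To make the link between autocorrelation and feature value concrete, here is a minimal sketch in plain MATLAB. It does not call hctsa or reproduce operation 100. It assumes a simple lag-1 autocorrelation as a stand-in feature, and the two synthetic signals (noisy and periodic) and the helper lag1AC are hypothetical names chosen for illustration.

```matlab
% Synthetic examples (hypothetical data, not taken from any hctsa dataset)
rng(1);                                       % fix the seed for reproducibility
N = 1000;
t = (1:N)';
noisy = randn(N,1);                           % white noise: little structure through time
periodic = sin(2*pi*t/50) + 0.1*randn(N,1);   % slow oscillation plus a little noise

% Stand-in feature: lag-1 autocorrelation (an assumption, not hctsa's operation 100)
lag1AC = @(x) mean((x(1:end-1) - mean(x)) .* (x(2:end) - mean(x))) / var(x,1);

fprintf('Lag-1 autocorrelation, noisy:    %.3f\n', lag1AC(noisy));
fprintf('Lag-1 autocorrelation, periodic: %.3f\n', lag1AC(periodic));
```

The smoother, periodic signal receives the higher value, which is the same qualitative ordering one would look for when annotating points along the distribution.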

Running TS_FeatureSummary in violin plot mode provides another representation of the same result:

annotateParams = struct('maxL',500);
TS_FeatureSummary(4310,'raw',true,annotateParams);

This plots the distribution of feature 4310 from HCTSA.mat as a violin plot, with ten 500-point time series subsegments annotated at different points through the distribution, shown to the right of the plot:
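The idea of annotating subsegments at different points through the distribution can be sketched in plain MATLAB as follows. This is only an illustration under stated assumptions, not how TS_FeatureSummary selects its annotations: the variables featVals and tsData are hypothetical synthetic stand-ins, and picking evenly spaced ranks is one possible selection rule. The names numAnnotate and maxL mirror the "ten" segments and the 500-sample limit described above.

```matlab
% Hypothetical inputs: one feature value and one time series per row
rng(2);
numSeries = 200;
featVals = randn(numSeries,1);
tsData = arrayfun(@(i) randn(300 + randi(400),1), (1:numSeries)', ...
                  'UniformOutput', false);    % lengths vary from 301 to 700 samples

numAnnotate = 10;    % how many time series to annotate
maxL = 500;          % maximum number of samples shown per annotation

% Pick time series at evenly spaced ranks through the sorted feature values
[~, ix] = sort(featVals);
pickRank = round(linspace(1, numSeries, numAnnotate));
pickIdx = ix(pickRank);

% Keep at most the first maxL samples of each selected time series
segments = cell(numAnnotate,1);
for i = 1:numAnnotate
    x = tsData{pickIdx(i)};
    segments{i} = x(1:min(maxL, numel(x)));
end
```

Selecting by rank rather than by value spreads the annotations across the whole distribution, including its tails, which is usually what makes the low-to-high progression visible.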

Plotting for labeled groups of time series

When time series groups have been labeled (using TS_LabelGroups, e.g., TS_LabelGroups({'seizure','eyesOpen','eyesClosed'},'raw');), TS_FeatureSummary plots the distribution for each class separately, as well as the overall distribution. Annotated points can then be added to each class-specific distribution. In the example shown below, the 'noisy' class (red) has low values for this feature (CO_tc3_2_denom), whereas the 'periodic' class mostly has high values.

Simpler distributions

TS_SingleFeature provides a simpler way of viewing the class distributions without annotations, either as kernel-smoothed distributions (as in TS_FeatureSummary) or as violin plots. Example usage is shown below:

opID = 500;
makeViolin = false;
TS_SingleFeature('raw',opID,makeViolin);

This shows the distributions with a classification bar underneath (indicating where a linear classifier would classify different parts of the space as either noisy or periodic):

opID = 500;
makeViolin = true;
TS_SingleFeature('raw',opID,makeViolin);

This shows the distributions as a violin plot, with means annotated and the classification bar to the left:

Note that the title, which gives an indication of the 10-fold cross-validated balanced accuracy of a linear classifier in the space, is computed from a single 10-fold split and is therefore stochastic. Thus, as shown above, it can yield slightly different results when repeated. For a more rigorous analysis than this simple indication, the procedure should be repeated many more times to give a converged estimate of the balanced classification accuracy.
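The point that a single 10-fold split gives a stochastic accuracy estimate, and that repeating the procedure gives a more stable one, can be illustrated with a minimal sketch in plain MATLAB. Everything here is an assumption made for illustration: the two-class feature values are synthetic, and a nearest-class-mean threshold stands in for the linear classifier (hctsa's own classifier and fold assignment may differ).

```matlab
% Synthetic single-feature data for two classes (hypothetical, not hctsa output)
rng(0);                                    % fix the seed so the sketch is reproducible
x = [randn(50,1); randn(50,1) + 1.5];      % feature values
y = [ones(50,1); 2*ones(50,1)];            % class labels (1 or 2)

numFolds = 10;
numRepeats = 50;
balAcc = zeros(numRepeats,1);

for r = 1:numRepeats
    % Random assignment of observations to folds (the source of stochasticity)
    foldID = zeros(numel(y),1);
    foldID(randperm(numel(y))) = mod((0:numel(y)-1)', numFolds) + 1;
    yPred = zeros(size(y));
    for k = 1:numFolds
        isTest = (foldID == k);
        % Stand-in classifier: assign to the class whose training mean is closer
        mu1 = mean(x(~isTest & y == 1));
        mu2 = mean(x(~isTest & y == 2));
        yPred(isTest) = 1 + (abs(x(isTest) - mu2) < abs(x(isTest) - mu1));
    end
    % Balanced accuracy: mean of the per-class recall
    balAcc(r) = mean([mean(yPred(y == 1) == 1), mean(yPred(y == 2) == 2)]);
end

fprintf('Single 10-fold split: %.3f\n', balAcc(1));
fprintf('Mean over %u repeats: %.3f (std %.3f)\n', numRepeats, mean(balAcc), std(balAcc));
```

The spread of balAcc across repeats shows how much a single split can vary, and the mean over repeats is the more trustworthy summary.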