Highly comparative time-series analysis with hctsa
  • Information about hctsa
    • Introduction
    • Getting started
    • Publications using hctsa
    • UMAP Projections
    • Related Time-Series Resources
    • List of included code files
    • FAQ
  • Installing and using hctsa
    • General advice and common pitfalls
    • Installing and setting up
      • Structure of the hctsa framework
      • Overview of an hctsa analysis
      • Compiling binaries
    • Running hctsa computations
      • Input files
      • Performing calculations
      • Inspecting errors
      • Working with hctsa files
    • Analyzing and visualizing results
      • Assigning group labels to data
      • Filtering and normalizing
      • Clustering rows and columns
      • Visualizing the data matrix
      • Plotting the time series
      • Low dimensional representation
      • Finding nearest neighbors
      • Investigating specific operations
      • Exploring classification accuracy
      • Finding informative features
      • Interpreting features
      • Comparing to existing features
      • Working with short time series
    • Working with a mySQL database
      • Setting up the mySQL database
      • The database structure
      • Populating the database with time series and operations
      • Adding time series
      • Retrieving from the database
      • Computing operations and writing back to the database
      • Cycling through computations using runscripts
      • Clearing or removing data
      • Retrieving data from the database
      • Error handling and maintenance
Powered by GitBook
On this page

Was this helpful?

Export as PDF
  1. Installing and using hctsa
  2. Working with a mySQL database

Cycling through computations using runscripts

As described above, computation involves three main steps:

The procedure involves three main steps:

  1. Retrieve a set of time series and operations from (the Results table) of the database to a local Matlab file, HCTSA.mat (using SQL_Retrieve).

  2. Compute the operations on the retrieved time series in Matlab and store the results locally (using TS_Compute).

  3. Write the results back to the Results table of the database (using SQL_store).

It is usually the most efficient practice to retrieve a small number of time series at each iteration of the SQL_Retrieve–TS_Compute–SQL_store loop, and distribute this computation across multiple machines if possible. An example runscript is given in the code that accompanies this document, as sample_runscript_sql, which retrieves a single time series at a time, computes it, and then writes the results back to the database in a loop. This can be viewed as a template for runscripts that one may wish to use when performing time-series calculations across the database.

This workflow is well suited to distributed computing for large datasets, whereby each node can iterate over a small set of time series, with all the results being written back to a central location (the mySQL database).

By designating different sections of the database to cycle through, this procedure can also be used to (manually) distribute the computation across different machines. Retrieving a large section of the database at once can be problematic because it requires large disk reads and writes, uses a lot of memory, and if problems occur in the reading or writing to/from files, one may have to abandon a large number of existing computations.

PreviousComputing operations and writing back to the databaseNextClearing or removing data

Last updated 4 years ago

Was this helpful?