# Symbolic

Features based on a discrete symbolization of real-valued time-series data

**Naming info**: This feature matches the *hctsa* feature called `SB_MotifThree_quantile_hh`.

`entropy_pairs` is computed as follows:

1. Convert each value in the time series into one of three symbols ('A', 'B', or 'C') using equiprobable binning, in which the lowest third of values are assigned 'A', the middle third 'B', and the highest third 'C'.
2. Compute the probabilities of all two-letter sequences ('AA', 'AB', 'BB', …) and output the entropy of this set of probabilities.
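The two steps above can be sketched in plain Python (an illustrative version, not the reference *hctsa*/*catch22* implementation; in particular, ties within a bin are broken by position here, which the original spec leaves open):

```python
import math

def entropy_pairs(ts):
    """Sketch: 3-letter equiprobable symbolization, then entropy of
    two-letter sequence probabilities (natural log)."""
    n = len(ts)
    # Step 1: rank the values; lowest third -> 'A', middle third -> 'B',
    # highest third -> 'C'.
    order = sorted(range(n), key=lambda i: ts[i])
    symbols = [''] * n
    for rank, i in enumerate(order):
        symbols[i] = 'ABC'[min(2, 3 * rank // n)]
    # Step 2: count all two-letter sequences and compute the entropy
    # of their empirical probabilities.
    counts = {}
    for a, b in zip(symbols, symbols[1:]):
        counts[a + b] = counts.get(a + b, 0) + 1
    total = n - 1
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

For a perfectly periodic series like `[0, 1, 2, 0, 1, 2, …]`, only the pairs 'AB', 'BC', and 'CA' occur, each with probability roughly 1/3, giving an entropy near ln(3) ≈ 1.1, consistent with the examples below; the maximum possible value, for all nine pairs equiprobable, is ln(9) ≈ 2.2.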

This feature is based on the *hctsa* code `SB_MotifThree(x_z,'quantile')`, returning the `hh` output. Time series that have predictable two-letter sequences (i.e., some sequences are far more probable than others) will have low values, and time series that are less predictable (i.e., all two-letter sequences are approximately equiprobable) will have high values.

For example, this Ricker map is highly predictable, with letters (A=red, B=blue, C=green) shown below, converting to "ABCABCABCABC…", such that "AB", "BC", and "CA" become highly probable sequences. It has a very low value for this feature (1.1):

Here is another very predictable series, which has a very long run of Cs, then Bs, then As, such that "AA", "BB", and "CC" are highly probable sequences, yielding low entropy (1.1):

Time series that have predictable patterns on windows longer than two samples, or that require more granularity than a 3-letter symbolization to resolve, have high values, like this map:

**Naming info**: This feature matches the *hctsa* feature called `SB_TransitionMatrix_3ac_sumdiagcov`.

This feature first symbolizes the time series into a 3-letter equiprobable alphabet (by quantile). It then computes $\tau$ as the first zero-crossing of the autocorrelation function and computes the $\tau$-step transition probabilities between the three states as a 3×3 transition matrix. The feature then returns the sum of the column-wise variances of this matrix. It is a measure of the specificity of source states given a target state: a minimum value is given to a noisy series where the transition probabilities are approximately uniform, and a maximum value to a highly ordered series with very specific state-transition rules. The subtlety of this feature is that it computes the transitions on a timescale of $\tau$, such that it measures such order on the timescale at which the linear autocorrelation has faded.

*catch22* contains two features that capture the maximum length of time over which similar consecutive local patterns are observed:

- `stretch_high` measures the longest successive period of above-average values. It is the feature called `SB_BinaryStats_mean_longstretch1` in *hctsa* (the `longstretch1` output from running `SB_BinaryStats(x_z,'mean')` in *hctsa*).
- `stretch_decreasing` measures the longest period of successive decreases. It is the feature called `SB_BinaryStats_diff_longstretch0` in *hctsa* (the `longstretch0` output from running `SB_BinaryStats(x_z,'diff')` in *hctsa*).

`stretch_high` computes the longest sequence of successive values in the time series that are greater than the mean. Algorithmically, this is achieved in two steps:

1. Transform the time series into a binary sequence: time-series values that are greater than the mean are set to `1`, and time-series values that are less than or equal to the mean are set to `0`.
2. Return the longest sequence of successive values that are `1`.
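These two steps can be sketched as follows (an illustrative Python version, not the reference *catch22* implementation):

```python
def stretch_high(ts):
    """Sketch: longest run of consecutive above-mean values."""
    mean = sum(ts) / len(ts)
    # Step 1: binarize -- 1 where the value exceeds the mean, else 0.
    binary = [1 if x > mean else 0 for x in ts]
    # Step 2: length of the longest run of consecutive 1s.
    longest = run = 0
    for b in binary:
        run = run + 1 if b else 0
        longest = max(longest, run)
    return longest
```

For example, `stretch_high([1, 5, 5, 5, 1, 1])` binarizes (mean 3.0) to `[0, 1, 1, 1, 0, 0]` and returns `3`.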

- Low values are given to time series that tend not to linger too much on either side of the mean, like this moving-average process, which has a maximum duration of `stretch_high = 8` samples (red) above the mean (zero: dashed line):

- High values are given to time series that have at least one long duration of time spent above the mean, like this stochastic sine map (`stretch_high = 107` successive time points above the mean):

`stretch_decreasing` is similar to the above, but it calculates the longest sequence of successive steps in the time series that *decrease*. Algorithmically, this is achieved in two steps:

1. Transform the time series into a binary sequence: each time-series value is converted to a `1` if it is higher than the previous time point, and `0` if it is lower than the previous time point (starting from the second point in the time series, and thus yielding a sequence of length `N-1`, where `N` is the length of the original time series).
2. Return the longest sequence of successive values that are `0`.
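A minimal sketch of these steps (illustrative, not the reference *catch22* implementation; the description above does not specify how equal consecutive values are treated, and this sketch counts them as non-decreases):

```python
def stretch_decreasing(ts):
    """Sketch: longest run of consecutive decreases."""
    # Steps 1 and 2 fused: walk the N-1 consecutive pairs, extending the
    # current run where the series decreases and resetting otherwise,
    # tracking the longest run seen.
    longest = run = 0
    for prev, cur in zip(ts, ts[1:]):
        run = run + 1 if cur < prev else 0
        longest = max(longest, run)
    return longest
```

For example, `stretch_decreasing([5, 4, 3, 2, 5, 4])` returns `3` for the three successive decreases 5→4→3→2.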

- Here is a time series from the complex butterfly map, with the longest period of successive decreases (`stretch_decreasing = 30`) highlighted in red.
