
# Filtering and normalizing

The first step in analyzing a dataset is to process the data matrix using `TS_Normalize`. This function filters out operations or time series that produced many errors or special-valued outputs, and then normalizes the output of all remaining operations. Normalization is typically done in-sample using an outlier-robust sigmoidal transform (although other normalizing transformations can be selected). `TS_Normalize` writes the new, filtered, normalized matrix to a local file called `HCTSA_N.mat`, which contains normalized and trimmed versions of the information in `HCTSA.mat`.

Example usage is as follows:

```
TS_Normalize('mixedSigmoid',[0.8,1.0]);
```

The first input controls the normalization method, in this case a scaled, outlier-robust sigmoidal transformation, specified with `'mixedSigmoid'`. The second input controls the filtering of time series and operations based on minimum thresholds for good values in the corresponding rows (corresponding to time series; filtered first) and columns (corresponding to operations; filtered second) of the data matrix.

In the example above, time series (rows of the data matrix) with more than 20% special values (specifying 0.8) are first filtered out, and then operations (columns of the data matrix) containing any special values (specifying 1.0) are removed. Columns with approximately constant values are also filtered out. After filtering the data matrix, the outlier-robust `'mixedSigmoid'` transformation is applied to all remaining operations (columns). The filtered, normalized matrix is saved to the file `HCTSA_N.mat`.
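The two-stage filtering described above can be sketched on a toy matrix. This is only an illustration of the logic, not the `TS_Normalize` implementation: the matrix `X`, the use of `NaN` to mark special values, and the `>=` comparison are assumptions made for the example, and the removal of approximately constant columns is not shown.

```matlab
% Toy data matrix (hypothetical values): rows are time series, columns are
% operations, and NaN stands in for a special-valued output.
X = [0.2 1.5 3.1 4.0 5.9 6.3;
     NaN NaN 2.7 4.4 5.1 6.8;
     0.9 1.1 NaN 4.7 5.5 6.0;
     0.4 1.8 3.6 4.2 5.3 6.6];
thresholds = [0.8, 1.0]; % [minimum good proportion per row, per column]

% Step 1: keep time series (rows) with a sufficient proportion of good values
rowGood = mean(~isnan(X), 2);
keepRows = rowGood >= thresholds(1);
X = X(keepRows, :);

% Step 2: on the remaining rows, keep operations (columns) with a
% sufficient proportion of good values
colGood = mean(~isnan(X), 1);
keepCols = colGood >= thresholds(2);
X = X(:, keepCols);

disp(size(X))
```

Here the second row (4 of 6 good values) is removed first, and the third column, which still contains a special value afterwards, is removed second. Filtering rows before columns means that a single badly behaved time series does not cause many operations to be discarded.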

Details about the normalization are saved to the `HCTSA_N.mat` file as `normalizationInfo`, a structure that contains the normalization function, the filtering options used, and the corresponding `TS_Normalize` code that can be used to re-run the normalization.
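As a minimal sketch, the stored settings can be inspected by loading just that variable. This assumes `TS_Normalize` has already been run and that `HCTSA_N.mat` is in the current directory; the field names of the structure are not assumed here, so the whole structure is displayed.

```matlab
% Load only the normalizationInfo variable from the normalized file
load('HCTSA_N.mat', 'normalizationInfo');

% Display the structure to see which normalization and filtering were applied
disp(normalizationInfo)
```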

## Setting the normalizing transformation

For dimensionality reduction, it makes sense to weight each operation equally, and thus to normalize all operations to the same range using a transformation like `'scaledRobustSigmoid'`, `'scaledSigmoid'`, or `'mixedSigmoid'`. When calculating mutual information distances between operations, however, it is preferable not to distort the distributions, and so to perform no normalization (using `'raw'`) or to apply a linear transformation like `'zscore'`. The full list of implemented normalization transformations is given in the function `BF_NormalizeMatrix`.

Note that the 'scaledRobustSigmoid' transformation does not tolerate distributions with an interquartile range of zero, which will be filtered out; 'mixedSigmoid' will treat these distributions in terms of their standard deviation (rather than interquartile range).
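The reason a zero interquartile range is a problem can be seen in a toy example. The sigmoid forms below are simplified stand-ins chosen for illustration (an IQR-scaled and a standard-deviation-scaled logistic function); they are assumptions, not the exact formulas in `BF_NormalizeMatrix`, and the quartile estimate is a simple one that assumes an even number of values.

```matlab
% Toy feature whose values are mostly identical, so its interquartile range is zero
x = [0 0 0 0 0 0 0 10];

% Simple quartile estimate: medians of the lower and upper halves (even length assumed)
xs = sort(x);
n = numel(xs);
q1 = median(xs(1:n/2));
q3 = median(xs(n/2+1:end));
iqrX = q3 - q1; % zero for this feature

% IQR-scaled sigmoid (assumed form): dividing by a zero scale gives NaN values
robustSig = 1./(1 + exp(-(x - median(x))/(iqrX/1.35)));

% Standard-deviation-scaled sigmoid (assumed form): the scale is nonzero, so
% every output is finite
stdSig = 1./(1 + exp(-(x - mean(x))/std(x)));

disp(robustSig)
disp(stdSig)
```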

## Setting the filtering parameters

Filtering parameters depend on the application. Some applications allow the filtering thresholds to be relaxed. For example, setting `[0.7,0.9]` removes time series with less than 70% good values, and then removes operations with less than 90% good values. Some applications can tolerate some special-valued outputs from operations (like some clustering methods, where distances are simply calculated using those operations that did not produce special-valued outputs for each pair of objects), but others cannot (like Principal Components Analysis); the filtering parameters should be specified accordingly.

Analysis can be performed on the data contained in `HCTSA_N.mat` in the knowledge that different settings for filtering and normalizing the results can be applied at any time by simply rerunning `TS_Normalize`, which will overwrite the existing `HCTSA_N.mat` with the results of the new normalization and filtration settings.

## Filtering features using `TS_FilterData`

It is often useful to check whether the feature-based classification results of a given analysis are driven by 'trivial' types of features that do not depend on the dynamical properties of the data, e.g., features sensitive to time-series length, location (e.g., mean), or spread (e.g., variance). Because these features are labeled as `'lengthdep'`, `'locdep'`, and `'spreaddep'`, you can easily filter them out to check the robustness of your analysis.

An example:

```
% Get the IDs of length-dependent features from the `HCTSA.mat` file:
[ID_lengthDep,ID_notlengthDep] = TS_GetIDs('lengthdep','raw','ops');

% Generate a new file without these features, called 'HCTSA_locFilt':
TS_FilterData('raw',[],ID_notlengthDep,'HCTSA_locFilt.mat');
```

You could use the same template to filter `'locdep'` or `'spreaddep'` features (or any other combination of keyword labels). You can then go ahead with analyzing the filtered HCTSA dataset as above, except using your new filename, `HCTSA_locFilt`.
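For instance, the same template applied to location-dependent features might look like the following sketch. It reuses the `TS_GetIDs` and `TS_FilterData` call patterns shown above; the variable names and the output filename `HCTSA_noLocDep.mat` are hypothetical choices for this example.

```matlab
% Get the IDs of location-dependent features (and the complementary set)
% from the HCTSA.mat file:
[ID_locDep,ID_notLocDep] = TS_GetIDs('locdep','raw','ops');

% Generate a new file that keeps only the features that are not location-dependent
% (the output filename is a hypothetical choice):
TS_FilterData('raw',[],ID_notLocDep,'HCTSA_noLocDep.mat');
```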

