The present invention relates generally to processing time series data, and more particularly to selecting and validating anomaly detection models for time series data.
A data analysis tool uses behavioral learning algorithms to build an anomaly detection model from a set of time series data. Such a model can cover an individual metric from a set or a subset of the metrics. When new data arrives for these metrics, it is evaluated against the model, and when the new data does not fit the model, an alarm is generated. If multiple anomaly detection models cover data from an individual metric, then multiple symptoms of bad behavior can be observed for that metric.
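The build, evaluate, and alarm cycle described above can be illustrated with a minimal sketch. It assumes a deliberately simple model (the mean and standard deviation of the training data) and a hypothetical threshold parameter `k`. The function names and sample values are illustrative assumptions and do not represent the behavioral learning algorithms of any particular tool.

```python
from statistics import mean, stdev

def build_model(training_values):
    """Summarise the training data as (mean, standard deviation).

    Assumption: a single-metric model; real tools may learn far richer behavior.
    """
    return mean(training_values), stdev(training_values)

def is_anomalous(model, value, k=3.0):
    """Return True when a new value lies more than k standard deviations
    from the training mean, i.e. the value does not fit the model."""
    mu, sigma = model
    return abs(value - mu) > k * sigma

# Hypothetical training data for one metric.
training = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
model = build_model(training)

# Evaluate newly arrived values; each misfit would raise an alarm.
new_values = [10.1, 9.9, 25.0]
alarms = [v for v in new_values if is_anomalous(model, v)]
```

In this sketch only the value 25.0 falls outside the three-standard-deviation band, so it is the only value placed in `alarms`.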
A data analysis tool can use time series data from many types of data domains, including a data layer such as an applications layer, an operations system layer, a middleware layer, or a physical network layer, such as a communication transport layer. Data in these domains can change dramatically without notice, for instance, due to a service-impacting issue, a maintenance window, or a repurposing of a piece of equipment.
When applying a specific anomaly detection technique, it is important to verify that the resulting model fits the training data from which it was created. This is similar to a statistical distribution test, such as a test for a normal (bell curve) distribution, that verifies whether an observed sample comes from the expected distribution.
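As an illustration of such a distribution test, the following sketch computes a one-sample Kolmogorov-Smirnov statistic against a normal distribution fitted to the training data. Several points are assumptions made for this example: the choice of the Kolmogorov-Smirnov test itself, the asymptotic 5% critical value of 1.36 divided by the square root of the sample size, and all function and variable names. Because the distribution parameters are estimated from the same sample, the check is conservative, and a production test would likely use a corrected critical value.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ks_statistic(values, dist):
    """Largest gap between the empirical CDF of values and dist.cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def fits_normal(values, coeff=1.36):
    """Return True when the fitted normal is not rejected.

    Assumption: coeff / sqrt(n) approximates the 5% critical value.
    """
    dist = NormalDist(mean(values), stdev(values))
    return ks_statistic(values, dist) <= coeff / sqrt(len(values))

# Hypothetical samples: evenly spaced normal quantiles, and two distinct clusters.
near_normal = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
bimodal = [0.0] * 25 + [10.0] * 25

near_normal_ok = fits_normal(near_normal)
bimodal_ok = fits_normal(bimodal)
```

Note that this kind of test treats the sample as an unordered collection of values, so shuffling the data would give the same result.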
In some domains, it is common for a data set to contain anomalies, which makes a poor statistical model difficult to identify. A method for identifying a poor model must distinguish between a model that does not describe the data well and a model that accurately describes data that happens to contain anomalies or changes.
One drawback of current methods is that they do not take into account the time-ordered sequence of the data. Another drawback is that they do not take into account that the model may not fit contiguously when a change occurs in an underlying system.
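The significance of time ordering can be seen in a small sketch. It assumes each data point has already been marked as fitting or not fitting the model, and it compares two hypothetical sequences with the same number of misfits. The helper name `longest_misfit_run` and the sample sequences are illustrative assumptions, not part of any existing method.

```python
def longest_misfit_run(misfit_flags):
    """Length of the longest contiguous run of True values in time order."""
    best = current = 0
    for flag in misfit_flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

# Hypothetical time-ordered flags: True means the point did not fit the model.
scattered = [False, True, False, False, True, False, True, False, False, True]
shifted = [False, False, False, False, False, False, True, True, True, True]

# An order-agnostic count cannot tell the two sequences apart.
same_count = sum(scattered) == sum(shifted)

# The contiguous run length can: isolated anomalies versus a sustained change.
scattered_run = longest_misfit_run(scattered)
shifted_run = longest_misfit_run(shifted)
```

Both sequences contain four misfits, yet the first suggests isolated anomalies in data the model still describes, while the second suggests that the underlying system changed partway through the series.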
As a result, more anomaly detection models than necessary may be discarded, potentially leaving sections of the infrastructure unprotected by automated anomaly detection. Furthermore, models that could have been deployed are not, so fewer of the symptoms observed on a metric may be reported.