Many analysis tools monitor the behavior of an environment by analyzing time-series representations of the environment's key metrics. Consider a manufacturing process that employs one or more sensors. An analysis tool can assess the behavior of the manufacturing process by analyzing data generated by its sensors over a span of time (thus defining time-series data). Generally, an analysis tool can flag a dramatic change in the time-series data (e.g., a spike or a dip) as a potential malfunction within the environment.
In detecting spikes and dips, it is common to establish a model that defines the expected operation of the environment. For instance, an analysis tool may use various regression techniques to define a model that follows the general course of the measured time-series data. The analysis tool can then flag suspected anomalies by comparing each data point of the time-series data with the model; data points that deviate substantially from the model are indicative of anomalies.
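The model-and-deviation approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the "model" is an ordinary least-squares straight line fitted over the sample index, and that a point is flagged when its residual exceeds `k` standard deviations of all residuals. The function names `fit_linear_trend` and `flag_anomalies` and the default `k=3.0` are hypothetical choices made for this example only.

```python
from statistics import pstdev


def fit_linear_trend(values):
    """Ordinary least-squares fit of y = a + b*t, with t = 0, 1, ..., n-1.

    Assumes at least two samples (otherwise the variance of t is zero).
    """
    n = len(values)
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, values))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept, slope


def flag_anomalies(values, k=3.0):
    """Return indices whose residual from the fitted trend exceeds k sigma.

    Hypothetical thresholding rule: sigma is the population standard
    deviation of all residuals, including any outliers themselves.
    """
    intercept, slope = fit_linear_trend(values)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    sigma = pstdev(residuals)
    if sigma == 0:
        # The model tracks every point exactly, so nothing deviates.
        return []
    return [t for t, r in enumerate(residuals) if abs(r) > k * sigma]


if __name__ == "__main__":
    # A steadily rising sensor reading with one injected spike at index 10.
    readings = [float(t) for t in range(20)]
    readings[10] = 40.0
    print(flag_anomalies(readings))  # → [10]
```

Because the spike itself inflates the residual standard deviation, a single large outlier in a short series can mask itself or its neighbors; robust alternatives (for example, thresholds based on the median absolute deviation) are one common way to reduce that sensitivity, which relates to the noise challenges discussed next.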
However, the above-described approach presents various challenges in properly detecting anomalies. For instance, an environment may produce time-series data that is naturally noisy, which may make it difficult to define a model that accurately tracks the general course of the time-series data. Such difficulties may result in failing to identify actual anomalies (false negatives) or in incorrectly labeling normal behavior as anomalous (false positives).
In addition to accuracy, it is desirable in many monitoring environments to identify anomalies soon after they occur. This may require an analysis tool to process a very large amount of data in a short amount of time, which, in turn, constrains the complexity of the algorithms that the analysis tool can use. For instance, an algorithm may achieve high accuracy with few false positives, yet be too computationally complex to run in real time. It is also desirable that the analysis tool scale well to evolving conditions within the environment being monitored.
In view of these illustrative factors, there is a need for effective strategies for detecting anomalies in time-series data.