Software applications are monitored to determine their health and efficiency. During monitoring of complex software applications, a time series can be generated for each monitored variable. A time series is a set of values for a given variable at different points in time. For example, a time series for temperature may include values of temperatures recorded every hour over a twenty-four-hour period (65, 68, 70, 71, 72, 68, and so on). With respect to application monitoring, a time series may be analyzed to determine the health of the application being monitored.
A time series for an application may indicate that the application is not healthy if an anomaly occurs in the time series. An anomaly occurs when an actual value differs from an expected or predicted value. For example, if a data point in a temperature time series is expected to be eighty-five degrees and the actual value is one hundred fifteen degrees, an anomaly has occurred at that data point.
One challenge in analyzing a time series is distinguishing a false positive (a false identification of an anomaly) from an actual anomaly while still identifying all events of interest. Most systems for detecting an anomaly in a time series rely on simple rules and thresholds (for example, a single number or a percentage). These static rules and thresholds are sufficient for some applications but cannot detect some subtle changes in behavior. Further, static criteria are impractical when suitable thresholds must be identified and specified for thousands of time series, as is the case in application management.
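The limitation of a static threshold can be illustrated with a minimal sketch. The function name, the sample values, and the limit of 100 are all hypothetical and chosen only for illustration; they do not come from any particular monitoring system.

```python
def static_threshold_alerts(series, limit):
    """Return the indices of points whose value exceeds a fixed limit."""
    return [i for i, value in enumerate(series) if value > limit]


# Hypothetical response times in milliseconds (illustrative values only).
# One series contains a single large spike.
spike_series = [50, 52, 49, 51, 150, 50, 51]
# The other shifts from a level near 50 to a level near 80 and stays there.
shifted_series = [50, 52, 49, 51, 50, 80, 82, 79, 81, 80]

spike_alerts = static_threshold_alerts(spike_series, limit=100)
shift_alerts = static_threshold_alerts(shifted_series, limit=100)

print(spike_alerts)  # [4]
print(shift_alerts)  # []
```

Under these assumptions, the fixed limit catches the spike but reports nothing for the sustained shift in level, which is the kind of subtle change in behavior that a static rule can miss. Lowering the limit to catch the shift would have to be done separately for every series with a different normal range.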
Some monitoring systems use a statistical model to predict time series data points and identify anomalies. The results of such statistical models are often inconsistent. Usually, these systems must balance selectivity (avoiding false positives) against sensitivity (the ability to detect true positives). As a result, it is difficult to analyze a time series and accurately detect an anomaly.
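The balance between selectivity and sensitivity can be seen in a simple sketch of one possible statistical approach. This is an illustrative assumption, not the model used by any specific system: each point is predicted as the mean of the preceding window, and a point is flagged when it deviates from that prediction by more than k sample standard deviations. The function name, the window size, the values of k, and the temperature readings beyond the first six are hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(series, window=5, k=3.0):
    """Flag indices whose value deviates from the mean of the preceding
    `window` points by more than k sample standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        predicted = mean(history)   # naive prediction from recent history
        spread = stdev(history)     # variability of recent history
        if abs(series[i] - predicted) > k * spread:
            flagged.append(i)
    return flagged


# Hourly temperatures; the values after the first six are made up.
temps = [65, 68, 70, 71, 72, 68, 69, 115, 70, 71]

selective = detect_anomalies(temps, k=3.0)   # wide band, fewer alerts
sensitive = detect_anomalies(temps, k=0.25)  # narrow band, more alerts

print(selective)  # [7]
print(sensitive)  # [5, 6, 7, 8, 9]
```

With the wide band only the reading of 115 is flagged. With the narrow band the same reading is still flagged, but so is every ordinary reading that could be evaluated, which shows how increasing sensitivity in this kind of model tends to produce more false positives.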