Processes that occur over time are frequently monitored for change by metrics that are sampled on a periodic basis. It is often of interest to compare the level of the process to a specified target value. In other cases, the range of the process may be of interest. Implicitly, the observer of the data is using the measurements to determine if the process is behaving as expected, or alternatively, if there has been some kind of change that would indicate the process is behaving abnormally. In the latter case, the detected change could be used to trigger an alert state and initiate an in-depth root cause analysis. If a root cause is identified, a valuable corrective action could be taken. The worst case is that no root cause is found and the process measurements begin to look normal again. While there is an inconvenience associated with occasional false positive alert states, it is usually far outweighed by the benefit that comes along with early identification of true positive alert states.
Algorithms that take sequences of observations as input and return alerts that indicate a change in the process based on unusual trends or patterns in recent data are called change-point detection algorithms. Applications of change-point detection algorithms have proliferated into many fields beyond their initial use in manufacturing and engineering disciplines. Illustrative examples in medicine include the effect of a stimulus on neuron behavior, heart beat variability during sleep, and detection of disease outbreaks. Other applications include distinguishing between seismicity levels, detection of cellular fraud, detection of variance changes in stock prices, special problems in hydrology, and applications related to network security and network activity.
A familiar change-point algorithm is the classic cumulative sum (cusum) algorithm, which accumulates deviations (relative to a specified target) of incoming measurements and issues alerts when the cumulative sum becomes too large. Commonly, the process is assumed to be normally distributed with a known mean and standard deviation. Classic cusum algorithms are generally designed to detect shifts away from a targeted mean.
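For illustration only, the classic cusum described above might be sketched as follows. This is a minimal two-sided tabular cusum under the normality assumption, not the adapted algorithm of the present disclosure. The function name `cusum_alerts`, the parameters `allowance` and `threshold`, their default values, and the choice to reset both sums after an alert are assumptions introduced for this sketch and do not come from the text.

```python
from typing import Iterable, List, Tuple


def cusum_alerts(
    observations: Iterable[float],
    target_mean: float,
    std_dev: float,
    allowance: float = 0.5,   # assumed slack, in standard-deviation units
    threshold: float = 5.0,   # assumed alert limit, in standard-deviation units
) -> List[Tuple[int, str]]:
    """Return (index, direction) pairs at which a two-sided cusum signals."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")

    upper = 0.0  # accumulated evidence of an upward shift
    lower = 0.0  # accumulated evidence of a downward shift
    alerts: List[Tuple[int, str]] = []

    for i, x in enumerate(observations):
        # Standardized deviation of the measurement from the target.
        z = (x - target_mean) / std_dev

        # Accumulate deviations beyond the allowance; never drop below zero.
        upper = max(0.0, upper + z - allowance)
        lower = max(0.0, lower - z - allowance)

        direction = None
        if upper > threshold:
            direction = "up"
        elif lower > threshold:
            direction = "down"

        if direction is not None:
            alerts.append((i, direction))
            # Restart both sums after an alert (one common convention).
            upper = 0.0
            lower = 0.0

    return alerts
```

As a usage example, calling `cusum_alerts([10.0] * 10 + [12.0] * 10, target_mean=10.0, std_dev=1.0)` leaves the sums at zero for the ten on-target measurements and then adds 1.5 to the upper sum for each shifted measurement, so the first alert is raised at index 13, four observations after the shift begins. A larger allowance or threshold reduces false positive alerts at the cost of slower detection.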
There is a need to adapt the classic cusum change-point detection algorithm for complex applications in systems monitoring, where numerous and varied performance and reliability metrics are available to aid with early identification of realized or impending problems and failures. Specifically, the inventors have addressed this need by overcoming three significant challenges: 1) the need for a nonparametric technique so that a wide variety of metrics (including discrete metrics) may be included in the monitoring process, 2) the need to handle time-varying distributions for the metrics that reflect natural cycles in non-stationary data sets, and 3) the need to be computationally efficient with the massive amounts of data that are available for processing. The present disclosure provides a solution including a screening feature that fully automates the implementation of the algorithm and eliminates the need for manual oversight up until the point where identification of an anomalous event is necessary.