A service level agreement (SLA) records an understanding between a customer and a service provider regarding services to be provided, priorities, responsibilities, guarantees, warranties and other parameters of the understanding. Each object of service scope set forth in the SLA typically has a defined level of service. For example, the SLA may specify the levels of availability, serviceability, performance, operation and/or other attributes of the service. The level of service can be specified as a minimum or target level for the object, thereby allowing the customer to be informed regarding what to expect from the service while providing measurable values that show the actual level of performance. The actual level of performance is typically measured utilizing an SLA monitoring tool or “network sniffer.”
An SLA-based system is exemplary of any data network system that exhibits fluctuant measurement behavior. That is, every SLA monitoring tool must record a series of measurements of a number of metrics associated with objects of the SLA and issue alerts upon detecting anomalies in the metric measurements, i.e., measurements that show the service performance to be less than the level of service specified in the SLA for that object. This task is difficult to fulfill because SLA measurements tend to have spikes and to exhibit inconsistent local behavior. Simple threshold-based decisions typically produce numerous false positives and a fluctuant alert pattern, both of which inhibit accurate analysis of system performance.
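By way of illustration only, the following minimal Python sketch shows how a simple threshold-based decision can yield a fluctuant alert pattern. The function names, the 100 ms limit and the latency values are hypothetical and are chosen solely to make the behavior visible; they are not taken from any particular monitoring tool.

```python
def threshold_alerts(measurements, sla_limit):
    """Flag each measurement independently: True means it violates the limit."""
    return [m > sla_limit for m in measurements]


def count_transitions(flags):
    """Count how many times the alert flag flips between consecutive samples."""
    return sum(1 for prev, cur in zip(flags, flags[1:]) if prev != cur)


# Hypothetical latency samples (ms) hovering around an assumed 100 ms limit,
# with a single spike at the fourth sample.
latencies_ms = [95, 103, 98, 240, 97, 101, 99, 102, 96]

flags = threshold_alerts(latencies_ms, sla_limit=100)
transitions = count_transitions(flags)
```

With these sample values the alert flag flips at every consecutive sample, so nine measurements produce eight alert transitions even though only one sample is a pronounced spike.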
Standard SLA monitoring tools use raw measurements to perform stand-alone statistical calculations and then deduce the state of a particular performance object of the system from the results of these calculations. However, these calculations do not consider the current state of the performance object as a parameter. This means that the same measurements will always result in the same object state, regardless of the object's state prior to these measurements or of any previous state change. The threshold conditions can be non-trivial, for example, requiring several threshold violations within a dynamic time fragment, to minimize the chances of false positives. Nevertheless, fluctuant measurement behavior results in either fluctuant alert patterns or too many false negatives (i.e., an anomaly occurs but is not identified).
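The stateless character of such calculations can be sketched as follows, again for illustration only. The sketch assumes a simplified rule in which an object is placed in an alert state when at least a given number of the most recent measurements violate the limit; a fixed-size sliding window stands in for the dynamic time fragment, and all names and values are hypothetical.

```python
from collections import deque


def stateless_state(window, sla_limit, min_violations):
    """Derive the object state from the window alone.

    The previous state is not an input, so identical windows
    always produce the identical state.
    """
    violations = sum(1 for m in window if m > sla_limit)
    return "ALERT" if violations >= min_violations else "OK"


def monitor(measurements, sla_limit, window_size, min_violations):
    """Slide a fixed-size window over the measurements and record each state."""
    window = deque(maxlen=window_size)
    states = []
    for m in measurements:
        window.append(m)
        states.append(stateless_state(window, sla_limit, min_violations))
    return states


# Hypothetical samples fluctuating around an assumed limit of 100.
samples = [90, 110, 120, 90, 110, 90, 110, 120]
states = monitor(samples, sla_limit=100, window_size=3, min_violations=2)
```

For these samples the computed states are OK, OK, ALERT, ALERT, ALERT, OK, ALERT, ALERT. The window [90, 110, 120] yields ALERT both when the object was previously OK and when it had just dropped out of ALERT, and the single OK in the middle of the sequence shows that the multi-violation rule still allows the state to fluctuate.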
Thus, there is a need for an SLA monitoring tool that eliminates false-positive anomaly detections while handling periodic spikes and fluctuant measurement characteristics.