As communications technology has evolved, its users have become increasingly reliant on the ability to communicate almost instantaneously with others all over the globe. With this technology seemingly available everywhere, users of network resources have come to perceive performance delays of as little as 2-3 seconds as unacceptable. Delays in data transfers and dropped calls in mobile telephone systems irritate and alienate customers, so service providers pay close attention to performance problems and try to correct them as quickly as possible.
Operational Measurements (OMs), in the context of network performance, are network parameters that are measured and used as Network Performance Indicators (NPIs). These measurements can include call success rates, call termination rates, Quality of Service (QoS) measurements, traffic and routing measurements, network outage statistics, and the like. OMs are typically measured over fixed periods of time, each referred to as an “OM transfer period”.
Early detection of network performance anomalies could help avoid network outage events. A slow and persistent degradation of NPIs can indicate an issue such as a memory leak. Additionally, simultaneous large and abrupt changes in, for example, the call success rates reported by multiple NPIs can indicate the onset of an outage event (the outage can be partial, i.e., a loss of more than 10% of capacity, or total). Therefore, it would be desirable to process the NPIs automatically in order to avoid or reduce network outage downtime and other problems such as memory leaks. Such processing would detect slow and persistent NPI OM degradation, severe and sudden NPI OM degradation, and potential outage events, and would raise an appropriate log or alarm to alert the operator to the observed performance anomaly so that it can be investigated and dealt with in a timely manner.
Many existing statistical process control algorithms, such as the Shewhart, EWMA (exponentially weighted moving average), and Page's CUSUM (cumulative sum) control charts, are routinely used in various industries to monitor product quality. However, these standard quality control algorithms only detect deviations of the monitored quality metric from a fixed (known or unknown) mean value that is constant over time. In the NPI performance anomaly detection problem, the mean value of success rates can fluctuate slowly over time in normal operation (e.g., due to changes in traffic level or service usage patterns during the day). Thus, only a statistically significant large and abrupt degradation, or a slow but steady degradation, from the most recent average success rates would indicate a possible onset of a new outage. This time-varying statistical characteristic of the NPIs prevents direct application of these traditional process control algorithms.
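The limitation described above can be illustrated with a minimal sketch of a one-sided CUSUM chart that measures shortfall against a fixed target mean. Everything specific here is an assumption made for illustration: the function name `fixed_mean_cusum`, the target, slack, and threshold values, and both synthetic success-rate series are hypothetical and are not taken from any measured network data.

```python
import math


def fixed_mean_cusum(rates, target_mean, slack, threshold):
    """One-sided CUSUM for downward shifts from a fixed target mean.

    Returns the index of the first period that raises an alarm, or None.
    """
    s = 0.0
    for t, x in enumerate(rates):
        # Accumulate the shortfall below the fixed target, less a slack
        # allowance, and never let the statistic drop below zero.
        s = max(0.0, s + (target_mean - x - slack))
        if s > threshold:
            return t
    return None


# Hypothetical chart parameters, chosen only for illustration.
TARGET, SLACK, THRESHOLD = 0.98, 0.005, 0.05

# 48 transfer periods with a constant 98% call success rate.
flat = [0.98] * 48

# 48 transfer periods in which the rate dips smoothly from 98% to 96% and
# recovers, standing in for a benign daily traffic cycle (synthetic data).
drifting = [0.98 - 0.01 * (1 - math.cos(2 * math.pi * t / 48)) for t in range(48)]

print("flat series alarm index:", fixed_mean_cusum(flat, TARGET, SLACK, THRESHOLD))
print("drifting series alarm index:", fixed_mean_cusum(drifting, TARGET, SLACK, THRESHOLD))
```

With these assumed values, the flat series raises no alarm, while the smoothly drifting series does, even though the drift represents normal operation. Loosening the threshold to suppress that false alarm would also delay detection of a genuine degradation, which is why a fixed reference mean is a poor fit for NPIs whose mean varies over time.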