In many information processing systems, it is important to monitor the occurrence of different types of events. For example, in an operating system associated with a communications network, the monitored events may include dropped or blocked calls. Such systems typically incorporate an event counter that maintains a count of the number of events occurring for each of the different types of events to be monitored. Based on the event counts, alarms may be generated to indicate problems or other conditions within the system. In certain information processing system contexts, the event monitoring process is also commonly referred to as anomaly detection or fault detection.
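By way of illustration only, the per-type event counter described above can be sketched as follows. The class name `EventMonitor`, the fixed per-type thresholds, and the event type labels are hypothetical choices made for this sketch; the text above does not specify how alarms are derived from the counts, and a simple "count exceeds limit" rule is assumed here.

```python
from collections import Counter


class EventMonitor:
    """Minimal sketch of a per-type event counter with threshold alarms.

    Assumption: an alarm is raised when the count for an event type
    exceeds a fixed, user-supplied limit for that type.
    """

    def __init__(self, thresholds):
        # thresholds: mapping of event type -> maximum tolerated count
        self.thresholds = dict(thresholds)
        self.counts = Counter()

    def record(self, event_type, n=1):
        """Add n occurrences of the given event type to its count."""
        self.counts[event_type] += n

    def alarms(self):
        """Return the event types whose count exceeds its threshold."""
        return sorted(
            event_type
            for event_type, limit in self.thresholds.items()
            if self.counts[event_type] > limit
        )

    def reset(self):
        """Clear all counts, e.g. at the start of a new monitoring interval."""
        self.counts.clear()


# Hypothetical usage: four dropped calls and two blocked calls in one interval.
monitor = EventMonitor({"dropped_call": 3, "blocked_call": 5})
for event in ["dropped_call"] * 4 + ["blocked_call"] * 2:
    monitor.record(event)
print(monitor.alarms())
```

In this sketch only the dropped-call count exceeds its limit, so only that event type would be reported.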
Numerous techniques for event monitoring are known in the art. Examples of such techniques are described in R. A. Maxion et al., “A Case Study of Ethernet Anomalies in a Distributed Computing Environment,” IEEE Transactions on Reliability, Vol. 39, No. 4, pp. 433–443, October 1990; F. Feather et al., “Fault Detection in an Ethernet Network Using Anomaly Signature Matching,” Proceedings of SIGCOMM '93, Ithaca, N.Y., pp. 279–288, September 1993; and M. Thottan et al., “Proactive Anomaly Detection Using Distributed Intelligent Agents,” IEEE Network, September/October 1998, pp. 21–27, all of which are incorporated by reference herein.
A significant problem with certain conventional event monitoring techniques is that they are typically configured to “learn” patterns of counts over relatively long periods of time. This can lead to instability in situations in which the long-range pattern of the monitored events is itself changing on a timescale shorter than the learning time. In addition, conventional techniques may not use event data from surrounding time periods to maximum advantage.
Also, conventional techniques often assume a normal distribution or other symmetric distribution for the event data being monitored. Such a distribution may not provide a sufficiently accurate estimate of the actual data distribution, since event counts cannot be negative and may be skewed.
As a result of the above-described instability and data distribution assumptions, the conventional techniques can exhibit an excessive rate of false alarms.
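The effect of a symmetric-distribution assumption on the false alarm rate can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the counts follow a Poisson distribution with a mean of 2 events per interval and that an alarm is raised when a count exceeds the mean plus three standard deviations; neither the Poisson model nor the three-sigma rule is taken from the techniques cited above.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def poisson_upper_tail(mean, threshold):
    """P(X > threshold) for X ~ Poisson(mean)."""
    k_max = math.floor(threshold)
    cdf = sum(
        math.exp(-mean) * mean**k / math.factorial(k)
        for k in range(k_max + 1)
    )
    return 1.0 - cdf


# Assumed baseline: on average 2 events per interval (illustrative value).
mean = 2.0
sigma = math.sqrt(mean)          # Poisson standard deviation
threshold = mean + 3.0 * sigma   # three-sigma alarm threshold

nominal = normal_upper_tail(3.0)               # rate expected under normality
actual = poisson_upper_tail(mean, threshold)   # rate under the Poisson model

print(f"nominal false alarm rate: {nominal:.5f}")
print(f"actual false alarm rate:  {actual:.5f}")
print(f"ratio: {actual / nominal:.2f}")
```

Under these assumptions the false alarm rate is more than three times the rate implied by the normal model, because the skewed upper tail of the count distribution is heavier than the symmetric model predicts.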
Another problem with the conventional techniques is that they generally do not provide a common scale on which different types of observations having different baselines can be combined and compared.
Yet another problem is that the conventional techniques are generally not configured to provide a sufficiently accurate indication of the severity of alarms.
It is therefore apparent that a need exists for improved event monitoring techniques which address one or more of the above-noted problems.