As computer equipment has become increasingly complex, the difficulty in monitoring this equipment to keep it functioning properly has become formidable. For example, a server attached to a network should be monitored to ensure that the hardware and software do not malfunction, to ensure that adequate resources such as memory and disk space are available during peak use times, to protect the server from electronic vandalism such as hacking that arrives over the network, and so forth.
As shown in FIG. 1, a monitored device 100 such as a server may be watched over by a monitor 110. The monitor 110 may sense traffic received and transmitted over a network 120 as well as sense conditions both internal and peripheral to the monitored device 100 such as memory use and disk occupancy, CPU utilization, power supply state, cabinet temperature, and numerous other measures of health.
To accomplish these purposes, the monitor 110 typically includes various sensors 111. Here, the term “sensor” is not confined to simple hardware devices, nor is it necessary that the sensors reside literally within the monitor 110. Rather, the term is intended to encompass both software and hardware systems for sensing the state of parameters that are important to the proper operation of the monitored device 100. Thus, in correspondence with the examples just mentioned, the monitor 110 may include a sensor that is an intrusion detection system that works in conjunction with protective equipment 130 such as a firewall, another for sensing memory use, yet another for sensing disk occupancy, and so forth. Typically, each sensor determines the state of its associated parameters, and that state is then evaluated.
Evaluation may involve the use of persistence filters 112. In a simple case, a persistence filter may compare the state determined by the corresponding sensor with a preestablished threshold. If the state violates the threshold, the filter generates an event indicator such as an alert. For example, if the cabinet temperature exceeds ninety degrees Celsius, an alert may be generated. In other cases, the decision process may be more complex as to whether an event indicator should be generated, and if so, what the nature of the indicator should be. For example, a critical alert might be generated if the remaining disk space is determined to be less than 10 MB, a warning alert generated if the remaining disk space is between 10 MB and 25 MB, and an informational alert generated if more than 25 MB remains.
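The two threshold schemes just described may be sketched as follows. This is a minimal illustration only, not the implementation of the persistence filters 112: the function names, the Severity type, and the treatment of readings that fall exactly on a boundary (10 MB and 25 MB are both treated as warnings) are assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Hypothetical alert levels matching the disk-space example."""
    INFORMATIONAL = "informational"
    WARNING = "warning"
    CRITICAL = "critical"


def temperature_filter(temp_c: float, threshold_c: float = 90.0) -> Optional[str]:
    """Simple case: generate an alert only when the reading exceeds the threshold."""
    return "alert" if temp_c > threshold_c else None


def disk_space_filter(free_mb: float) -> Severity:
    """Tiered case: map remaining disk space (in MB) to an alert level.

    Assumption: the 10 MB and 25 MB boundaries themselves count as warnings.
    """
    if free_mb < 10:
        return Severity.CRITICAL
    if free_mb <= 25:
        return Severity.WARNING
    return Severity.INFORMATIONAL
```

For instance, under these assumptions a cabinet temperature of 95 degrees Celsius produces an alert, and 18 MB of remaining disk space produces a warning alert.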
The resulting event indicators may be sent to an event console 140, which may be operated by a human operator 150, or which may be autonomic. The event console 140 or the human operator 150 determines appropriate actions to invoke in response to the event indicators.
Although the configuration of FIG. 1 is now widely used, it suffers from a significant disadvantage: the operating point of the persistence filters 112 must be set as a compromise between the generation of false positives and the generation of false negatives, since lowering one tends to raise the other. Here, a false positive occurs when an event indicator is generated in response to an unimportant state. Typically, the human operator 150 exercises independent judgment and may decline to invoke a response to a false positive. On the other hand, a false negative occurs when an event indicator is not generated despite the existence of a critical state that requires attention.
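The compromise can be made concrete with a small sketch that counts false positives and false negatives for a single-threshold filter at several operating points. The readings below are invented purely for illustration and are not measurements; the function name count_errors and the labelling of each reading as critical or not are likewise assumptions of the example.

```python
from typing import List, Tuple

# Hypothetical labelled readings: (temperature in Celsius, state truly critical?).
SAMPLE: List[Tuple[float, bool]] = [
    (70, False), (85, False), (88, True),
    (92, False), (95, True), (99, True),
]


def count_errors(readings: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for a filter that alerts
    whenever a reading exceeds the threshold."""
    false_positives = sum(
        1 for value, critical in readings if value > threshold and not critical
    )
    false_negatives = sum(
        1 for value, critical in readings if value <= threshold and critical
    )
    return false_positives, false_negatives


for threshold in (80, 90, 97):
    fp, fn = count_errors(SAMPLE, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With this toy data, a low threshold of 80 yields two false positives and no false negatives, while a high threshold of 97 yields no false positives and two false negatives; the intermediate threshold of 90 yields one of each.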
Because false negatives are generally more damaging than false positives, there is a tendency to configure the persistence filters to err on the side of permissiveness, thus potentially subjecting the event console 140 and the operator 150 to a flood of false positives. At some point, however, false positives begin to inflict their own damage by disrupting the operation of the monitored device 100 with unneeded protective measures, or by desensitizing the operator 150 to the arrival of critical alerts, i.e., true positives. Thus there is a need to improve the performance of information technology resource monitors in a way that minimizes the generation of false positives and false negatives, while preserving the capability of the monitor to unfailingly generate true positives when conditions warrant.