The present invention relates to monitoring systems for hardware and software. More specifically, the present invention relates to modeling a baseline behavior of a hardware or software component and applying that baseline to an observed operational metric, which represents the component's interaction with other software, hardware or human entities, to determine abnormal operation of the component.
Currently, systems exist that monitor an operational metric and generate alarms whenever a given operational metric approaches, reaches or exceeds a pre-determined threshold. This technique involves a monitoring system or agent taking a sample of the monitored metric during a sampling interval and comparing the sampled value against a static threshold value that is set by a user, such as an administrator. Whenever the monitored metric approaches, reaches or exceeds the static threshold value, an alarm is generated. Setting static thresholds, however, relies on the assumption that the user is capable of accurately defining upper and lower thresholds that signify the existence of unacceptable operational metrics or other exceptional conditions. The accuracy of a monitoring system therefore depends directly on the user's ability to define thresholds accurately and properly; a poorly chosen threshold may, e.g., generate false alarms.
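By way of illustration only, the static-threshold technique described above may be sketched as follows. The names `Alarm` and `check_static_threshold`, the `margin` parameter (one possible way to express "approaches"), and the sample values are assumptions introduced for this sketch and do not describe any particular existing system.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alarm:
    interval: int  # index of the sampling interval in which the sample was taken
    value: float   # sampled metric value that triggered the alarm


def check_static_threshold(
    samples: Iterable[float], threshold: float, margin: float = 0.0
) -> List[Alarm]:
    """Compare each sampled value against a single user-defined threshold.

    An alarm is raised when a sample reaches or exceeds the threshold.
    A positive margin (hypothetical) also raises an alarm when a sample
    comes within that distance of the threshold.
    """
    alarms = []
    for interval, value in enumerate(samples):
        if value >= threshold - margin:
            alarms.append(Alarm(interval, value))
    return alarms


# Hypothetical CPU-utilization samples, one per sampling interval.
cpu_samples = [42.0, 55.5, 91.0, 88.0, 97.5]
alarms = check_static_threshold(cpu_samples, threshold=90.0)
near_alarms = check_static_threshold(cpu_samples, threshold=90.0, margin=5.0)
```

In this sketch every outcome depends on the single value passed as `threshold`, which is why an inaccurately chosen value leads directly to missed or false alarms.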
In situations where a monitored metric has no obvious absolute maximum or minimum value, a user is unable to define percentage-based thresholds. For example, the user cannot define a threshold such as “ninety percent of the transactions that a web server performs”. Such a threshold is not meaningful because there is no obvious or meaningful absolute maximum (or minimum) for the total number of transactions that a web server may perform. Furthermore, a single static threshold may be insufficient to accurately signify the existence of an unacceptable or otherwise exceptional situation. Situations exist in which a monitored operational metric is below a threshold but still signifies the existence of an unacceptable or otherwise exceptional situation, e.g., because the value of the operational metric is unexpected at a given time of day, day of the week, etc. This unexpected value may, for example, signify a problem with, or misuse of, the hardware or software component from which the operational metric was sampled.
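The shortcoming described above may be illustrated with a minimal sketch in which an observed value stays below a static threshold yet is unexpected for the hour at which it was sampled. The per-hour mean and standard deviation, the factor `k`, the function names and all sample values are assumptions chosen for illustration; they are not presented as the method of the present invention.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_baseline(history):
    """Group (hour_of_day, value) samples by hour and summarize each hour.

    Returns {hour: (mean, sample standard deviation)} for every hour
    that has at least two samples.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_unexpected(baseline, hour, value, k=3.0):
    """Flag a value lying more than k standard deviations from that hour's mean."""
    mu, sigma = baseline[hour]
    return abs(value - mu) > k * sigma


# Hypothetical transaction counts for a web server at 03:00 and 14:00.
history = [
    (3, 10), (3, 12), (3, 11), (3, 9),
    (14, 480), (14, 510), (14, 495), (14, 505),
]
baseline = build_hourly_baseline(history)

STATIC_THRESHOLD = 1000  # hypothetical user-defined static threshold
observed = 500           # below the static threshold at any hour

flagged_at_night = is_unexpected(baseline, 3, observed)
flagged_in_afternoon = is_unexpected(baseline, 14, observed)
```

With these sample values, 500 transactions never trips the static threshold, is consistent with the 14:00 history, and is far outside the 03:00 history, which is the kind of time-dependent deviation a single static threshold cannot express.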
Therefore, systems and methods are needed that generate indicia of the behavior of a given hardware or software component to identify expected operational behavior of the component, which may be used to monitor for behavior that deviates from the expected operation of the component.