This disclosure relates generally to the field of computer systems. More particularly, but not by way of limitation, it relates to a technique for improving performance monitoring systems.
In a large Information Technology (IT) environment where millions of metrics are tracked in order to monitor the health of the overall system, fault isolation can be a very time-consuming and labor-intensive effort. Some performance monitoring software, such as BMC ProactiveNet®, helps in this endeavor by using many components. One of the most significant of these is the abnormality event, an object that denotes when a monitored metric goes outside its normal range of behavior. (BMC ProactiveNet is a registered trademark of BMC Software, Inc.) Abnormality events are generated using rules (or thresholds), which specify the normal range of behavior for monitored metrics. The rules utilize specific data patterns (or baselines, or dynamic thresholds) to specify normal operating ranges for the corresponding metrics, and these rules must be managed by people.
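By way of illustration only, the relationship between a baseline, a rule, and an abnormality event might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular product: the hourly baseline layout and the names AbnormalityEvent, Baseline, and check_sample are hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AbnormalityEvent:
    """Hypothetical record of a metric sample outside its normal range."""
    metric: str
    hour: int
    value: float
    low: float
    high: float


# Assumed baseline layout: hour of day -> (low, high) normal operating range.
Baseline = Dict[int, Tuple[float, float]]


def check_sample(metric: str, hour: int, value: float,
                 baseline: Baseline) -> Optional[AbnormalityEvent]:
    """Apply the rule: return an event only if the value leaves the range."""
    bounds = baseline.get(hour)
    if bounds is None:
        # No baseline for this hour, so no rule can be evaluated.
        return None
    low, high = bounds
    if low <= value <= high:
        return None
    return AbnormalityEvent(metric, hour, value, low, high)
```

In this sketch the same reading can be normal at one hour and abnormal at another, which is what distinguishes a dynamic threshold from a single fixed limit.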
As the infrastructure grows, the threshold management task quickly becomes prohibitively expensive and impractical, since it requires a person with expert domain knowledge to decide which type of dynamic threshold to use so that the thresholds generate the most accurate abnormality events. Because the task is so overwhelming, users typically avoid it completely and leave all settings as they were “out-of-the-box.”
Thus, it would be beneficial to provide a mechanism to automatically determine dynamic thresholds for the monitored metrics for accurate detection of abnormalities.