Computer systems commonly use performance thresholds for monitoring and managing the performance of system components. Threshold violations are recorded and analyzed as possible indicators of system faults. Methods for setting and managing component performance thresholds (referred to herein as “component thresholds” for brevity) are known in the art. In some applications, it is desirable to correlate component thresholds with service-level performance objectives (SLOs) of the computer system. In some cases, machine learning or data mining techniques are used to model the relationship between component thresholds and SLOs. In other cases, neural networks are used to learn the relationships between measured input values. U.S. Patent Publication No. 2006/0276995, the disclosure of which is incorporated herein by reference, presents some of these methods.
The prior methods are limited to automatically adjusting a single component threshold and sending alerts when that component threshold is violated. Based on a correlation between component threshold violations and SLO violations, these alerts are used to signal a current or imminent system fault. These prior methods require setting the component threshold at a level such that a violation of the component threshold accurately predicts that a violation of an SLO will occur. An improvement to the prior methods would allow for automatically adjusting a component threshold pair, one threshold signifying the probability of SLO violation and the other signifying the probability of SLO compliance. Using the threshold pair makes it possible to represent whether the probability of a violation of an SLO is high, medium, or low. When the component is performing between the two levels of the component threshold pair, the probability of SLO violation is medium, and a lower priority warning could be sent in place of the alert. The higher priority alert is sent when the upper limit of the component threshold pair is violated. Over time, the values of the thresholds may change: one of the component thresholds may be greater than the other component threshold at one time, and the reverse may hold at another time.
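The three-level interpretation of a component threshold pair described above can be illustrated with a minimal sketch. The names `Risk`, `classify`, and `notification` are hypothetical and do not appear in the referenced publication. The sketch assumes a metric for which larger values indicate worse performance (for example, a response time), and it assumes that a value exactly equal to a threshold does not violate it. Because either threshold of the pair may be the greater one at a given time, the sketch orders the pair before comparing.

```python
from enum import Enum


class Risk(Enum):
    """Probability of an SLO violation implied by the threshold pair."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def classify(metric, threshold_a, threshold_b):
    """Classify a measured component metric against a threshold pair.

    Assumption: larger metric values mean worse performance.
    The pair is sorted first, since either threshold may be the
    greater one at any given time.
    """
    lower, upper = sorted((threshold_a, threshold_b))
    if metric > upper:
        return Risk.HIGH      # upper limit of the pair is violated
    if metric > lower:
        return Risk.MEDIUM    # performing between the two levels
    return Risk.LOW


def notification(risk):
    """Map a risk level to a notification type (None means no message)."""
    return {Risk.HIGH: "alert", Risk.MEDIUM: "warning", Risk.LOW: None}[risk]


if __name__ == "__main__":
    # Illustrative values only: thresholds of 80 and 100, in either order.
    for value in (50, 90, 120):
        risk = classify(value, 100, 80)
        print(value, risk.value, notification(risk))
```

In this sketch, a measurement between the two thresholds yields a lower priority warning, and only a measurement above the larger threshold yields the higher priority alert.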