An administrator of a data processing environment will attempt to ensure that the environment provides substantially error-free services to users. To this end, an administrator may manually examine various performance logs generated by the data processing environment to determine whether the data contained therein reveals any type of performance anomaly. If such an anomaly is present, the administrator can then take corrective action to eliminate or reduce the effects of the anomaly.
Nevertheless, the analysis performed by a human administrator may have various shortcomings. A typical data processing environment (such as a data center) may include many server machines and other processing equipment. These machines may generate a large quantity of performance data. An administrator may find the task of manually examining this large amount of performance data to be both tedious and error-prone. That is, an administrator may be deluged by the large amount of performance data, potentially preventing the administrator from detecting, and acting in a timely manner on, telltale signs of impending failure in the data processing environment.
Numerous tools exist to assist an administrator in diagnosing failures in various types of data processing environments. However, as appreciated by the present inventors, these tools may fail to adequately relieve the burden placed on the administrator. In one such instance, a tool may rely on one or more alarm thresholds to detect the occurrence of anomalies. Selecting alarm threshold levels is not an intuitive exercise, and thus, an administrator may have difficulty selecting appropriate thresholds. As a result, the administrator may select thresholds that are too high or too low, resulting in the under-reporting or the over-reporting of anomalies, respectively. The tools may allow the administrator to adjust the threshold levels on an ad hoc basis, but this iterative correction process may be both tedious and error-prone.
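By way of illustration only, the sensitivity of a fixed alarm threshold can be sketched as follows. The sketch assumes a tool that raises an alarm whenever a sample exceeds a single upper threshold; the function name count_alarms, the CPU-utilization readings, and the three threshold values are hypothetical and are not taken from any particular tool.

```python
from typing import List


def count_alarms(samples: List[float], threshold: float) -> int:
    """Count the samples that exceed a fixed upper alarm threshold."""
    return sum(1 for value in samples if value > threshold)


# Hypothetical CPU-utilization readings (percent). Only 97.0 is treated
# as a genuine anomaly for the purposes of this illustration.
cpu_samples = [41.0, 44.5, 39.8, 52.3, 47.1, 97.0, 43.2, 49.9]

# Threshold set too low: ordinary readings also raise alarms (over-reporting).
alarms_low = count_alarms(cpu_samples, threshold=40.0)

# Threshold set reasonably for this data: only the spike raises an alarm.
alarms_mid = count_alarms(cpu_samples, threshold=80.0)

# Threshold set too high: the spike goes unreported (under-reporting).
alarms_high = count_alarms(cpu_samples, threshold=99.0)

print(alarms_low, alarms_mid, alarms_high)
```

In this sketch the same eight readings yield anywhere from no alarms to an alarm on nearly every reading, depending solely on the threshold chosen, which suggests why ad hoc threshold adjustment can be both tedious and error-prone.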