1. Technical Field
This disclosure relates in general to processing error data, and more particularly to a method, apparatus and program storage device for providing control of statistical processing of error data over a multitude of sources using a dynamically modifiable DFT rule set.
2. Background of the Invention
As consumers become more dependent on computer systems to perform tasks reliably, tolerance for computer system errors decreases. Computer systems typically experience outages when soft failures occur. As hardware ages, an increasing number of computer errors occur, and the likelihood of soft failure increases. Without safety mechanisms, computer systems inevitably experience failure, resulting in user dissatisfaction.
To avoid computer system failure, methods for predicting or diagnosing an impending system failure have been developed. For example, specification-based diagnosis of system failure is a method for determining, from system design specifications, what the expected behavior of a system will be under defined operating conditions. Tests based on expected system behavior are developed and used to diagnose system failure. The specification-based diagnosis approach, however, has limited ability to isolate unanticipated faults and to develop tests for diagnosing them.
Another example of a mechanism for diagnosing system failure is symptoms-based diagnosis. System fault conditions are identified symptomatically by reconstructing system failures from event or error logs, identifying the circumstances in which errors occurred, and evaluating the events leading up to system failure. Whereas the specification-based diagnosis approach results in tests, the symptoms-based diagnosis approach results in system failure indicators.
A particular example of a symptoms-based diagnosis technique is the dispersion frame technique (DFT), which was developed based on the observation that computer systems and other electronic devices experience an increasing error rate prior to catastrophic failure. The DFT uses rules to determine the relationship between error occurrences by examining their closeness in time and space. Extending DFT rules augments the functionality of a DFT engine and allows tighter control of statistical processing of error data over a multitude of computer devices. The rules also allow significant increments in error rate occurring within a specified time frame to be viewed as a single error event. The single error event is recognized only if the increment exceeds a specified watermark defined by the rule. Methods using the DFT, however, use rules that are static and provide only a single dimension of statistical analysis.
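By way of illustration only, the watermark behavior described above may be sketched in Python as follows. The sketch is not taken from any particular DFT implementation: the names WatermarkRule and detect_events, the use of a trailing time window, and the choice to report one event per burst are all assumptions made for the example. It counts the errors that fall within a time frame and reports a single error event when that count first exceeds the watermark.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WatermarkRule:
    """Hypothetical rule object; field names are illustrative assumptions."""
    window_seconds: float  # length of the time frame in which errors are counted
    watermark: int         # count that must be exceeded before an event is reported


def detect_events(timestamps: Iterable[float], rule: WatermarkRule) -> List[float]:
    """Return the times at which the error count in the trailing window
    first exceeds the watermark. A burst is reported once, as a single event."""
    events: List[float] = []
    window: deque = deque()
    armed = True  # False while a burst that was already reported is still ongoing
    for t in sorted(timestamps):
        window.append(t)
        # Drop errors that are older than the time frame.
        while t - window[0] > rule.window_seconds:
            window.popleft()
        if len(window) > rule.watermark:
            if armed:
                events.append(t)
                armed = False
        else:
            armed = True  # count fell back to or below the watermark; re-arm
    return events


if __name__ == "__main__":
    error_times = [0, 100, 200, 201, 202, 203, 500]
    rule = WatermarkRule(window_seconds=10, watermark=3)
    print(detect_events(error_times, rule))
```

Because the rule in this sketch is an ordinary data object, its window and watermark can be changed between calls. A static rule set, by contrast, would fix those values in advance.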
It can be seen that there is a need for a method, apparatus and program storage device for providing and implementing a dynamically modifiable DFT rule set.