1. Field of the Invention
This invention is related in general to the field of data storage systems. In particular, the invention consists of a pattern analysis method used to dynamically detect errors and generate weighted numeric values.
2. Description of the Prior Art
Error logs are generated by systems such as mechanical systems, computer systems, and information systems in response to system faults or anomalous conditions. These systems often include an error logging and analysis component (“ELA”) to log the error, analyze the failure, and initiate mitigating action in real time. Systems that experience repetitive errors may utilize analysis techniques to recognize error patterns.
Data storage systems such as computer hard disk drives, redundant arrays of independent/inexpensive disks (“RAIDs”), or structured random access memory (“RAM”) can benefit from error pattern analysis to determine the source of repetitive errors or to predict system failure. However, error pattern analysis traditionally has been difficult to implement in complex systems. Real-time pattern analysis has generally been limited by space (required to store error messages), processing resources, and the amount of time required to detect and analyze error patterns.
Newer ELA components utilize time-based methods to determine whether a fault is statistically relevant. A common technique is to sum the number of fault events of a particular type over a time interval and compare this sum to a predetermined threshold. These time-based methods are relatively simple and effective in overcoming the problems of storage space, processing resources, and time. However, time-based methods are poorly suited to complex software/hardware systems because they do not effectively detect problems that develop over long periods of time. This can potentially result in an unexpected loss of a resource or catastrophic system failure. Additionally, time-based ELA systems have difficulty managing errors that occur in clusters, i.e., large numbers of errors over a short period of time interspersed with long error-free periods.
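By way of illustration only, the time-based technique described above may be sketched as follows. This is a minimal sketch and not the implementation of any particular ELA component; the class name `WindowedFaultCounter`, its parameters, and the fault-type label are hypothetical and are chosen solely for this example.

```python
from collections import defaultdict, deque


class WindowedFaultCounter:
    """Hypothetical time-based detector: counts faults of each type
    inside a sliding time window and compares the count to a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds  # length of the time interval
        self.threshold = threshold            # predetermined fault count
        self._events = defaultdict(deque)     # fault type -> recent timestamps

    def record(self, fault_type, timestamp):
        """Log one fault and report whether the threshold is now reached."""
        events = self._events[fault_type]
        events.append(timestamp)
        # Discard faults that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events) >= self.threshold


# Three faults within 60 seconds reach the threshold on the third fault.
burst = WindowedFaultCounter(window_seconds=60, threshold=3)
burst_results = [burst.record("read_error", t) for t in (0, 10, 20)]

# The same fault recurring every 100 seconds never reaches the threshold,
# however long the pattern persists.
slow = WindowedFaultCounter(window_seconds=60, threshold=3)
slow_results = [slow.record("read_error", t) for t in range(0, 1000, 100)]
```

In this sketch the slowly recurring fault is never flagged, because each earlier event leaves the window before the next one arrives. This corresponds to the limitation noted above for problems that develop over long periods of time.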
In U.S. Pat. No. 5,463,768, Paul Cuddihy et al. disclose an error log analysis system comprising a diagnostic unit and a training unit wherein the training unit includes a plurality of historical error logs. Sections of error logs that are in common with other historical error logs are identified and labeled as blocks. Each block is then weighted with a numerical value that is indicative of its value in diagnosing a fault. However, this system does not assign error weights to individual error instances. Additionally, proper implementation of this system requires that error analysis be order-dependent or time-dependent.
In U.S. Pat. No. 6,625,589, Anil Varma et al. disclose an algorithm for improving the probability of identifying a repair that will correct a fault by utilizing a historical fault log and calculating the number of times a fault occurs in a given period of time. Faults that occur with a frequency greater than the average are considered statistically significant. However, weights are not assigned to individual errors to assist the fault analysis process. Accordingly, it would be advantageous to have an error logging system that utilizes error severity and occurrence to generate a weighted error rate. Additionally, it would be beneficial to compare these weighted error rates to a predetermined threshold to assist in predicting component failure.
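For purposes of illustration, one possible reading of a weighted error rate is sketched below. The text above does not define the calculation, so the formula is an assumption: each error type is assigned a hypothetical severity weight, the weighted occurrences are summed, and the sum is normalized by the number of operations performed. The function name, error-type labels, weights, and threshold are all invented for this example.

```python
def weighted_error_rate(error_counts, severity_weights, total_operations):
    """Assumed formula: sum(weight * occurrences) / total operations."""
    if total_operations <= 0:
        raise ValueError("total_operations must be positive")
    weighted_sum = sum(
        severity_weights[error_type] * count
        for error_type, count in error_counts.items()
    )
    return weighted_sum / total_operations


# Hypothetical severity weights: an unrecoverable error counts 50 times
# as heavily as a recoverable one.
weights = {"recoverable_read": 1, "unrecoverable_read": 50}
counts = {"recoverable_read": 10, "unrecoverable_read": 1}

rate = weighted_error_rate(counts, weights, total_operations=1000)
unweighted_rate = sum(counts.values()) / 1000

THRESHOLD = 0.05  # hypothetical predetermined threshold
component_suspect = rate > THRESHOLD
```

With these invented figures, the unweighted rate of 11 errors per 1000 operations stays below the threshold, while the weighted rate exceeds it because of the single severe error. The sketch is intended only to show how severity and occurrence might be combined into one value for comparison against a threshold.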