The present invention relates generally to machine diagnostics, and more specifically to a system and method that improves diagnostic accuracy by introducing time-related features to be used for the evaluation of diagnostic significance and the identification of high probability repairs that will resolve a machine fault.
A machine, such as a locomotive or other complex system used in industrial processes, medical imaging, telecommunications, aerospace applications, and power generation, may include controls and sensors for monitoring the various systems and subsystems of the machine and generating a fault indication when an anomalous operating condition occurs. Because a malfunction can impair the ability of the owner to conduct business efficiently and cost effectively, it is essential to diagnose and repair the machine accurately and quickly.
Such complex machines usually generate an error log containing information describing the sequence of events that occurred during both routine operation and any malfunction situation. The field engineer called to diagnose and repair the machine will first consult the error log to begin the diagnosis. The error log presents a "signature" of the machine's operation and can be used to identify and correlate malfunctions. Using her accumulated experience in solving machine malfunctions, the field engineer reviews the error log to find symptoms that point to a specific fault and then repairs the machine to correct the problem. If the diagnosis was accurate, the repair will correct the machine malfunction. When the error log contains only a small amount of information, this manual process will work fairly well. However, if the error log is voluminous and certain entries have an uncertain relationship or perhaps no relationship to the malfunction, as is usually the case for large complex machines, it will be very difficult for the field engineer to properly review and comprehend the information and successfully diagnose the fault.
To overcome the problems associated with evaluating large amounts of data in error logs, computer-based diagnostic expert systems have been developed and put to use. These diagnostic expert systems are developed by interviewing field engineers to determine how they proceed to diagnose and fix a machine malfunction. The interview results are then translated into rules and procedures that are stored in a repository, which forms either a rule base or a knowledge base. The rule or knowledge base works in conjunction with a rule interpreter or a knowledge processor to form the diagnostic expert system. In operation, based on information input by the technician, the rule interpreter or knowledge processor can quickly find needed information in the rule or knowledge base to evaluate the operation of the malfunctioning machine and provide guidance to the field engineer. One disadvantage associated with such conventional diagnostic expert systems is the limited scope of the rules or knowledge stored in the repository. The process of knowledge extraction from experts is time consuming, error prone and expensive. Finally, the rules are brittle and cannot be updated easily. To update the diagnostic expert system, the field engineers have to be frequently interviewed so that the rules and knowledge base can be reformulated.
Another class of diagnostic systems uses artificial neural networks to correlate data to diagnose machine faults. An artificial neural network typically includes a number of input terminals, a layer of output nodes, and one or more "hidden" layers of nodes between the input and output nodes. Each node in each layer is connected to one or more nodes in the preceding and the following layer. The connections are via adjustable-weight links analogous to variable-coupling strength neurons. Before being placed in operation, the artificial neural network must be trained by iteratively adjusting the connection weights and offsets, using pairs of known input and output data, until the errors between the actual and known outputs, based on a consistent set of inputs, are acceptably small. A problem with using an artificial neural network for diagnosing machine malfunctions is that the neural network does not produce explicit fault correlations that can be verified by experts and adjusted if desired. In addition, the conventional steps of training an artificial neural network do not provide a measure of its effectiveness so that more data can be added if necessary. Also, the effectiveness of the neural network is limited, and it does not work well for a large number of variables.
Case-based reasoning diagnostic expert systems can also be used to diagnose faults associated with malfunctioning machines. Case-based diagnostic systems use a collection of data, known as historical cases, and compare it to a new set of data, a new case, to diagnose faults. In this context, a case refers to a problem/solution pair that represents the diagnosis of a problem and the identification of an appropriate repair (i.e., solution). Case-based reasoning (CBR) is based on the observation that experiential knowledge (i.e., memory of past experiences) can be applied to solving current problems or determining the cause of current faults. The case-based reasoning process relies relatively little on pre-processing of raw input information or knowledge, but focuses instead on indexing, retrieving, reusing, comparing and archiving cases. Case-based reasoning assumes that each case is described by a fixed, known number of descriptive attributes and uses a corpus of fully valid cases against which new incoming cases can be matched for the determination of the fault root cause and the identification of the repair that has the highest probability of resolving the fault, based on the historical cases.
Commonly assigned U.S. Pat. No. 5,463,768 discloses an approach to fault identification using error log data from one or more malfunctioning machines and a CBR tool. Each of the historical error logs contains data representative of events occurring within the malfunctioning machines. In particular, a plurality of historical error logs are grouped into case sets of common malfunctions. From the group of case sets, common patterns, i.e., identical consecutive rows or strings of error data (referred to as blocks) are used for comparison with new error log data. In this comparison process, sections of data in the new error log that are common to sections of data in each of the case sets (the historical error logs) are identified. A predicting process then predicts which of the common sections of data in the historical error logs and the new error log are indicative of a particular malfunction. Unfortunately, for a continuous fault code stream, any or all possible fault codes may occur from zero times to an infinite number of times, and the fault codes may occur in any order, so that a pre-defined structure and order for the error log data is nearly impossible. This feature of comparing error logs based on the sequence in which certain events occur represents a limitation on the process for determining the malfunction using historical error log data.
U.S. Issued Pat. No. 6,415,395 entitled "Method and System for Processing Repair Data and Fault Log Data to Facilitate Diagnostics", assigned to the same assignee of the present invention and herein incorporated by reference, discloses a system and method for processing historical repair data and historical fault log data, where this data is not analyzed based on sequential occurrences of faults, as in the commonly-owned patent described above. Instead, this system includes means for generating a plurality of cases from the repair data and the fault log data. Each case comprises a single repair and a plurality of related, but distinct faults. The faults in each case are grouped into a plurality of clusters, wherein the number of clusters is equal to the number of unique combinations of faults in the case. A weight value is assigned to each fault cluster, where the weight value indicates the likelihood that the repair will resolve the faults within that fault cluster. The weight is determined by dividing the number of times the fault combination (fault cluster) occurs in cases comprising related repairs by the number of times the fault combination occurs in all cases. To analyze a new fault, the new fault log data is entered into the system and compared with the plurality of fault log clusters. The repair associated with a matching fault log cluster represents a candidate repair to resolve the problem associated with the new fault log data. The candidate repairs are listed in descending order according to the calculated weight values.
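The weight calculation and candidate ranking described above can be illustrated with a minimal Python sketch. It is not the implementation of the referenced patent. The function names, the repair and fault labels, and the `max_size` cap on cluster size (added only to keep the number of combinations small) are assumptions made for illustration, and "cases comprising related repairs" is interpreted here as cases that share the same repair.

```python
from collections import Counter
from itertools import combinations


def fault_clusters(faults, max_size=2):
    """Return every combination of distinct faults up to max_size (assumed cap)."""
    ordered = sorted(set(faults))
    clusters = set()
    for size in range(1, min(max_size, len(ordered)) + 1):
        clusters.update(combinations(ordered, size))
    return clusters


def cluster_weights(cases, max_size=2):
    """cases: list of (repair, faults) pairs. Returns {(repair, cluster): weight}."""
    in_all_cases = Counter()      # how often a cluster occurs in all cases
    in_repair_cases = Counter()   # how often it occurs in cases with a given repair
    for repair, faults in cases:
        for cluster in fault_clusters(faults, max_size):
            in_all_cases[cluster] += 1
            in_repair_cases[(repair, cluster)] += 1
    # Weight = occurrences with this repair / occurrences in all cases.
    return {key: n / in_all_cases[key[1]] for key, n in in_repair_cases.items()}


def rank_repairs(new_faults, weights, max_size=2):
    """List (weight, repair, cluster) matches for a new fault log, highest weight first."""
    new_clusters = fault_clusters(new_faults, max_size)
    matches = [(w, repair, cluster)
               for (repair, cluster), w in weights.items()
               if cluster in new_clusters]
    return sorted(matches, key=lambda m: m[0], reverse=True)


# Hypothetical historical cases: one repair plus the faults logged before it.
cases = [
    ("replace_fuel_pump", {"F1", "F2"}),
    ("replace_fuel_pump", {"F1"}),
    ("reset_controller", {"F1", "F3"}),
]
weights = cluster_weights(cases)
for weight, repair, cluster in rank_repairs({"F1"}, weights):
    print(f"{repair}: {weight:.2f} via {cluster}")
```

In this toy data, fault F1 appears in three cases, two of which ended in the fuel pump repair, so that repair receives the higher weight for a new log containing only F1.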
Further, U.S. Issued Pat. No. 6,343,236, entitled "Method and System for Analyzing Fault Log Data for Diagnostics", assigned to the same assignee of the present invention and herein incorporated by reference, discloses a system and method for analyzing new fault log data from a malfunctioning machine, by comparison with historical fault logs, but again, where the system and method are not restricted to sequential occurrences of faults. The fault log data is clustered based on related faults and then compared with historical fault clusters. Each historical fault cluster has associated with it a repair wherein the correlation between the fault cluster and the repair is indicated by a repair weight. Upon locating a match between the current fault clusters and one or more of the historical fault clusters, a repair action is identified for the current fault cluster based on the repair associated with the matching historical fault cluster.
This invention describes a method for improved fault isolation and resolution using fault logs from the failed machine together with historical repair information correlated with specific historical faults. The commonly assigned patent applications referred to above disclose a process for reactive isolation of problems occurring in machines. It is known that the presence of certain faults or anomalous conditions does not necessarily indicate the need for an actual repair in a machine. There is a complex implicit relationship between patterns of faults and the actual machine problem that necessitates a repair action. In these previously filed patent applications, combinations of fault patterns are utilized for mining the fault data in an effort to predict the most likely repair action. An important enhancement is provided by the present invention wherein the time-related behavior of a fault or a combination of faults is also used as a descriptive feature to improve the process of isolating a specific problem and generating the appropriate repair recommendation to resolve the fault.
The major components of the present invention involve first calculating a time window in which a fault occurs and then recording the frequency of fault occurrences over that time window. The method further includes a means for adaptively determining a nominal threshold for each fault during the time window and for determining a nominal threshold related to the frequency behavior of each fault over the time window. In particular, there are two thresholds or averages calculated in accordance with the teachings of the present invention. The first threshold is based on the number of days on which the specific fault occurs within a time window. Exemplary time windows include one month and two weeks. The second threshold or average is based on the number of occurrences of the specified fault in a given day, excluding those days on which the fault does not occur. Once the nominal thresholds are calculated, the present invention determines whether a fault's behavior over the time window is anomalous and therefore requires attention. That is, does the number of occurrences of the fault over the time window exceed either of the nominal thresholds for that fault? When either or both of the diagnostic thresholds are exceeded (i.e., the number of occurrences over the time window and the number of occurrences in a particular day during the time window), the fault behavior is diagnostically significant. Finally, it is necessary to merge the diagnostic results derived from the present invention with those rates identified using the techniques described in the commonly assigned patent applications discussed above.
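As a rough illustration of the two time-related thresholds, the following Python sketch counts one fault's occurrences per day within a window, averages the two features over historical windows, and flags a new window when either nominal value is exceeded. The text does not specify how the nominal thresholds are adaptively determined, so the plain averaging used here, the function names, the two-week default window, and the sample dates are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(occurrences, window_end, window_days=14):
    """Count occurrences of one fault per calendar day inside the window."""
    window_start = window_end - timedelta(days=window_days - 1)
    return Counter(d for d in occurrences if window_start <= d <= window_end)


def nominal_thresholds(historical_windows):
    """Average both features over a non-empty list of historical windows (assumed method)."""
    # Threshold 1: mean number of days per window on which the fault occurred.
    mean_days = sum(len(w) for w in historical_windows) / len(historical_windows)
    # Threshold 2: mean occurrences per day, counting only days with an occurrence.
    active = [n for w in historical_windows for n in w.values()]
    mean_per_day = sum(active) / len(active) if active else 0.0
    return mean_days, mean_per_day


def is_significant(window, thresholds):
    """True when either nominal threshold is exceeded in the window."""
    mean_days, mean_per_day = thresholds
    too_many_days = len(window) > mean_days
    busy_day = any(n > mean_per_day for n in window.values())
    return too_many_days or busy_day


# Hypothetical history for a single fault code, split into two-week windows.
history = [
    daily_counts([date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 3)], date(2024, 1, 14)),
    daily_counts([date(2024, 1, 20)], date(2024, 1, 28)),
]
thresholds = nominal_thresholds(history)
current = daily_counts([date(2024, 2, 5)] * 5, date(2024, 2, 14))
print(thresholds, is_significant(current, thresholds))
```

Here the current window has the fault on only one day, which is below the historical average of days per window, but five occurrences on that day exceed the per-day average, so the window is flagged.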