The present invention relates generally to machine diagnostics, and more specifically to a system and method that improves diagnostic accuracy for failure conditions that cannot be adequately diagnosed and are therefore referred to as “no trouble found” conditions.
A machine, such as a locomotive or other complex system used in industrial processes, medical imaging, telecommunications, aerospace applications, and power generation, may include controls and sensors for monitoring the various systems and subsystems of the machine and generating a fault indication when an anomalous operating condition occurs. Because a malfunction can impair the ability of the owner to conduct business efficiently and cost effectively, it is essential to accurately diagnose and quickly repair the machine.
Such complex machines may generate an error log containing information describing the sequence of events that occurred during both routine operation and any malfunction situation. The field engineer called to diagnose and repair the machine will first consult the error log to assist with the diagnosis. The error log presents a “signature” of the machine's operation and can be used to identify and correlate specific malfunctions. Using her accumulated experience in solving machine malfunctions, the field engineer reviews the error log to find symptoms that point to a specific fault and then repairs the machine to correct the problem. If the diagnosis was accurate, the repair will correct the machine malfunction. When the error log contains only a small amount of information, this manual process works fairly well. However, if the error log is voluminous (the usual case for large complex devices) and certain entries have an uncertain relationship or perhaps no relationship to a specific malfunction, it will be very difficult for the field engineer to accurately review and comprehend the information and successfully diagnose the fault.
To overcome the problems associated with evaluating large amounts of data in error logs, computer-based diagnostic expert systems have been developed and put to use. These diagnostic expert systems are developed by interviewing field engineers to determine how they proceed to diagnose and fix a machine malfunction. The interview results are then translated into rules and procedures that are stored in a repository, which forms either a rule base or a knowledge base. The rule or knowledge base works in conjunction with a rule interpreter or a knowledge processor to form the diagnostic expert system. In operation, based on information input by the field engineer, the rule interpreter or knowledge processor can quickly find needed information in the rule or knowledge base to evaluate the operation of the malfunctioning machine and provide guidance to the field engineer. One disadvantage associated with such conventional diagnostic expert systems is the limited scope of the rules or knowledge stored in the repository. In addition, the process of knowledge extraction from experts is time-consuming, error-prone and expensive. Finally, the rules are brittle and cannot be updated easily. To update the diagnostic expert system, the field engineers have to be frequently interviewed so that the rules and knowledge base can be reformulated.
Another class of diagnostic systems uses artificial neural networks to correlate data to diagnose machine faults. An artificial neural network typically includes a number of input terminals, a layer of output nodes, and one or more “hidden” layers of nodes between the input and output nodes. Each node in each layer is connected to one or more nodes in the preceding and the following layer. The connections are via adjustable-weight links analogous to neurons of variable coupling strength. Before being placed in operation, the artificial neural network must be trained by iteratively adjusting the connection weights and offsets, using pairs of known input and output data, until the errors between the actual and known outputs, based on a consistent set of inputs, are acceptably small. A problem with using an artificial neural network for diagnosing machine malfunctions is that the neural network does not produce explicit fault correlations that can be verified by experts and adjusted if desired. In addition, the conventional steps of training an artificial neural network do not provide a measure of its effectiveness so that more data can be added if necessary. Also, the effectiveness of the neural network is limited, and the network does not work well for a large number of variables.
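The iterative training described above can be illustrated with a minimal sketch. It is reduced to a single node with one weight and one offset, trained by gradient descent on the mean squared error; the function name `train`, the learning rate, the tolerance, and the sample data are assumptions chosen for illustration and do not represent any particular diagnostic network.

```python
def train(samples, rate=0.05, tolerance=1e-6, max_epochs=10_000):
    """Adjust one weight and one offset until the mean squared error is small.

    samples: list of (known_input, known_output) pairs.
    Returns (weight, offset, error at the returned weight and offset).
    """
    weight, offset = 0.0, 0.0
    error = float("inf")
    for _ in range(max_epochs):
        # Difference between the actual output and the known output.
        residuals = [weight * x + offset - y for x, y in samples]
        error = sum(r * r for r in residuals) / len(samples)
        if error < tolerance:
            break  # errors are acceptably small
        # Gradient of the mean squared error with respect to each parameter.
        grad_w = 2 * sum(r * x for r, (x, _) in zip(residuals, samples)) / len(samples)
        grad_b = 2 * sum(residuals) / len(samples)
        weight -= rate * grad_w
        offset -= rate * grad_b
    return weight, offset, error

# Hypothetical training pairs generated from output = 2 * input + 1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, offset, error = train(pairs)
```

The trained weight and offset reproduce the training pairs, but they are bare numbers that do not themselves state a fault correlation an expert could inspect, which is the limitation noted above.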
Case-based reasoning diagnostic expert systems can also be used to diagnose faults associated with malfunctioning machines. Case-based diagnostic systems use a collection of data, known as historical cases, and compare it to a new set of data, a new case, to diagnose faults. In this context, a case refers to a problem/solution pair that represents the diagnosis of a problem and the identification of an appropriate repair (i.e., solution). Case-based reasoning (CBR) is based on the observation that experiential knowledge (i.e., knowledge of past experiences) can be applied to solving current problems or determining the cause of current faults. The case-based reasoning process relies relatively little on pre-processing of raw input information or knowledge, but focuses instead on indexing, retrieving, reusing, comparing and archiving cases. Case-based reasoning approaches assume that each case is described by a fixed, known number of descriptive attributes and use a corpus of fully valid cases against which new incoming cases can be matched for the determination of a root cause fault and the generation of a repair recommendation.
Commonly assigned U.S. Pat. No. 5,463,768 discloses an approach to fault identification using error log data from one or more malfunctioning machines using CBR. Each of the historical error logs contains data representative of events occurring within the malfunctioning machine. In particular, a plurality of historical error logs are grouped into case sets of common malfunctions. From the group of case sets, common patterns, i.e., identical consecutive rows or strings of error data (referred to as a block), are used for comparison with new error log data. In this comparison process, sections of data in the new error log that are common to sections of data in each of the case sets (the historical error logs) are identified. A predicting process then predicts which of the common sections of data in the historical error logs and the new error log are indicative of a particular malfunction. Unfortunately, for a continuous fault code stream, any or all possible faults may occur from zero times to an infinite number of times, and the faults may occur in any order, so the structure of the fault log data is not amenable to easy diagnosis. This feature of comparing error logs based on the sequence in which certain events occur represents a limitation on the process for determining the malfunction using historical error log data.
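The sequence dependence of block comparison can be seen in a minimal sketch. The helper names `blocks` and `common_blocks`, the fixed block size of two rows, and the error codes are hypothetical; the sketch is not the method of the referenced patent, only an illustration of matching identical consecutive rows.

```python
def blocks(log, size):
    """Return every run of `size` consecutive rows in the log as a tuple."""
    return {tuple(log[i:i + size]) for i in range(len(log) - size + 1)}

def common_blocks(historical_log, new_log, size=2):
    """Return the blocks of identical consecutive rows shared by both logs."""
    return blocks(historical_log, size) & blocks(new_log, size)

# Hypothetical error logs, one error code per row.
historical_log = ["E01", "E07", "E03", "E09"]
new_log = ["E05", "E07", "E03", "E01"]
reordered_log = ["E03", "E07", "E05", "E01"]

shared = common_blocks(historical_log, new_log)
shared_reordered = common_blocks(historical_log, reordered_log)
```

Here `shared` contains the single block `("E07", "E03")`, while `shared_reordered` is empty even though the reordered log contains the same error codes as `new_log`, because no two consecutive rows appear in the same order.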
U.S. patent application Ser. No. 09/285,612 filed on Apr. 2, 1999 and entitled “Method and System for Processing Repair Data and Fault Log Data to Facilitate Diagnostics”, assigned to the assignee of the present invention and herein incorporated by reference, discloses a system and method for processing historical repair data and historical fault log data, where this data is not restricted to sequential occurrences of fault log entries, as in the commonly owned patent described above. This system includes means for generating a plurality of cases from the repair data and the fault log data. Each case comprises a repair and a plurality of related and distinct faults. For each case, at least one repair and distinct fault cluster combination is generated and then a weight is assigned thereto. This weight value indicates the likelihood that the repair will resolve any of the faults included within the fault cluster. The weight is assigned by dividing the number of times the combination occurs in cases comprising related repairs by the number of times the combination occurs in all cases. New fault log data is entered into the system and compared with the plurality of fault log clusters. The repair associated with the matching fault log cluster represents a candidate repair to resolve that fault. The candidate repairs are listed in sequential order according to the calculated weight values.
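The weight calculation described above can be illustrated with a minimal sketch. The case layout (a repair code paired with the fault codes logged before the repair), the limitation of clusters to single faults and fault pairs, the reading of “related repairs” as cases having the same repair code, and the names `fault_clusters` and `assign_weights` are all assumptions made for illustration rather than details of the referenced application.

```python
from collections import Counter
from itertools import combinations

def fault_clusters(faults, max_size=2):
    """Yield every cluster (sorted tuple) of up to max_size distinct faults."""
    distinct = sorted(set(faults))
    for size in range(1, max_size + 1):
        yield from combinations(distinct, size)

def assign_weights(cases, max_size=2):
    """Return {(repair, cluster): weight} for every combination in the cases.

    weight = (cases with this repair that contain the cluster)
             / (all cases that contain the cluster)
    """
    with_repair = Counter()  # occurrences of each repair/cluster combination
    in_all = Counter()       # occurrences of each cluster in all cases
    for repair, faults in cases:
        for cluster in fault_clusters(faults, max_size):
            with_repair[(repair, cluster)] += 1
            in_all[cluster] += 1
    return {key: count / in_all[key[1]] for key, count in with_repair.items()}

# Hypothetical historical cases: (repair code, fault codes in the fault log).
cases = [
    ("R1", ["F10", "F20"]),
    ("R1", ["F10", "F30"]),
    ("R2", ["F10", "F20"]),
]
weights = assign_weights(cases)
```

In this example fault `F10` appears in all three cases, two of which were resolved by repair `R1`, so the combination of `R1` with the cluster `("F10",)` receives a weight of two thirds.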
Further, U.S. patent application Ser. No. 09/285,611, entitled “Method and System for Analyzing Fault Log Data for Diagnostics”, assigned to the assignee of the present invention and herein incorporated by reference, discloses a system and method for analyzing new fault log data from a malfunctioning machine, again where the system and method are not restricted to sequential occurrences of fault log entries. The fault log data is clustered based on related faults and then compared with historical fault clusters. Each historical fault cluster has associated with it a repair, wherein the correlation between the fault cluster and the repair is indicated by a repair weight. Upon locating a match between the current fault clusters and one or more of the historical fault clusters, a repair action is identified for the current fault cluster based on the repair associated with the matching historical fault cluster.
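The matching of current fault clusters against weighted historical fault clusters can be illustrated with a minimal sketch. The dictionary layout of the historical data, the use of the highest matching weight for each repair, the restriction to clusters of one or two faults, and the name `recommend_repairs` are assumptions made for illustration, not details of the referenced application.

```python
from itertools import combinations

def recommend_repairs(new_faults, historical, max_size=2):
    """Rank candidate repairs for a new fault log.

    historical: {(repair, cluster): repair weight}, where cluster is a sorted
    tuple of fault codes. Returns [(repair, weight)], highest weight first.
    """
    distinct = sorted(set(new_faults))  # order and repetition are ignored
    new_clusters = {
        cluster
        for size in range(1, max_size + 1)
        for cluster in combinations(distinct, size)
    }
    best = {}
    for (repair, cluster), weight in historical.items():
        if cluster in new_clusters:
            # Keep the strongest matching cluster for each repair.
            best[repair] = max(weight, best.get(repair, 0.0))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical historical fault clusters with their repair weights.
historical = {
    ("R1", ("F10",)): 0.67,
    ("R2", ("F10",)): 0.33,
    ("R1", ("F10", "F30")): 1.0,
    ("R2", ("F10", "F20")): 0.5,
    ("R3", ("F40",)): 1.0,
}
ranked = recommend_repairs(["F20", "F10", "F10"], historical)
```

Because the new fault log is reduced to a set of distinct faults before clustering, the result does not depend on the order or number of occurrences of the fault log entries.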
One particular type of fault situation that can be advantageously analyzed by certain fault analysis and diagnostic tools involves so-called “no trouble found” faults. Failure conditions that are difficult to diagnose within a complex system may result in such a declaration of no trouble found. The system experiences intermittent failures, but once it is taken out of service and the repair process is initiated, there is no evidence of a fault or failure. Generally this is occasioned by the intermittent nature of the fault or because the complexity of the system obscures the fault condition from a repair technician whose skills may be deficient in some area relevant to the system. In some situations, repair personnel may be unable to recreate the fault at the maintenance center. In each of these situations, the repair technician declares that the system is failure free and ready for return to service. Later, the system may experience a repeat failure due to the same problem, requiring another attempt at diagnosis and repair.
In the operation of a railroad, if a fault condition occurs while a locomotive is in service, the operator may elect to stop the train and attempt a repair with assistance from service personnel contacted by phone. In those cases where the operator cannot repair the fault, he will continue on his route until he arrives at a site where the locomotive can be diagnosed and repaired. If the locomotive is incapable of further operation, it is removed from service and towed to a repair site. Typically, the fault can be identified and repaired and the locomotive returned to service. In the event that the repair technician is unable to properly diagnose the fault condition, e.g., the fault condition no longer exists at the time the repair technician conducts his analysis, then the fault will be declared a no trouble found event.
Railroad operations usually require that all significant anomalous conditions on the locomotive, including no trouble found events, be analyzed and then closed out by the repair technician. In those situations where the diagnosis identifies a specific faulty part and a repair is accomplished, certain railroad repair codes are used to designate the problem and close it, after which the locomotive is returned to service. Due to the complexity of a railroad locomotive and the occasional inability to identify a specific fault condition, many “faults” are simply closed as “no trouble found”. Further, and disadvantageously, the inability to identify the root cause of the locomotive problem may result in the problem status remaining in an open condition for an extended period of time. This is detrimental to efficient operation of the railroad, as the operator would like to identify, diagnose and close faults as early and as efficiently as possible.
A further complication to the diagnosis and repair problem may be due to the site where the diagnosis and repair is first attempted. There are at least three different sites where a locomotive can undergo repairs, including on a run-through track where certain simple processes can be executed, on a service track where the locomotive is isolated from the main line and more complex and lengthy repairs can be undertaken, and at a main shop where the locomotive can be disassembled to diagnose problems and conduct repairs. Because the most complex repairs are undertaken at the main shop, the skill level of the technicians there tends to be higher than that of the technicians who are stationed at a run-through site. As a result, certain locomotive faults are incapable of being detected and thoroughly analyzed, depending upon the site where the analysis takes place, again leading to a proliferation of “no trouble found” situations.
It is believed that the fault and repair analysis tools disclosed in the patent applications described above provide substantial advantages and advancements in the art of the diagnostics of complex machines. It would be desirable, however, to provide a system and method to improve the evaluation and identification of faults in those cases where heretofore a “no trouble found” designation was assigned. As a result, the diagnostic accuracy would be improved and the number of no trouble found events that occur in fielded systems would be reduced. Ultimately, reduction in the number of no trouble found conditions represents a cost savings to the system user due to fewer repeat failures and lower troubleshooting costs.