In the present state of the art, electronic units, for example circuit boards, circuit cards or hybrids, are becoming increasingly complex, with many electronic components being densely packed thereon. As a consequence, it has become increasingly difficult to locate a fault in a defective unit to permit repair or replacement of the faulty component. With today's high cost of labor, it often becomes more economical to discard a faulty unit rather than attempting to locate and repair a fault thereon. However, for expensive electronic units, such as those employed in the aerospace or defense industries, the value of an individual unit is sufficiently high such that it cannot be discarded if found to be defective. As such, techniques for locating faults in complex electronic units are necessary.
In order to be effective, a fault locating system (often referred to as a "troubleshooting system") must accurately isolate faults to a particular component, or to a small number of components, at the Line Replaceable Unit (LRU) level. Moreover, faults must be located quickly so that faulty units may be repaired quickly. Finally, an effective fault isolation technique should require a minimum number of measurements (often referred to as nodes probed) on a defective unit, because probing is difficult for tightly packed units and because probing may damage a component.
As a result of this long felt need, many sophisticated troubleshooting systems have been proposed and a few have even been commercialized. However, prior art troubleshooting systems have had many shortcomings. One type of troubleshooting system employs a large number of test patterns which are applied to the inputs of an electronic unit in order to determine the location of a faulty component. In response to each test pattern the outputs are compared with expected outputs, to thereby isolate or at least localize a fault. It will be recognized that such test pattern techniques are extremely time consuming because many test patterns must be applied in order to locate a fault. Moreover, a large set of test patterns must be generated and tested for effectiveness in locating possible faults. Custom test patterns are required for each different unit to be tested, thereby greatly increasing the cost of testing.
Another type of system employs "fault dictionaries" in which all possible failures in a unit are described. Such a fault dictionary may then be employed to troubleshoot a circuit. Unfortunately, the generation of a fault dictionary for complex units is extremely time consuming. Attempts have been made to modify the fault dictionary technique to reduce the complexity thereof. See, for example, U.S. Pat. No. 4,242,751 to Henckels, et al. entitled Automatic Fault-Probing Method And Apparatus For Checking Electrical Circuits And The Like, in which an automated troubleshooting system combines fault dictionary and probe tracking approaches to produce a look-ahead, computer-guided smart probe. A failure prediction is made and the probe is guided to the predicted point. If this point is in fact defective, the failure is tracked, based on the circuit schematic, to make a more accurate prediction. The combination of predicting bad points and tracking bad points is employed to cause the probe to finally isolate the fault.
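By way of illustration only, the basic fault dictionary idea may be sketched as a lookup from an observed failure signature to the faults known to produce it. The test names, fault descriptions and the function `lookup_faults` below are hypothetical and are not taken from the '751 patent; the sketch assumes the dictionary has already been generated, which is the time consuming step noted above.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical fault dictionary. Each key is the set of functional tests
# that fail when one of the listed faults is present (illustrative data only).
FAULT_DICTIONARY: Dict[FrozenSet[str], List[str]] = {
    frozenset({"T1"}): ["R1 open"],
    frozenset({"T1", "T2"}): ["U3 output stuck low"],
    frozenset({"T2", "T3"}): ["C7 shorted", "Q2 base-emitter short"],
}


def lookup_faults(failed_tests: Iterable[str]) -> List[str]:
    """Return the faults whose recorded signature matches the failed tests.

    An empty list means the signature was never enumerated, which is the
    practical weakness of the approach for complex units.
    """
    return FAULT_DICTIONARY.get(frozenset(failed_tests), [])


if __name__ == "__main__":
    print(lookup_faults(["T2", "T1"]))  # order of the failed tests is irrelevant
    print(lookup_faults(["T9"]))        # signature not in the dictionary
```

A signature may map to more than one fault, in which case further probing is still needed to separate the candidates.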
Yet another troubleshooting approach employs probabilistic methods to provide an indication of which components are likely to fail. Thus, a probabilistic belief of failure is associated with each component. Probabilistic models as described above presume the existence of a data base of likely failures. However, when testing the initial group of failed units, there is no sample set from which likely-to-fail components may be identified. Moreover, for high cost, low volume units which may be employed in aerospace and defense applications, an insufficient number of units may be produced to obtain statistical data on likelihood of failure. Finally, probabilities of failure for components may change as manufacturing quality improvements invalidate statistical data. Accordingly, probabilistic methods have been found to be of limited success.
Recently, troubleshooting systems have been proposed which employ artificial intelligence techniques to model the behavior of a human troubleshooter. Artificial intelligence based troubleshooting systems often employ known test pattern and probabilistic models. See, for example, U.S. Pat. No. 4,766,595 to Gollomp entitled Fault Diagnostic System Incorporating Behavior Models, which discloses a combination of artificial intelligence, shallow reasoning and deep reasoning techniques for failure analysis. Shallow reasoning is a symptom-based approach in which known test failures are used as a predictor of likely present failures. Deep reasoning is a schematic-based approach which relies on circuit topology. According to the '595 patent, a statistical criticality table is calculated which orders the effects of critical components in terms of probability of occurrence. Unfortunately, as described above, such systems are not effective for low volume units or for the first defective samples of high volume units, before statistical failure data is accumulated.
Artificial intelligence based troubleshooting systems have also been proposed which use reasoning based upon the circuit schematic to locate a fault. These systems typically decompose the circuit schematic into higher level functional blocks, with the lowest level or primitive blocks being models for physical components such as transistors or resistors. Models for a few components have been proposed; however, these systems have not heretofore been applied to real world highly dense electronic units having many different components. Examples of artificial intelligence based troubleshooting systems will now be described.
In an article by Davis entitled Reasoning Based on Structure and Behavior, published by The Artificial Intelligence Laboratory, Massachusetts Institute of Technology, a system for troubleshooting digital circuits using "constraint suspension" to generate candidates for faults is described. In this system, a device is modeled as a network of interconnected constraints where each constraint models the behavior of one component. A malfunction is detected when the output predicted for a constraint does not match its actual measured output. A fault is isolated by determining a constraint whose retraction will leave the network in a consistent state. Probability of failure techniques are employed in which categories of failures are enumerated and layered according to most likely failures and paths of interaction. These paths of interaction are then selected for probing. Functional and structural decompositions of a structure may also be used, in an unspecified manner, as may the concept of "adjacency", in which the interaction of devices is described in terms of electrophysical or electromagnetic adjacency.
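By way of illustration only, the constraint suspension step may be sketched for a small arithmetic network of three multipliers and two adders. The network, node names, observed values and the brute-force search over a small integer domain are all assumptions made for this sketch; it is not the implementation described in the Davis article, and it considers single faults only.

```python
from itertools import product

# Each constraint models the behavior of one component as a predicate over
# node values. Hypothetical network: X = A*C, Y = B*D, Z = C*E, F = X+Y, G = Y+Z.
CONSTRAINTS = {
    "M1": lambda v: v["X"] == v["A"] * v["C"],
    "M2": lambda v: v["Y"] == v["B"] * v["D"],
    "M3": lambda v: v["Z"] == v["C"] * v["E"],
    "A1": lambda v: v["F"] == v["X"] + v["Y"],
    "A2": lambda v: v["G"] == v["Y"] + v["Z"],
}
INTERNAL_NODES = ("X", "Y", "Z")  # nodes that have not been probed


def consistent(observed, suspended=None, domain=range(0, 21)):
    """True if some assignment of the internal nodes satisfies every
    constraint except the suspended one (brute force; small domain assumed)."""
    active = [c for name, c in CONSTRAINTS.items() if name != suspended]
    for values in product(domain, repeat=len(INTERNAL_NODES)):
        v = dict(observed, **dict(zip(INTERNAL_NODES, values)))
        if all(check(v) for check in active):
            return True
    return False


def fault_candidates(observed):
    """Components whose constraint, when retracted, restores consistency."""
    return [name for name in CONSTRAINTS if consistent(observed, name)]


if __name__ == "__main__":
    # Inputs and measured outputs; a healthy unit would give F = 12 and G = 12.
    observed = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3, "F": 10, "G": 12}
    print(consistent(observed))        # False: a malfunction is present
    print(fault_candidates(observed))  # ['M1', 'A1']
```

Retracting M2 does not restore consistency here, because the value of Y needed to explain F would contradict the correct reading at G. The search cost grows with the number of unprobed nodes and constraints.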
Since the Davis system uses probability of failure processes, it is not effective for low volume, high cost products. Moreover, the "constraint suspension" technique is not readily usable for high density units, since the number of constraints becomes increasingly large and difficult to model. Finally, while functional and structural decomposition is described, no decomposition techniques are shown for use in real world systems.
Another artificial intelligence based troubleshooting system, jointly developed by IBM and Electronique Serge Dassault, is described in papers entitled DEDALE: An Expert System For Troubleshooting Analogue Circuits (Deves et al., 1987 International Test Conference, pages 586-594) and Troubleshooting: When Modelling Is The Trouble (Dague et al., Engineering Problem Solving, pages 600-605). As described in these two publications, DEDALE includes a modelling approach based upon "order of magnitude reasoning", which includes definitions of "about equal", "negligible" and "same order of magnitude." Top down and horizontal decompositions are employed to decompose the circuit in an unspecified manner, while at the lowest level "order of magnitude" relationships are used. Each component may be described in terms of multiple correct models, and a conflict resolution technique is employed to determine the one correct model for a particular circuit.
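By way of illustration only, the three order of magnitude relations named above may be approximated by simple numeric predicates. The function names, the tolerance `eps` and the bound `ratio` are assumptions chosen for this sketch; the DEDALE publications define the relations symbolically, and the thresholds below are not taken from them.

```python
def negligible(a: float, b: float, eps: float = 0.01) -> bool:
    """True if a is negligible with respect to b (eps is an assumed tolerance)."""
    return abs(a) <= eps * abs(b)


def about_equal(a: float, b: float, eps: float = 0.01) -> bool:
    """True if the difference between a and b is negligible with respect to b."""
    return negligible(a - b, b, eps)


def same_order(a: float, b: float, ratio: float = 10.0) -> bool:
    """True if a and b have the same sign and their magnitudes are within
    an assumed factor `ratio` of each other."""
    if a == 0 or b == 0:
        return a == b
    return (a > 0) == (b > 0) and 1.0 / ratio <= abs(a) / abs(b) <= ratio


if __name__ == "__main__":
    # Hypothetical readings in volts.
    print(negligible(0.001, 5.0))  # a stray millivolt against a 5 V rail
    print(about_equal(4.99, 5.0))  # within tolerance of nominal
    print(same_order(0.01, 5.0))   # differs by more than the assumed ratio
```

Such relations respond to gross changes in a value and, by design, do not distinguish readings that differ by less than the chosen tolerance.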
The DEDALE system uses "order of magnitude operators" to represent significant or gross changes in the behavior of lowest level components. The order of magnitude operators are used to describe the behavior of a component at the boundary of two piece-wise linear regions where significant changes in behavior occur. Using the order of magnitude operators at the boundaries may result in errors because, in most cases, the boundary region is only a few millivolts wide. Also, the technique of using order of magnitude operators cannot accurately model slight changes in the behavior of components which may lead to a unit's failure.
Moreover, the "order of magnitude" modelling technique is not applicable to all low level components or to higher level functional blocks which are necessary for modelling a complete circuit at different abstraction levels. No techniques are described for accurately decomposing the unit, nor is a technique disclosed for searching the decomposed circuit to isolate a fault. No techniques are shown for intelligently beginning the search for a failure in a manner which efficiently leads to the failed components. Accordingly, while the DEDALE system may be helpful for very simple units, where in fact a troubleshooting system is not needed, it is not extendable to dense and complex units.
Finally, U.S. Pat. No. 4,709,366 to Scott et al. entitled Computer Assisted Fault Isolation In Circuit Board Testing discloses a computer driven troubleshooting system which causes a computer to generate a unique stimulus pattern tailored to a particular node which is being probed. Preliminary functional tests may be used to generate clues, displayed to the technician, as to where to probe, by storing a "clue list" corresponding to those circuits which have failed a functional test. For example, if the unit under test fails during a test of its random access memory (RAM), the clue displayed to the technician may recommend initial probing of particular component pins at the RAM portion of the circuit board. Unfortunately, the Scott et al. system requires the generation of multiple unique stimulus patterns. Moreover, while the fact that a portion of the circuit has failed a test is used to provide "clues" to a technician, there is no technique disclosed for beginning troubleshooting of circuits which have failed where an area of failure has not already been identified.
In summary, the art appears to have recognized that artificial intelligence based approaches, coupled with component modelling and circuit decomposition, have the potential to provide a troubleshooting system which can effectively locate faults in low volume high cost units using minimal node probing. The art has yet to transform this recognition into a system which is effective for real world units.