Over recent years, several methods for diagnosing faults in systems comprised of multiple components have been devised. Typically, such methods utilize information concerning the structure of the system, the types of tests executed on the system, and the results of those tests in order to identify the system component most likely responsible for any test failures encountered.
Such diagnostic methods often have been applied to electronic systems at varying levels of abstraction. For example, a defective computer among hundreds or thousands of computer systems constituting a computer network may be identified by such a method. Similarly, the same method may be employed to identify the source of faults within a single computer, printed circuit board, or integrated circuit. Furthermore, such methods may be utilized in systems residing outside the electronics domain, including, but not limited to, mechanical, aeronautical, and medical or biological systems.
A diagnostic system employing such a method has been described in U.S. Pat. No. 5,808,919 by Christopher Preist and David Allport, which is assigned to the assignee of the present invention. Generally speaking, such a system begins its diagnosis by generating sets of logically possible diagnoses. The system accomplishes this task by first identifying “conflict sets,” wherein each set identifies all of the components involved in a particular set of failing functional tests. Conceptually, each test, whether failing or passing, employs one or more “operations,” or functions, with each operation employing one or more physical “components” of the system. Each component is exercised by an operation at a particular level of “utilization,” or coverage, represented as a fraction between 0 and 1 inclusive, with 0 indicating no utilization, and 1 representing complete utilization of the component. Using these conflict sets, the system then generates a number of “hitting sets,” with each representing a set of components that, if defective, would result in the identified set of failing functional tests. Therefore, each hitting set represents a logically possible diagnosis, given the results of failing tests.
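The conflict-set/hitting-set step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: conflict sets are assumed to be given as Python sets of component names, and only minimal hitting sets (those containing no smaller hitting set) are kept.

```python
from itertools import combinations

def minimal_hitting_sets(conflict_sets):
    """Enumerate minimal hitting sets: component sets that intersect
    every conflict set, i.e., logically possible diagnoses."""
    universe = sorted(set().union(*conflict_sets))
    results = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            cset = set(candidate)
            if all(cset & conflict for conflict in conflict_sets):
                # discard supersets of an already-found smaller hitting set
                if not any(set(found) <= cset for found in results):
                    results.append(candidate)
    return results

# Two failing tests implicating components {C1, C2} and {C2, C3}:
print(minimal_hitting_sets([{"C1", "C2"}, {"C2", "C3"}]))
# → [('C2',), ('C1', 'C3')]
```

Each returned tuple is a candidate diagnosis: either C2 alone, or C1 and C3 together, would account for both failing tests.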
Once each of the possible hitting sets, or candidate diagnoses, is identified, each is assigned a relative weight by calculating various probabilities of failure. The basis of the calculations is Bayes' Rule:

p(D|T) = (p(D) * p(T|D)) / p(T)
In this rule, D is a candidate diagnosis, and T is the complete set of failing and passing test results. p(D|T) is the probability that the components of the candidate diagnosis have failed, given the set of test results; this value is also known as the "posterior probability" of the candidate diagnosis. Similarly, p(D) is the "prior probability" of the components of the candidate diagnosis failing when no information concerning test results is considered. Additionally, p(T|D) is the probability of observing a given set of test results, assuming a particular candidate diagnosis is correct. Finally, p(T) is the probability of a particular set of test results regardless of which candidate diagnosis is correct.
Typically, the specific value of the posterior probability of the candidate diagnosis is not required; what is important is the relative ranking of the posterior probabilities of the candidate diagnoses, so that the diagnosis most likely responsible for the observed test results may be identified. Therefore, since p(T) is the same for each candidate diagnosis, that term may be canceled from p(D|T) for each candidate diagnosis, leaving a relative posterior probability:

Relative p(D|T) = p(D) * p(T|D)
To calculate the first factor, p(D), an assumption is normally made that each of the components of the candidate diagnosis fails independently. As a result, the prior probability of the candidate diagnosis may be calculated by multiplying together the failure rate of each component of the candidate diagnosis:
p(D) = ∏_{∀ components ∈ D} (failure rate(component))
(In the previous equation, as well as in ones to follow, “∀” symbolizes “for all,” and “∈” signifies “element(s) of.”)
The failure rate of each component employed is usually provided by the manufacturer of that particular component, based on empirical evidence over many thousands of components produced. Additionally, the failure rate may also include other sources of error, such as “process errors” (e.g., the improper placement or soldering of the component to a printed circuit board).
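Under the independence assumption, p(D) reduces to a simple product of per-component failure rates. A sketch, with the failure-rate values purely hypothetical:

```python
def prior_probability(diagnosis, failure_rates):
    """p(D): product of the independent failure rates of the
    components named in the candidate diagnosis."""
    p = 1.0
    for component in diagnosis:
        p *= failure_rates[component]
    return p

# Hypothetical manufacturer-supplied failure rates:
rates = {"C1": 0.001, "C2": 0.0005}
print(prior_probability(["C1", "C2"], rates))  # ~5e-07
```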
To calculate p(T|D) for a particular candidate diagnosis, the diagnostic system typically assumes that the results of the tests are independent of one another to simplify the calculations involved. In that case, the probabilities of each test result for a candidate diagnosis are simply multiplied together. For example, if two tests, T1 and T2, are involved, the resulting p(T|D) is the product of the probabilities involving each test:

p(T|D) = p(T1|D) * p(T2|D)
To further reduce the complexity of the calculations involved, an additional assumption is normally made that the probability of a test failing, given the failure of a particular component of a candidate diagnosis, is proportional to the utilization of that component by that particular test. Likewise, the probability of a test passing, given the failure of a component, is assumed to be (1 − the utilization of that component by the passing test). For example, if T1 is a failing test, T2 is a passing test, and the candidate diagnosis is a single component C1, the posterior probability of the overall test results, given the candidate diagnosis, may be stated as follows:

p(T|D) = utilization of C1 in T1 * (1 − utilization of C1 in T2)
In some alternate applications, the utilization of the component of a failing test is assumed to be one, further simplifying the overall calculation at the possible expense of some inaccuracy in determining the relative posterior probability for each hitting set. Such a simplification is possible if the component involved is simple enough that any operation exercising it employs essentially the entirety of the component.
In the case where a candidate diagnosis consists of multiple components, the utilization of each of the components by each of the failing tests, and one minus the utilization of each component by each passing test, may be factored together to generate p(T|D).
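The utilization-based likelihood described above can be sketched as follows. The test results and utilization figures are hypothetical inputs; a failing test contributes each diagnosed component's utilization, and a passing test contributes one minus that utilization:

```python
def likelihood(diagnosis, test_results, utilization):
    """p(T|D): product, over all tests and all diagnosed components, of the
    utilization (for a failing test) or 1 - utilization (for a passing test).
    Assumes each failing test exercises each diagnosed component."""
    p = 1.0
    for test, passed in test_results.items():
        for component in diagnosis:
            u = utilization.get((test, component), 0.0)
            p *= (1.0 - u) if passed else u
    return p

# Diagnosis {C1}; T1 fails with utilization 0.8, T2 passes with 0.3:
results = {"T1": False, "T2": True}           # False = failing, True = passing
util = {("T1", "C1"): 0.8, ("T2", "C1"): 0.3}
print(likelihood(["C1"], results, util))      # 0.8 * (1 - 0.3) ≈ 0.56
```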
Once p(D) and p(T|D) are determined for a candidate diagnosis (i.e., hitting set), they are multiplied together to obtain the relative posterior probability p(D|T) for that diagnosis, as shown above. p(D|T) for each diagnosis is then transformed to a final relative weight by the possible application of an “operation penalty.” The operation penalty is applied if the diagnosis involves operation violations, which are operations which are involved in both passing and failing tests, thus implying an inconsistency in the test results. In those cases, the weight of that diagnosis is penalized, or reduced, to reflect that inconsistency.
To determine whether the operation penalty, which is typically a single scalar factor less than one sometimes called the "input variability," is to be applied to the weight of a particular candidate diagnosis, each failing test Ti is analyzed in turn. Generally speaking, the input variability is a measure of the expected variability of results over multiple uses of the same operation; in other words, the more repeatable the results of a particular operation, the lower the input variability. Each operation of Ti that involves a component of the candidate diagnosis is identified. If each of those operations is also involved in a passing test, the operation penalty is to be applied. The penalty is applied at most once: as soon as one failing test Ti fits this description, the penalty is applied, and the penalty is omitted only if no failing test indicates its application.
Once the relative weight of each of the candidate diagnoses is calculated, the diagnosis with the highest relative weight is the one determined by the diagnostic system as being the most likely cause of the failures detected by the tests (i.e., the correct diagnosis).
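Selecting the most likely diagnosis from the final weights is then a simple maximization; a sketch with hypothetical weights:

```python
def most_likely_diagnosis(weights):
    """Return the candidate diagnosis (key) with the highest relative weight."""
    return max(weights, key=weights.get)

# Hypothetical final relative weights per candidate diagnosis:
weights = {("C2",): 4e-4, ("C1", "C3"): 7e-5}
print(most_likely_diagnosis(weights))  # → ('C2',)
```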
In some alternative versions, no assumption is made that the probability of a test failing, given the failure of a particular component, is proportional to the utilization of that component by the test. In that case, the relative weight of a candidate diagnosis, given a set of test results, may be defined as follows:

Weight(D, T) = p(D) * minimum(αi, i = 1 to N), wherein

αi = 1 − utilization of component Cj by test i, where i is a passing test and Cj is a member of D;
αi = 1, where i is a failing test; and
N = the total number of passing and failing tests.
Therefore, the maximum utilization of a component in any passing test is used to alter the weight of the candidate diagnosis. Also, the actual utilization of a component in a failing test is not considered.
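This alternative weighting can be sketched as follows, assuming utilization is supplied per (test, component) pair; the prior and utilization values are hypothetical:

```python
def alternative_weight(diagnosis, prior, test_results, utilization):
    """Weight(D, T) = p(D) * min(alpha_i): alpha_i = 1 - utilization of a
    diagnosed component in passing test i; alpha_i = 1 for failing tests."""
    alphas = []
    for test, passed in test_results.items():
        if passed:
            for component in diagnosis:
                alphas.append(1.0 - utilization.get((test, component), 0.0))
        else:
            alphas.append(1.0)   # failing-test utilization is not considered
    return prior * min(alphas)

# Diagnosis {C1}, prior 0.001; T1 fails, T2 passes with utilization 0.3:
results = {"T1": False, "T2": True}
util = {("T2", "C1"): 0.3}
print(alternative_weight(["C1"], 0.001, results, util))  # 0.001 * min(1, 0.7)
```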
In the case of a multiple-component diagnosis, p(D) is sometimes replaced by the product of the prior probabilities of each component Cj. The utilization factor may be replaced, for example, by the minimum or average utilization for each component Cj.
Additionally, systems under test that include subcomponents of components may be diagnosed. Since a component is viewed as the smallest portion of the system that may be replaced upon failure, the failure of any subcomponent of a component necessitates replacement of the entire component. In this situation, the weight of each subcomponent is calculated separately, and the relative weight of the component as a whole is then the maximum of the weights calculated for its subcomponents. Subcomponents residing outside a defined hitting set, or candidate diagnosis, are not considered in determining the weight of that diagnosis.
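The subcomponent rule above can be sketched as follows, with the per-subcomponent weights treated as precomputed, hypothetical inputs:

```python
def component_weight(subcomponent_weights, hitting_set):
    """Relative weight of a replaceable component: the maximum of the weights
    computed separately for its subcomponents, ignoring subcomponents that
    lie outside the hitting set (candidate diagnosis)."""
    return max(weight for sub, weight in subcomponent_weights.items()
               if sub in hitting_set)

# Hypothetical weights for subcomponents of component C1:
subweights = {"C1.a": 0.02, "C1.b": 0.05, "C1.c": 0.01}
print(component_weight(subweights, {"C1.a", "C1.b"}))  # → 0.05 (C1.c excluded)
```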
Although diagnostic systems of this type have been extremely useful in many cases, some systems exhibit failures that have proven difficult to diagnose. For example, some radio-frequency (RF) devices exhibit complex faults that often produce seemingly conflicting passing and failing test results, causing operation penalties to be assessed against multiple candidate diagnoses, which are then often weighted equally. Such weighting provides reduced guidance concerning which components of the system to replace first. Additionally, some tests utilizing a particular operation may pass or fail intermittently due to the uncertain nature of the specific measurement involved with those tests, thereby skewing the weighting.
From the foregoing, a need exists for a more reliable method of diagnosing faults in systems that is able to distinguish between candidate diagnoses when complex faults, often resulting in confusing and conflicting test results, are involved.