Complex systems, such as computer systems and other systems that use computers, are becoming steadily more complex. As a result, ever more specialized knowledge, experience and training are needed to operate, maintain and service these systems. Many of these complex systems have fault isolation software and sensor hardware that find faults in the system in a fixed, non-adaptable way. The failure modes of these systems are only partially known by the design engineers at the time the system is first built, yet this fixed logic is built into the system before field experience is gained. Each new system that is designed, therefore, needs custom logic that will fault isolate its components and topology, requiring software to be rewritten as engineers "re-invent" the fault isolation logic for each new system.
To solve this problem, artificial intelligence techniques have been applied to isolate faults in a "target system" (the system being fault isolated). This allows for a more flexible approach to fault isolation, as the fault isolator can be generalized so as not to be restricted to finding faults in only one specific type of target system. For example, a target system can be a computer system, a communications system, or even the human body.
One of the artificial intelligence techniques that has been used is a rule-based approach, in which a number of rules are stored in an "IF premise, THEN conclusion" format. These rules are created by a human who, based on past experience, is expert in diagnosing faults in the target system. The rules attempt to relate possible states of the target system with fault diagnoses. In a fault isolation episode, the states of the target system are compared with the premise of each of the rules, while the conclusions of the rules provide the fault diagnoses.
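As a rough illustration of the "IF premise, THEN conclusion" format described above, the following minimal Python sketch stores two rules and compares an observed state against each premise. The `Rule` class, the `diagnose` function, and the attribute names and values (`power_led`, `fan`, `disk_errors`) are hypothetical and chosen only for illustration; they are not taken from any particular fault isolation device.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical representation: the premise maps observed attributes
    # of the target system to the values required for the rule to apply.
    premise: dict
    conclusion: str


def diagnose(rules, state):
    """Return the conclusion of every rule whose premise is fully met by state."""
    return [
        rule.conclusion
        for rule in rules
        if all(state.get(key) == value for key, value in rule.premise.items())
    ]


# Illustrative rules only; a real knowledge base would be written by an expert.
RULES = [
    Rule({"power_led": "off", "fan": "stopped"}, "replace power supply"),
    Rule({"power_led": "on", "disk_errors": "yes"}, "replace disk drive"),
]

state = {"power_led": "off", "fan": "stopped", "disk_errors": "no"}
print(diagnose(RULES, state))  # prints ['replace power supply']
```

Only the first rule's premise is satisfied by the observed state, so only its conclusion is returned as a diagnosis.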
Known rule-based fault isolation devices use either forward or backward chaining in what is known as the "inference process". This is the process of inferring conclusions from a given knowledge base and from data. The process of inferencing involves matching the knowledge base with the data and producing conclusions.
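The two chaining directions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the inference process of any specific device: each rule is assumed to be a pair of a premise set and a single conclusion, and the fact names (`no_power_light`, `mains_ok`, and so on) are hypothetical.

```python
def forward_chain(rules, facts):
    """Data-driven: fire every rule whose premises are all known until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(rules, facts, goal, _seen=frozenset()):
    """Goal-driven: try to prove goal by proving the premises of a rule that concludes it."""
    if goal in facts:
        return True
    if goal in _seen:  # guard against circular rules
        return False
    return any(
        all(backward_chain(rules, facts, p, _seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


# Hypothetical knowledge base and observed data.
RULES = [
    (frozenset({"no_power_light", "fan_stopped"}), "no_dc_output"),
    (frozenset({"no_dc_output", "mains_ok"}), "power_supply_fault"),
]
FACTS = {"no_power_light", "fan_stopped", "mains_ok"}

print("power_supply_fault" in forward_chain(RULES, FACTS))  # prints True
print(backward_chain(RULES, FACTS, "power_supply_fault"))   # prints True
```

Forward chaining starts from the data and works toward conclusions, while backward chaining starts from a candidate conclusion and works back to the data that would support it.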
One of the problems in using either forward or backward chaining is that only a single conclusion is presented to the user of the fault isolation device, since the goal of forward and backward chaining is to reach a single conclusion. This presents obvious difficulties if the conclusion that is presented turns out not to isolate the fault in the target system.
Another problem in rule-based systems is in the matching of the rules in the knowledge base with the data. Rule-based systems typically require an exact match of the rules in the knowledge base with the data. Unfortunately, many target systems being fault isolated do not provide data that exactly matches the premises created by the expert and stored in the knowledge base. This prevents the fault isolation device from solving situations that are very close to situations accommodated by the rules, but are not exactly the same.
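The exact-match limitation can be made concrete with a small Python sketch. The premise, the observed state, and the helper names `exact_match` and `matched_fraction` are hypothetical; the fraction is computed only to show how close the near-miss is and is not presented as a matching method used by known systems.

```python
def exact_match(premise, state):
    """True only if every condition in the premise is met exactly by the state."""
    return all(state.get(key) == value for key, value in premise.items())


def matched_fraction(premise, state):
    """Share of premise conditions that the state satisfies (illustration only)."""
    hits = sum(1 for key, value in premise.items() if state.get(key) == value)
    return hits / len(premise)


# Hypothetical expert-written premise and a state that differs in one reading.
premise = {"power_led": "off", "fan": "stopped", "voltage_5v": "low"}
state = {"power_led": "off", "fan": "stopped", "voltage_5v": "absent"}

print(exact_match(premise, state))       # prints False
print(matched_fraction(premise, state))  # two of the three conditions match
```

Although two of the three conditions agree, an exact-match system discards the rule entirely and offers no diagnosis for this state.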
A further deficiency of known rule-based systems is that they typically do not learn from their own fault isolation experiences in order to provide a more accurate current fault isolation. In other words, the information in the knowledge base is not updated to reflect the success or failure of a conclusion in isolating the fault of a target system, to thereby provide an accurate indication of how much confidence should be placed in the rule that was tried in isolating the fault.
One of the features of rule-based systems is that they contain rules in which a certain degree of confidence in their correctness can be placed. In other words, for a given state of the target system, there is a certain possibility for each rule that this particular rule will provide a correct conclusion. This is known as a rule possibility. What such rule-based systems do not take into account, however, is the prior history of the target system. For example, two rules may present conclusions that point to two different components of the target system as the fault, and prior history with the target system shows that one of these components has failed fifty times, while the other component has only failed once. A human expert would certainly consider this "probability of failure" information to be relevant, but typical rule-based systems do not take probability into account. Instead, they rely on rule possibility alone in presenting conclusions.
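The fifty-to-one example above can be expressed numerically. The sketch below simply tallies a failure log and computes each component's share of past failures; the component names are hypothetical, and how such history might be combined with rule possibility is deliberately left open, since the text above does not specify it.

```python
from collections import Counter

# Hypothetical failure history: one component failed fifty times, the other once.
failure_log = ["disk_drive"] * 50 + ["controller_card"]

counts = Counter(failure_log)
total = sum(counts.values())

# Relative failure frequency of each component in the recorded history.
relative_frequency = {component: n / total for component, n in counts.items()}

for component, share in relative_frequency.items():
    print(f"{component}: {counts[component]} failures, share {share:.2f}")
```

A system that relies on rule possibility alone would treat both candidate components equally, even though the recorded history points overwhelmingly at one of them.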
There is a need for a method and a device that will isolate faults in a target system and overcome the deficiencies in prior fault isolation devices, and thereby provide more accurate diagnoses of faults in a target system.