Troubleshooting is the process of diagnosing and repairing a system that is behaving abnormally. Diagnostic and repair actions may incur costs, and traditional troubleshooting algorithms aim to minimize the total cost incurred until the system is repaired.
System failures are prevalent in practically all engineering fields, including automobiles, robots, information systems, and computer hardware. As systems become more complex, failures often become more common and maintenance costs tend to increase. As a result, automated diagnosis has been studied in the artificial intelligence field for several decades, with substantial progress and successful applications in spacecraft, satellite decision support systems, the automotive industry, and spreadsheets. The output of a diagnosis algorithm is a set of possible diagnoses, where each possible diagnosis is an explanation of the observed system failure. Model-based diagnosis (MBD) is a common approach to diagnosis that uses a model of the diagnosed system to infer diagnoses explaining the observed system failure.
Diagnosis, and in particular root-cause analysis, is the task of understanding what happened in the past to cause an observed failure. A root cause is the set of elements of the diagnosed system whose faults have caused the system failure. Prognosis is the task of predicting what will happen in the future, and in particular when future failures will occur.
Prognosis techniques have been developed for estimating the remaining useful life of components in a system. In particular, survival analysis is a sub-field of statistics in which various methods have been developed to generate survival curves of components. A survival curve plots the probability that a component survives (does not fail) as a function of the component's usage or age.
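As an illustration of what a survival curve is, the following minimal sketch evaluates a parametric survival function of component age. The Weibull form is only one common choice in survival analysis and is an assumption made here, not something the present description prescribes; the function name `weibull_survival` and the parameter values are hypothetical.

```python
import math


def weibull_survival(age, scale, shape):
    """Return the probability that a component survives beyond `age`.

    Assumes a Weibull lifetime model: S(age) = exp(-(age / scale) ** shape).
    `scale` and `shape` are hypothetical parameters that would normally be
    fitted to failure data for components of the same type.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if scale <= 0 or shape <= 0:
        raise ValueError("scale and shape must be positive")
    return math.exp(-((age / scale) ** shape))


# Sample the curve at integer ages 0..10 (units are arbitrary, e.g. years).
curve = [(age, weibull_survival(age, scale=5.0, shape=2.0)) for age in range(11)]
```

Each entry of `curve` pairs an age with the estimated probability that the component is still functioning at that age; the values start at 1 for a new component and decrease as the component ages.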
The first aspect of the invention is directed to a method that diagnoses system failures more accurately by considering both a system model and the survival curves of the system's constituent components. To motivate this combined approach to diagnosis, consider the following example. Assume that a car does not start, and that a mechanic inspecting the car observes that the water level in the radiator is low. A possible explanation (a diagnosis) for why the car does not start is that the radiator is not functioning well. There are, however, alternative diagnoses: the ignition system may be faulty, or the battery may be empty. Clearly, considering the age of the battery and the survival curve of batteries of the same type can provide valuable input to the mechanic in deciding the most likely diagnosis and the consequent next troubleshooting action.
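The car example can be made concrete with a small sketch that ranks candidate single-fault diagnoses using each component's age and survival curve. This is an illustrative simplification and not the method of the invention: it assumes exactly one faulty component, independent component failures, and Weibull survival curves, and it ignores observations such as the low water level that a system model would take into account. All names, ages, and parameters are hypothetical.

```python
import math


def survival(age, scale, shape):
    """Assumed Weibull survival curve: probability of surviving beyond `age`."""
    return math.exp(-((age / scale) ** shape))


def rank_single_fault_diagnoses(components):
    """Rank single-fault diagnoses by normalized probability.

    `components` maps a component name to (age, scale, shape).
    The weight of the diagnosis "only X is faulty" is
    P(X failed by its age) * product of P(other survived) over the others.
    """
    fail = {name: 1.0 - survival(*params) for name, params in components.items()}
    weights = {}
    for name in components:
        weight = fail[name]
        for other in components:
            if other != name:
                weight *= 1.0 - fail[other]
        weights[name] = weight
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no component has a non-zero failure probability")
    ranked = [(name, weight / total) for name, weight in weights.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical ages (years) and curve parameters for the three candidates.
car = {
    "radiator": (3.0, 12.0, 1.5),
    "ignition": (3.0, 15.0, 1.2),
    "battery": (4.5, 5.0, 2.5),
}
ranking = rank_single_fault_diagnoses(car)
```

With these made-up numbers the battery, being old relative to its assumed lifetime, is ranked first, which mirrors the intuition that component age should influence which diagnosis the mechanic examines first.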
The second aspect of the invention is directed to a method for automated troubleshooting of observed system failures over time.
Conventional automated troubleshooting techniques are based on “Decision Theoretic Troubleshooting (DTT)”, Heckerman et al., Communications of the ACM, 38(3):49-57, 1995. This decision-theoretic approach combines planning and diagnosis, and was applied to a troubleshooting application where a sequence of actions may be needed to perform repairs. For example, a vehicle may need to be disassembled to gain access to its internal parts. To address this problem, prior solutions used a Bayesian network for diagnosis and the AO* algorithm (described in “Principles of artificial intelligence”, Nils J Nilsson, Springer, 1982) as the planner. Another solution uses abstractions to improve the efficiency of troubleshooting. Other techniques propose a troubleshooting algorithm aimed at minimizing the breakdown costs, a concept that corresponds roughly to a penalty incurred for every faulty output in the system and for every time step until the system is fixed.
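To give a feel for cost-driven troubleshooting, the sketch below computes the expected repair cost of a fixed repair sequence and orders components by the ratio of fault probability to repair cost. It rests on strong simplifying assumptions that are not taken from the description above: exactly one component is faulty, each repair has a fixed cost, and after every repair the system is checked at no cost. It uses neither a Bayesian network nor AO*, and the probabilities and costs are hypothetical.

```python
from itertools import permutations


def expected_repair_cost(order, fault_prob, cost):
    """Expected total cost of repairing components in `order` until fixed.

    Under the single-fault assumption, component i is repaired only if the
    fault was not in any earlier component, which has probability
    1 - sum of the earlier fault probabilities.
    """
    still_broken = 1.0
    total = 0.0
    for component in order:
        total += still_broken * cost[component]
        still_broken -= fault_prob[component]
    return total


def ratio_order(fault_prob, cost):
    """Order components by descending fault probability per unit cost."""
    return sorted(fault_prob, key=lambda c: fault_prob[c] / cost[c], reverse=True)


# Hypothetical single-fault probabilities (summing to 1) and repair costs.
fault_prob = {"battery": 0.6, "ignition": 0.3, "radiator": 0.1}
cost = {"battery": 50.0, "ignition": 120.0, "radiator": 80.0}

order = ratio_order(fault_prob, cost)
best_by_search = min(
    permutations(fault_prob),
    key=lambda seq: expected_repair_cost(seq, fault_prob, cost),
)
```

For this small example the ratio-based order can be compared against `best_by_search`, an exhaustive search over all sequences. Note that the sketch accounts only for the cost of fixing the current failure; it has no notion of failures that may occur later.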
However, DTT and all of the above conventional solutions do not incorporate prognosis estimates into the troubleshooting algorithm and do not attempt to minimize the costs incurred due to both current and future failures.
It is therefore an object of the present invention to provide a method for improving decision making when fixing a current fault, while also considering future faults.
It is another object of the present invention to provide a method for choosing which action to perform in order to fix system faults.
Other objects and advantages of the invention will become apparent as the description proceeds.