The present invention pertains to support of products and pertains particularly to validation of probabilistic troubleshooters and diagnostic systems.
Currently, it is highly expensive for manufacturers to diagnose the systems of their customers. Automation of this process has been attempted using probabilistic troubleshooters and other diagnostic systems. Some of these systems are based on Bayesian networks.
One troubleshooter based on Bayesian networks is described by Heckerman, D., Breese, J., and Rommelse, K. (1995), Decision-theoretic Troubleshooting, Communications of the ACM, 38:49-57 (herein "Heckerman et al. 1995").
In scientific literature, Bayesian networks are referred to by various names: Bayes nets, causal probabilistic networks, Bayesian belief networks or simply belief networks. Loosely defined, Bayesian networks are a concise (acyclic) graphical structure for modeling probabilistic relationships among discrete random variables. Their utility lies in efficiently modeling problem domains that contain uncertainty. Since they can be easily modeled on a computer, they are the subject of increasing interest and use in automated decision-support systems, whether for medical diagnosis, automated automotive troubleshooting, economic or stock market forecasting, or in other areas as mundane as predicting a computer user's likely requirements.
In general, a Bayesian network consists of a set of nodes representing discrete-valued variables connected by arcs representing the causal dependencies between the nodes. A set of conditional probability tables, one for each node, defines the dependency between each node and its parents. Nodes without parents, sometimes called source nodes, have an associated prior marginal probability table. For specific applications, the data for the probability tables are provided by what are termed domain experts in whatever field is being modeled. This involves assigning prior probabilities for all nodes without parents, and conditional probabilities for all nodes with parents. In diagnostic Bayesian networks, nodes can represent causes, or outcomes of actions and questions. In very large diagnostic Bayesian networks, most of the events are very rare, with probabilities in the range of 0.001 to 0.000001. Since a primary goal of a computer decision support system is to provide decisions that are as accurate as possible, it is imperative that the domain experts provide probabilistic information that is highly reliable and reflects their best estimate of the situation.
Bayesian networks provide a way to model problem areas using probability theory. The Bayesian network representation of a problem can be used to provide information on a subset of variables given information on others. A Bayesian network consists of a set of variables (nodes) and a set of directed edges (connections between variables). Each variable has a set of mutually exclusive states. The variables together with the directed edges form a directed acyclic graph (DAG). For each variable v with parents w1, . . . , wn, there is defined a conditional probability table P(v|w1, . . . , wn). Obviously, if v has no parents, this table reduces to the marginal probability P(v).
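The definition above can be illustrated with a minimal sketch in Python. The two-node network, its variable names, and all probability values below are hypothetical and chosen only for illustration; they are not taken from any model described herein. The sketch computes the joint probability as the product of one table entry per node, and then derives information on one variable (the cause) given information on another (the symptom).

```python
# Hypothetical two-node network: cause -> symptom.
# Each node maps to (parents, table); a table maps a tuple of parent
# states to a distribution over the node's mutually exclusive states.
network = {
    "cause": ((), {(): {"toner_low": 0.2, "driver_fault": 0.8}}),
    "symptom": (("cause",), {
        ("toner_low",): {"light_print": 0.9, "normal_print": 0.1},
        ("driver_fault",): {"light_print": 0.3, "normal_print": 0.7},
    }),
}

def joint_probability(assignment):
    """Multiply P(v | parents(v)) over all nodes for a full assignment."""
    p = 1.0
    for node, (parents, table) in network.items():
        parent_states = tuple(assignment[q] for q in parents)
        p *= table[parent_states][assignment[node]]
    return p

def posterior_cause(symptom):
    """P(cause | symptom) by enumerating causes and normalising."""
    causes = network["cause"][1][()]
    joint = {c: joint_probability({"cause": c, "symptom": symptom})
             for c in causes}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}

if __name__ == "__main__":
    print(posterior_cause("light_print"))
```

Enumeration of this kind is practical only for very small networks; it is used here because it follows the definition directly.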
Bayesian networks have been used in many application domains with uncertainty, such as medical diagnosis, pedigree analysis, planning, debt detection, bottleneck detection, etc. However, one of the major application areas has been diagnosis. Diagnosis (i.e., identifying underlying factors that cause diseases or malfunctions, which in turn cause symptoms) lends itself nicely to the modeling techniques of Bayesian networks.
In the prior art, validation of probabilistic troubleshooters and diagnostic systems has been highly unstructured and time-consuming. Basically, a domain expert tests a new system by experimenting. The domain expert tries out different sequences with the system by answering suggested steps in different ways. The domain expert tracks coverage of the possible cases. This is very difficult because generally there are thousands of possible cases. The result is that the domain expert's validation either covers only a very small part of the possible cases or, in order to cover a larger number of cases, is extremely time-consuming.
Currently no software testing programs exist for validation of probabilistic troubleshooters and diagnostic systems. The technique used by software testing programs to test functional software (known as an application) is as follows. In a recording phase, an application is monitored and all interactions and responses are saved into "cases". After an application is possibly modified, the cases are tested during a testing phase. In the testing phase, the interactions are regenerated on the new (possibly modified) application, and the responses are compared to the recorded responses in the case. If there is a deviation, the case is labeled a failure. If the response is identical on the entire case, the case is labeled a success. Software testing programs generally also record statistics on successes and failures, and allow a user to trace failures. Some software testers can also generate random interactions with the application (so the testing program generates the interactions).
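The record-and-replay technique just described can be sketched as follows. This is a minimal illustration in Python, assuming the application can be treated as a function from an interaction to a response; the function names `record_case` and `replay_case` and the two toy application versions are hypothetical.

```python
def record_case(app, interactions):
    """Recording phase: save each interaction with the response it produced."""
    return [(x, app(x)) for x in interactions]

def replay_case(app, case):
    """Testing phase: regenerate the interactions and compare responses."""
    for interaction, recorded in case:
        if app(interaction) != recorded:
            return "failure"   # any deviation fails the whole case
    return "success"           # identical on the entire case

# Hypothetical application versions used only for illustration.
def app_v1(x):
    return x * 2

def app_v2(x):
    return x * 2 if x < 3 else x * 2 + 1   # behaviour changed for x >= 3

if __name__ == "__main__":
    case = record_case(app_v1, [1, 2, 3])
    print(replay_case(app_v1, case), replay_case(app_v2, case))
```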
In accordance with a preferred embodiment of the present invention, a probabilistic diagnostic system is validated. A diagnostic sequence is generated from a diagnostic model. The diagnostic sequence is evaluated to determine whether the diagnostic sequence provides an acceptable resolution to a problem. This is repeated for additional diagnostic sequences from the diagnostic model. It is determined whether at least a predetermined number of diagnostic sequences provide an acceptable resolution. When it is determined that at least the predetermined number of diagnostic sequences provide an acceptable resolution, the diagnostic model is accepted.
In the preferred embodiment, when it is determined that fewer than the predetermined number of diagnostic sequences provide an acceptable resolution, a new diagnostic model is generated. Diagnostic sequences previously evaluated for the diagnostic model are checked to see whether these diagnostic sequences provide acceptable resolutions in the new diagnostic model. When it is determined that the diagnostic sequences provide acceptable resolutions in the new diagnostic model, additional diagnostic sequences are tested to determine whether, for the new diagnostic model, at least the predetermined number of diagnostic sequences provide an acceptable resolution. When it is determined that the diagnostic sequences already checked do not provide acceptable resolutions in the new diagnostic model, a new revised diagnostic model is generated.
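The accept-or-revise loop described in the two preceding paragraphs can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the callables `generate`, `is_acceptable`, and `revise`, the batch size, and the cap on revisions are all hypothetical, and the toy model in the usage portion is simply an integer threshold on sequence length.

```python
def validate(model, generate, is_acceptable, revise,
             required, batch, max_revisions=3):
    """Return (accepted_model, revisions_used), or (None, max_revisions)."""
    library = []   # every sequence evaluated so far, across model versions
    for revision in range(max_revisions + 1):
        # For a revised model, first re-check previously evaluated sequences.
        # (For the initial model the library is empty, so this passes.)
        if all(is_acceptable(model, seq) for seq in library):
            fresh = generate(model, batch)          # additional sequences
            library.extend(fresh)
            passed = sum(1 for seq in fresh if is_acceptable(model, seq))
            if passed >= required:
                return model, revision              # model is accepted
        model = revise(model)                       # generate a new model
    return None, max_revisions

# Toy stand-ins (assumptions): a model is an int threshold, a sequence is
# its length, and a sequence is acceptable when it is short enough.
def toy_generate(model, n):
    return list(range(1, n + 1))

def toy_is_acceptable(model, seq):
    return seq <= model

def toy_revise(model):
    return model + 1

if __name__ == "__main__":
    print(validate(2, toy_generate, toy_is_acceptable, toy_revise,
                   required=4, batch=4))
```

In the toy run, the initial model fails, the first revision fails the re-check of earlier sequences, and the second revision passes both the re-check and the fresh batch.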
In the preferred embodiment, a case generator comprising a first diagnostic engine and a second diagnostic engine is used to generate the diagnostic sequences. The second diagnostic engine selects a cause. The first diagnostic engine suggests a best next step. The first diagnostic engine does not know the cause selected by the second diagnostic engine. The second diagnostic engine selects an answer to the best next step. The answer is consistent with the cause previously selected. This is repeated until the problem is resolved or until the first diagnostic engine is unable to suggest a best next step.
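The interplay of the two diagnostic engines can be sketched as follows. This Python sketch rests on several assumptions that are not part of the description above: the causes, priors, actions, and costs are hypothetical; each action deterministically fixes exactly one cause; and the first engine ranks steps by probability of success divided by cost, which is one common heuristic rather than a required one.

```python
import random

# Hypothetical model: prior probability of each cause.
CAUSES = {"toner_low": 0.5, "driver_fault": 0.3, "cable_loose": 0.2}
# Hypothetical actions: name -> (cost, the single cause the action fixes).
ACTIONS = {
    "replace_toner": (5.0, "toner_low"),
    "reinstall_driver": (2.0, "driver_fault"),
    "reseat_cable": (1.0, "cable_loose"),
}

def select_cause(rng):
    """Second engine: draw the hidden cause at random from the priors."""
    names = list(CAUSES)
    return rng.choices(names, weights=[CAUSES[n] for n in names])[0]

def suggest_step(beliefs, tried):
    """First engine: best untried action by probability/cost, else None."""
    candidates = [a for a in ACTIONS if a not in tried]
    if not candidates:
        return None
    return max(candidates, key=lambda a: beliefs[ACTIONS[a][1]] / ACTIONS[a][0])

def answer_step(action, hidden_cause):
    """Second engine: answer consistently with the hidden cause."""
    return ACTIONS[action][1] == hidden_cause

def generate_case(rng):
    hidden = select_cause(rng)          # never shown to the first engine
    beliefs, tried, steps = dict(CAUSES), set(), []
    while True:
        step = suggest_step(beliefs, tried)
        if step is None:                # no further step can be suggested
            return {"cause": hidden, "steps": steps, "resolved": False}
        solved = answer_step(step, hidden)
        steps.append((step, "yes" if solved else "no"))
        if solved:
            return {"cause": hidden, "steps": steps, "resolved": True}
        tried.add(step)
        beliefs[ACTIONS[step][1]] = 0.0  # rule out the cause just tested
        total = sum(beliefs.values())
        if total > 0:
            beliefs = {c: p / total for c, p in beliefs.items()}

if __name__ == "__main__":
    print(generate_case(random.Random(0)))
```

Passing a seeded `random.Random` instance makes a generated case reproducible, which is convenient when the same case must be replayed against a later model version.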
The second diagnostic engine selects each cause using a random process. Alternatively, cases can be generated by traversing all possible sequences and selecting those that fulfill one of three criteria: length of the diagnostic sequence, cost of performing the diagnostic sequence, and failure of the diagnostic sequence to solve the problem. A history module can be used to ensure constant improvement of models by allowing updated models to be compared with earlier accepted sequences.
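The exhaustive alternative can be sketched as follows. This Python sketch assumes, purely for illustration, a fixed order of three hypothetical steps with yes/no answers, where a "yes" answer resolves the problem; the thresholds `max_length` and `max_cost` are likewise assumptions, since the description names the three criteria but not how they are parameterised.

```python
# Hypothetical fixed step order: (step name, cost).
STEPS = [("reseat_cable", 1.0), ("reinstall_driver", 2.0), ("replace_toner", 5.0)]

def all_sequences(index=0, prefix=(), cost=0.0):
    """Traverse every possible answer path through the steps."""
    if index == len(STEPS):
        # Every step was answered "no": the problem was not solved.
        yield {"steps": prefix, "cost": cost, "resolved": False}
        return
    name, step_cost = STEPS[index]
    # Answer "yes": the sequence ends with the problem resolved.
    yield {"steps": prefix + ((name, "yes"),),
           "cost": cost + step_cost, "resolved": True}
    # Answer "no": continue with the next step.
    yield from all_sequences(index + 1, prefix + ((name, "no"),),
                             cost + step_cost)

def select_cases(cases, max_length, max_cost):
    """Keep sequences that are long, costly, or failed to solve the problem."""
    return [c for c in cases
            if len(c["steps"]) > max_length
            or c["cost"] > max_cost
            or not c["resolved"]]

if __name__ == "__main__":
    for case in select_cases(all_sequences(), max_length=2, max_cost=5.0):
        print(case)
```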
Alternatively, the case generator can be implemented using a single diagnostic engine.
Various statistics can be displayed by the case generator and/or by a case evaluator that performs the evaluation of the diagnostic system. The results of the testing can be stored in a history module. The history module stores a library of diagnostic sequences. Information about each diagnostic sequence includes which model versions were tested with the diagnostic sequence and any results of testing performed with the diagnostic sequence.
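A history module of the kind described above might be organised as in the following Python sketch. The class and method names, and the use of a simple acceptable/unacceptable flag as the stored result, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class SequenceRecord:
    steps: tuple
    # Model version -> whether the sequence was acceptable for that version.
    results: dict = field(default_factory=dict)

class HistoryModule:
    """Library of diagnostic sequences and their per-version test results."""

    def __init__(self):
        self._library = {}

    def record(self, steps, model_version, acceptable):
        key = tuple(steps)
        rec = self._library.setdefault(key, SequenceRecord(key))
        rec.results[model_version] = acceptable

    def versions_tested(self, steps):
        return sorted(self._library[tuple(steps)].results)

    def regressions(self, old_version, new_version):
        """Sequences accepted for the old version but not for the new one."""
        return [r.steps for r in self._library.values()
                if r.results.get(old_version) is True
                and r.results.get(new_version) is False]

if __name__ == "__main__":
    history = HistoryModule()
    seq = [("reseat_cable", "no"), ("reinstall_driver", "yes")]
    history.record(seq, "v1", True)
    history.record(seq, "v2", False)
    print(history.regressions("v1", "v2"))
```

Keying the library on the sequence itself lets results for several model versions accumulate on one record, so an updated model can be compared directly with earlier accepted sequences.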
In one preferred embodiment of the present invention, the probabilistic diagnostic system is based on Bayesian networks.
The preferred embodiment of the present invention allows for the validation of the ability of probabilistic troubleshooters and other diagnostic systems to generate a sequence of questions or tests. The preferred embodiment of the present invention allows for validation of the ability of such a system to reach a conclusion about the likelihood of an underlying problem, diagnosis, or cause, based on the responses to the sequence of questions or tests.
Specifically, cases (diagnostic sequences) that reflect the system's model of the probabilistic relationships among problems, questions, and possible answers are validated. Also validated is the ability to test diagnostic accuracy using such cases, and the ability to quickly measure the effects of making changes to the underlying model using previously recorded cases.