1. Field of the Invention
The invention relates to a diagnostic system utilizing a Bayesian network model. More particularly, the invention relates to an expert system utilizing a Bayesian network model wherein the link weights of the model are automatically updated based on experiential diagnostic successes.
2. State of the Art
Diagnostic systems, otherwise known as "expert systems", attempt to determine the cause that produces two or more contemporaneous events. In medical terminology, a diagnostic/expert system attempts to identify the disease that produces two or more contemporaneous symptoms. Computer based diagnostic/expert systems are commonplace today and are applied to diagnosing problems in many different areas. For example, such systems are utilized to diagnose diseases, to locate geological formations, and to manage complex apparatus such as nuclear power plants, communications networks, etc.
All expert systems are built around a "knowledge base" of domain specific information and an "inference engine". When an expert system is presented with a problem to solve, the "inference engine" combines information in the knowledge base with information about the problem. The "inference engine" applies its particular type of reasoning methodology to derive conclusions on the basis of the information it has available to it. Diagnostic expert systems provide a diagnosis on the basis of two or more contemporaneous symptoms. (A diagnosis based on a single symptom is either too speculative to be of value or too simple to require an expert system). Expert systems differ according to the organization and type of information stored in the knowledge base and according to the reasoning methodology employed by the "inference engine".
The earliest type of expert system is called a "rule-based" system. In this type of system the knowledge base is made up of a set of condition/action rules in the form "if . . . then". A problem is presented to the system in the form of a set of propositions stated to be known to be true. The inference engine uses Aristotelian logic to deduce non-obvious propositions from the application of rules to conditions input as the problem statement. Rule based reasoning may proceed in two modes: forward chaining or backward chaining. In forward chaining, the conclusions reached from applying rules are added to the set of conditions which are scanned to apply further rules. In backward chaining, the system is presented with a hypothesis for confirmation or denial. The system searches for rules which could satisfy the hypothesis and scans current conditions to determine whether the rule can be applied. If there are no conditions which would support the application of the rule needed, the system scans for other rules which could support the rule needed and then scans for conditions to support the other rule. Both forward and backward chaining are recursively executed until all of the conditions and rules have been examined. Rule based systems become unmanageable as the number of rules and conditions grows large.
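The forward chaining mode described above can be sketched in a few lines. This is a minimal illustration, not any particular prior art system: the rule format (a tuple of conditions and a conclusion), the function name `forward_chain`, and the example rules are all hypothetical.

```python
def forward_chain(facts, rules):
    """Fire rules whose conditions all hold until no new conclusion is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule applies when every one of its conditions is already known.
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)  # the conclusion becomes a new condition
                changed = True
    return known


# Hypothetical "if . . . then" rules, written as (conditions, conclusion).
RULES = [
    (("fever", "rash"), "infection_suspected"),
    (("infection_suspected", "tick_bite"), "lyme_disease_suspected"),
]

derived = forward_chain({"fever", "rash", "tick_bite"}, RULES)
print(sorted(derived))
```

Each pass rescans every rule against every known condition, which hints at why such systems become hard to manage as the rule set grows.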
An alternative to a rule based system is a "case based system". In a case based system, knowledge is organized according to problems which have been successfully solved. When presented with a problem, the case based system searches for similar problems which have been successfully solved. Case based systems work best when the problems in the knowledge base are organized according to type. However, case based systems are inefficient because they may contain more cases than necessary to solve the problems presented to the system. In addition, maintaining a case based system is labor intensive as new cases must be carefully described and categorized as they are entered into the system. Further, new knowledge may invalidate existing cases in the knowledge base, leading to inconsistencies. Finally, unless a problem matches a case in the database exactly, an algorithm must be used to determine which "imperfect match" is most similar to the given problem. These algorithms are generally ad hoc and designed or tuned by system developers to work on particular problems or in particular domains. There are no general rules on how to evaluate inexact matches and therefore it is difficult to estimate the reliability or problem solving performance of these systems.
It will be appreciated that each of the systems described above relies on a binary logic wherein decisions are made on the assumption of certainty. In other words, in the rule based system, a rule either applies or it doesn't. In a case based system, a case is found or it isn't. This approach works in cases where knowledge of the system has an absolute, syllogistic character. However, for systems about which knowledge is uncertain and for which one must reason in terms of levels of confidence, this approach is inadequate.
Recently, there have been many advances in the use of probabilistic expert systems. These systems make decisions based on probabilities rather than on the basis of binary logic. In one of the earliest applications of probabilities in expert systems, a probability value was attached to each rule in a rule-based expert system. Probabilities were propagated across chains of inference by multiplying probabilities of successive rules in a chain. This method did not work well because it relied on the assumption that antecedent conditions of all the rules of a rule chain are probabilistically independent. This assumption conflicted with a basic rationale of rule-based systems, i.e., the ability to specify each rule as a separate entity, without regard to interactions among rules. In general it was found to be impossible to add probabilities to rule-based systems in a theoretically sound manner without considering interactions among the rules in all possible rule chaining.
An alternative approach which emphasizes theoretically sound and consistent use of probabilities builds the expert system around a knowledge base that is a representation of a probability distribution. In these systems, the inference engine uses a reasoning methodology that computes conditional probability distributions over the underlying joint probability distribution represented in the knowledge base. Conclusions are formulated as probability distributions over the propositions of interest. For example, a probabilistic medical diagnostic system might represent possible diseases with a single DISEASE variable that takes on values that range over a set of diseases:
DISEASE={babesiosis, rheumatoid arthritis, lyme disease}. A problem is presented to the system as a set of values for one or more symptom variables: RASH=false, JOINT_PAIN=true . . . TEMPERATURE=101.8. The inference engine reasons by computing a conditional probability distribution for one or more variables that may not be directly observable, normally the unobservable causes of the input symptoms: P(DISEASE | RASH=false, JOINT_PAIN=true . . . TEMPERATURE=101.8)=(0.1, 0.6, 0.3), thereby assigning the probabilities of taking on diseases defined in the system as babesiosis=0.1, rheumatoid arthritis=0.6, and lyme disease=0.3.
The form of problem solving described above is well understood in the field of mathematics and is generally known as "classification". It is also well known that computing the conditional probability distribution of a class, given the features of an instance of an unknown class, is an optimal algorithm in the sense that the likelihood of making an erroneous classification is minimized. However, the practical application of this inference methodology has been limited for two reasons. First, it requires that a full joint probability distribution be known and stored in a form that can be manipulated. Second, it has been necessary to perform computations that sum probabilities over nearly all of the points in the full joint probability distribution.
It will be appreciated that in the case of discrete variables, a full joint distribution of probabilities based on N evidential variables will have O(2^N) values. [O(x) is notation for "on the order of x".] A common strategy to reduce the complexity of handling a full joint probability distribution is to assume that all pairs of variables in the distribution are marginally independent. This assumption allows one to specify and store only one marginal probability distribution for each variable in the domain. Moreover, every point in the full distribution can be derived by multiplying marginal distributions for individual variables. This assumption reduces the size of the storage and computation requirements for a distribution to O(N).
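The storage argument above can be made concrete with a short sketch. It assumes binary variables and uses hypothetical marginal values; the function names are illustrative only.

```python
from itertools import product


def full_joint_size(n):
    """Number of points in a full joint distribution over n binary variables."""
    return 2 ** n


def joint_from_marginals(p_true):
    """Derive every point of the joint from N stored marginals P(X_i = True).

    Valid only under the assumption that all variables are marginally independent.
    """
    joint = {}
    for assignment in product([True, False], repeat=len(p_true)):
        prob = 1.0
        for value, p in zip(assignment, p_true):
            prob *= p if value else 1.0 - p
        joint[assignment] = prob
    return joint


marginals = [0.2, 0.5, 0.9]  # hypothetical values: N = 3 stored numbers
joint = joint_from_marginals(marginals)
print(len(marginals), "stored values expand to", len(joint), "joint points")
```

Only the N marginals need to be stored; the 2^N points are recomputed on demand by multiplication.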
U.S. Pat. No. 5,133,046 to Kaplan describes a diagnostic expert system which implements the optimal classification algorithm described above by assuming marginal independence between all pairs of variables. The Kaplan system includes a diagnostic module which generates a diagnosis in the form of probability distributions. A state domain is created to define various states of the system to be diagnosed and Bayes' theorem is used to quantify the probabilities that certain symptoms are indicative of a particular state. While the Kaplan system is an advance over binary logic expert systems, it does have several shortcomings. First, all of the symptoms indicative of different states in the state domain are assumed to be marginally independent. This reduces the number of calculations needed to define the state domain and also reduces the amount of storage space required for the state domain. However, with few exceptions, it represents a relatively poor approximation of the true probability distribution of the domain. The algorithm will not minimize the likelihood of an erroneous diagnosis if the underlying probability model is inaccurate. Further, the Kaplan system is static in that once the domain is defined, probabilities are not updatable to reflect new knowledge about the system.
As mentioned above, Kaplan uses Bayes' theorem to quantify probabilities. Bayesian analysis takes into account conditional probabilities and provides a rule for quantifying confidence (beliefs or probability) based on evidence. Bayes' theorem, known as the inversion formula, is listed below as equation (1).

$$P(H \mid e) = \frac{P(e \mid H)\,P(H)}{P(e)} \tag{1}$$

Equation (1) states that the probability of (or belief in) hypothesis H upon obtaining evidence e is equal to the probability (or degree of confidence) that evidence e would be observed if H is true, multiplied by the probability of H prior to learning evidence e (the previous belief that H is true), divided by the probability of evidence e. P(H|e) is referred to as the posterior probability. P(H) is referred to as the prior probability. P(e|H) is referred to as the likelihood; and P(e) is a normalizing constant. Bayesian analysis is particularly useful in an expert system because the likelihood can often be determined from experimental knowledge and the likelihood can be used to determine the otherwise difficult to determine posterior probability.
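A small numerical sketch of the inversion formula of equation (1) follows. All numbers are hypothetical, and the normalizing constant P(e) is obtained by the law of total probability over H and not-H, which is an assumption added here for the example.

```python
def posterior(prior, likelihood, evidence_prob):
    """Inversion formula: P(H|e) = P(e|H) * P(H) / P(e)."""
    return likelihood * prior / evidence_prob


# Hypothetical values, chosen only for illustration.
p_h = 0.01              # prior P(H)
p_e_given_h = 0.90      # likelihood P(e|H)
p_e_given_not_h = 0.05  # P(e|not H)

# Normalizing constant P(e), summed over both values of H.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)

p_h_given_e = posterior(p_h, p_e_given_h, p_e)
print(round(p_h_given_e, 3))
```

With these inputs the posterior is roughly 0.15, well above the prior of 0.01 yet far from certainty, because the evidence is also fairly common when H is false.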
As alluded to above, the inversion formula can be used to quantify confidence based on multiple pieces of evidence. For example, with N pieces of evidence, the inversion formula would take the form shown in equation (2). ##EQU2## It will be appreciated that a full joint distribution of probabilities based on N pieces of evidence will have 2^N values. If, however, it is known that each piece of evidence is independent of the others (marginal independence), the inversion formula can be reduced to the form shown in equation (3) and the distribution can be reduced in size to N values. ##EQU3##
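The equation images for (2) and (3) are not reproduced in this text. As a hedged reconstruction only, and not the original figures, the forms below are what the surrounding description would suggest. The symbols e_1 through e_N for the individual pieces of evidence are an assumed notation, and the factored form assumes the pieces of evidence are independent of one another both marginally and given H.

```latex
% Plausible form of equation (2): inversion formula with N pieces of evidence
P(H \mid e_1, \ldots, e_N)
  = \frac{P(e_1, \ldots, e_N \mid H)\, P(H)}{P(e_1, \ldots, e_N)}

% Plausible form of equation (3): numerator and denominator factor
% once the pieces of evidence are assumed independent
P(H \mid e_1, \ldots, e_N)
  = P(H) \prod_{i=1}^{N} \frac{P(e_i \mid H)}{P(e_i)}
```

Under this reading, only the N per-evidence terms need to be stored, which matches the stated reduction from 2^N values to N values.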
Bayesian networks are a more recent representational and computational innovation for reducing the complexity of a discrete joint probability distribution by taking advantage of much less restrictive assumptions of conditional independence among sets of variables. Developed in the late 1980s, the concept of a Bayesian network uses a model of dependent knowledge in the form of a graph. The graph is referred to as a directed acyclic graph (DAG) in which each node represents a random variable and each link represents probabilistic dependence among the linked variables. To reduce the difficulty of modeling, knowledge of causal relationships among variables is used to determine the position and direction of the links. The strength (or weight) of the influences is quantified by conditional probabilities. Prior art FIG. 1 is an example of a Bayesian network DAG and prior art FIG. 1a is a probability matrix indicating the strength of the influences of nodes A and B on node D.
Referring now to FIG. 1, the example Bayesian network has seven nodes, A through G. Each node is connected to at least one other node by a link which is designated as an arrow, the direction of which indicates probabilistic dependence. Thus, node D is dependent upon nodes A and B. Node F is dependent on nodes B and C. Conditional independence among variables in the distribution allows for reduced storage and computational requirements. In FIG. 1, node G is independent of node A given the value of node D. Conditional independence is also represented using siblings in the graph. For example, nodes D and F are independent given the value of node B. Nodes at the tail end of a link are referred to as parents and parents which are not influenced by any other nodes are called root nodes. Each node in the graph represents a variable in the probability distribution. For each root node, the associated variable's marginal distribution is stored. For each non-root node, a probability matrix is created which indicates the conditional probability distribution of that node given the values of its parent nodes.
For example, as shown in FIG. 1a, the value of the variable at node D is related probabilistically to the values of the variables at nodes A and B. FIG. 1a illustrates that the variables at nodes A, B, and D each take a binary value, but a range of values could be used with appropriate functions. As shown, the variable at node D takes the value T with a probability of 0.89 when the variables at nodes A and B are both T. When the variable at node A is T but the variable at node B is F, the probability that the value of the variable at node D is T drops to 0.85. When the variables at nodes A and B are both F, the probability that the value of the variable at node D is T drops to 0.30. It will be appreciated that for any given state of the parent nodes, the probabilities of the values of the influenced node always sum to one.
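One way to hold such a probability matrix in a program is a table keyed by the parent states. The sketch below is an assumed representation, not the layout of FIG. 1a itself. It contains only the three entries stated in the text; the entry for A=F, B=T is not given above and is deliberately left out.

```python
# P(D = T | A, B), keyed by the states of the parents (A, B).
# Only the three entries stated in the text are included.
P_D_TRUE = {
    (True, True): 0.89,
    (True, False): 0.85,
    (False, False): 0.30,
}


def p_d(d, a, b):
    """Return P(D = d | A = a, B = b); raises KeyError for the unstated entry."""
    p_true = P_D_TRUE[(a, b)]
    return p_true if d else 1.0 - p_true


# For each parent state, the probabilities over D's values sum to one.
for a, b in P_D_TRUE:
    assert abs(p_d(True, a, b) + p_d(False, a, b) - 1.0) < 1e-12

print(round(p_d(False, True, True), 2))  # P(D = F | A = T, B = T)
```

Storing only P(D = T) per row is enough for a binary variable, since the complementary value follows from the sum-to-one property.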
In other words, the "knowledge base" of an expert system based on a Bayesian network consists of (1) a DAG including a node for each of the variables in the domain as described above and (2) either a marginal or a conditional probability distribution stored at each node. The "inference engine" of a Bayesian network expert system uses well known algorithms to compute conditional probabilities for each variable whose value has not been directly observed. As with other expert systems described above, the Bayesian network functions when presented with a problem description that consists of a set of attribute values. Each variable whose value is included in the problem description is instantiated with that value. A problem description need not be known with certainty. Instead attribute/value pairs may themselves be uncertain values with probabilities attached. The impact of observed variable values on the network is viewed as a perturbation which propagates through the network from node to node as the algorithms are run. The algorithms can be distributed across different processors by propagating information in the usual manner across communications media that link the different processors.
At least one software package is available for designing and operating a Bayesian network. The software is known as "HUGIN" which references the Scandinavian myth of a raven god of Odin that brought him news from the whole world, the term later regarded as personifying thought. HUGIN is produced and marketed by Hugin Expert A/S, Niels Jernes Vej 10, Box 8201 DK-9220, Aalborg, Denmark. The HUGIN software includes an editor which is used to specify nodes and possible states of the nodes in a Bayesian network as well as the links between nodes and the conditional probability matrices at each node. The HUGIN software also includes an inference engine and an application program interface which is a library of C-language functions. The HUGIN software supports both marginally and conditionally independent variables, although it is up to the designer to provide the proper specifications for each. For example, the expert system described in Kaplan can be represented as a special case of a Bayesian network in which there is one root node that takes on different values for different faults and the one root is linked to all the children representing different symptoms. Because Kaplan assumes that all children are pairwise independent there would be no links between the children nodes and no more than one parent for each child node in the Bayesian network model. The inference algorithms included in HUGIN would perform the equivalent table lookup and probabilistic computations as described by Kaplan. In addition, the HUGIN software operates the Bayesian network by propagating observed variable values. The HUGIN software does not, however, provide any means for adjusting the conditional probability matrices after the Bayesian network has been designed.
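The special case described above, one root node for the fault linked to pairwise independent symptom children, can be sketched in plain Python. This does not use or imitate the HUGIN API. The function name, the fault and symptom names, and every probability are hypothetical.

```python
def single_root_posterior(prior, likelihoods, observed):
    """Posterior over the root (fault) given observed binary child symptoms.

    prior:       {fault: P(fault)}
    likelihoods: {symptom: {fault: P(symptom = True | fault)}}
    observed:    {symptom: True or False}
    Assumes symptoms are independent of one another given the fault.
    """
    scores = {}
    for fault, p_fault in prior.items():
        score = p_fault
        for symptom, value in observed.items():
            p_true = likelihoods[symptom][fault]
            score *= p_true if value else 1.0 - p_true
        scores[fault] = score
    total = sum(scores.values())  # normalizing constant
    return {fault: s / total for fault, s in scores.items()}


# Hypothetical communications network example.
PRIOR = {"link_down": 0.2, "router_overload": 0.8}
LIKELIHOODS = {
    "packet_loss": {"link_down": 0.9, "router_overload": 0.4},
    "high_latency": {"link_down": 0.3, "router_overload": 0.8},
}

result = single_root_posterior(
    PRIOR, LIKELIHOODS, {"packet_loss": True, "high_latency": False}
)
print(result)
```

Because each child has exactly one parent, inference reduces to one table lookup per observed symptom followed by a normalization.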
The foregoing discussion mainly concerns the problems related to the computational complexity of inference and storage of joint probability distributions over discrete variables. An even more difficult problem associated with inference over a discrete probability model of a domain is the need for specifying the probability for each of the points in the distribution. In prior art expert systems such as those described by Kaplan or those built using HUGIN, these probabilities are estimated and entered manually by a domain expert. One of the strengths of rule-based expert systems is that the form of the knowledge base abstracts from most of the detail in the domain and closely matches the way that experts think about problems. This is not true for the specification of points in a joint probability distribution. While experts may have good estimates for marginal probabilities it is normally very difficult to estimate point probabilities. This is particularly true for probabilities associated with causal variables that cannot be directly observed and as such are the focus of interest in expert systems.
Bayes' inversion formula is a well known method for deriving difficult to observe conditional probabilities from probabilities that are easier to estimate using experimental data and statistical inference. It indicates that evidence from solved problems can be used to estimate the distribution of the domain. However, the inversion formula cannot be applied directly to estimate the probabilities in the knowledge base of a Bayesian network expert system.
U.S. Pat. No. 5,528,516 to Yemini et al. discloses an expert system for managing a communications network which on its face appears to implement a Bayesian network approach. The Yemini et al. disclosure appears to be derived in part from a confidential unpublished work of the present inventor which forms the basis of the provisional application contained in Appendix A. The Yemini et al. system fails, however, to fully implement a true Bayesian network for several reasons. First, it assumes a 1:1 correspondence between a set of evidence and a problem. Second, it does not associate a probabilistic value to the relationship between a piece of evidence and a problem. Third, it does not fully address the issue of conditional independence or full independence. While the Yemini et al. disclosure alludes to solutions to these three issues, no real solutions are provided and the disclosed embodiment virtually precludes resolution of these issues. Consequently, Yemini et al. does not teach any method for adjusting the probability distributions at nodes of a Bayesian network.
From the foregoing, it will be appreciated that tools exist for defining a static Bayesian network knowledge base and for carrying out inference by propagating the observed evidence through the network. However, there is no easy way of specifying the probabilities that make up the conditional distributions in a Bayesian network knowledge base. Moreover, it is not possible to automatically update the probabilities specified in the knowledge base as additional evidence is gathered. Thus, there are no Bayesian network expert systems which learn from experience and adjust link weights to more accurately reflect the body of available knowledge.