A belief network is a representation of the probabilistic relationships among distinctions about the world. A distinction in a belief network can take on a set of values and is thus called a variable. A belief network is expressed as an acyclic, directed graph, where the variables correspond to nodes and where the relationships between the nodes correspond to arcs. FIG. 1 depicts an example belief network 101. The belief network 101 contains three variables, x_1, x_2, and x_3, which are represented by nodes 102, 106, and 110, respectively. Also, the example belief network 101 contains two arcs 104 and 108. Associated with each variable in a belief network is a set of probability distributions. Using conditional probability notation, the set of probability distributions for a variable can be denoted by p(x_i | pa(x_i)), where "p" refers to the probability distribution, and where "pa(x_i)" denotes the parents of variable x_i. Thus, this expression reads as follows, "the probability distribution for variable x_i given the parents of x_i." For example, x_1 is the parent of x_2. The probability distributions specify the strength of the relationships between variables. For instance, if x_1 has two states (true and false), then associated with x_1 is a single probability distribution p(x_1) and associated with x_2 are two probability distributions p(x_2 | x_1 = t) and p(x_2 | x_1 = f).
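The per-variable distributions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric probabilities are invented (the text gives none for FIG. 1), the direction of arc 108 is assumed to run from x_2 to x_3, and the names `parents`, `cpt`, `prob`, and `joint` are hypothetical. The sketch also relies on the standard property that the joint distribution of a belief network is the product of each variable's distribution given its parents.

```python
from itertools import product

# Hypothetical numbers: the source gives no probabilities for FIG. 1.
# Assumed structure: x1 -> x2 -> x3 (the direction of arc 108 is an assumption).
parents = {"x1": (), "x2": ("x1",), "x3": ("x2",)}

# cpt[var][parent_values] = P(var = True | parents = parent_values)
cpt = {
    "x1": {(): 0.3},                        # one distribution: p(x1)
    "x2": {(True,): 0.9, (False,): 0.2},    # two: p(x2 | x1=t), p(x2 | x1=f)
    "x3": {(True,): 0.6, (False,): 0.1},
}

def prob(var, value, assignment):
    """P(var = value | parents of var as set in assignment)."""
    key = tuple(assignment[p] for p in parents[var])
    p_true = cpt[var][key]
    return p_true if value else 1.0 - p_true

def joint(assignment):
    """Joint probability as the product of each variable's local distribution."""
    result = 1.0
    for var, value in assignment.items():
        result *= prob(var, value, assignment)
    return result

# Summing the joint over all 8 assignments should give 1.
total = sum(
    joint(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
```

Here `cpt["x2"]` holds exactly two entries, one per state of the parent x_1, which mirrors the two distributions named in the text.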
An important aspect of belief networks is the concept of dependence. Sets of variables x and y are said to be conditionally independent, given a set of variables z, if the probability distribution for x given z does not depend on y. That is, if p(x | z, y) = p(x | z), x and y are conditionally independent given z. If z is empty, however, x and y are said to be "independent" as opposed to conditionally independent. If x and y are not conditionally independent given z, then x and y are said to be conditionally dependent given z.
The arcs in a belief network convey dependence between nodes. When a belief network has an arc from a first node to a second node, the probability distribution of the second node depends upon the value of the first node. For example, belief network 101 contains an arc from node 102 to node 106, and therefore, node 106 is said to be dependent on node 102. Just as the presence of arcs in a belief network conveys dependence, the absence of arcs in a belief network conveys conditional independence. For example, node 102 and node 110 are conditionally independent given node 106. That is, the values of nodes 102 and 110 are conditionally independent if the value of node 106 is known, the condition being the observation of node 106. However, two variables indirectly connected through intermediate variables are dependent when the values ("states") of the intermediate variables are unknown. Therefore, if the value for x_2 is unknown, x_1 and x_3 are dependent.
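The two claims above (independence of nodes 102 and 110 once node 106 is observed, and dependence when it is not) can be checked numerically. The following is a minimal sketch, assuming a chain x1 -> x2 -> x3 and invented probabilities; none of the numbers or helper names come from the source.

```python
from itertools import product

# Hypothetical probabilities for a chain x1 -> x2 -> x3 (not taken from the source).
p_x1 = {True: 0.3, False: 0.7}
p_x2_given_x1 = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_x3_given_x2 = {True: {True: 0.6, False: 0.4}, False: {True: 0.1, False: 0.9}}

# Full joint distribution over (x1, x2, x3).
joint = {
    (a, b, c): p_x1[a] * p_x2_given_x1[a][b] * p_x3_given_x2[b][c]
    for a, b, c in product([True, False], repeat=3)
}

def p_x3_true(x1=None, x2=None):
    """P(x3 = True | whichever of x1, x2 are given), by summing the joint."""
    def matches(a, b):
        return (x1 is None or a == x1) and (x2 is None or b == x2)
    num = sum(p for (a, b, c), p in joint.items() if matches(a, b) and c)
    den = sum(p for (a, b, c), p in joint.items() if matches(a, b))
    return num / den

# With x2 observed, additionally learning x1 changes nothing.
given_x2_only = p_x3_true(x2=True)
given_x2_and_x1 = p_x3_true(x1=False, x2=True)

# With x2 unobserved, the value of x1 does shift the distribution of x3.
given_x1_true = p_x3_true(x1=True)
given_x1_false = p_x3_true(x1=False)
```

With these illustrative numbers, the first pair of results agree, while the second pair differ, which is the pattern the paragraph describes.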
FIG. 2 depicts an example belief network for troubleshooting automobile problems. The belief network of FIG. 2 contains many variables 202-234, which relate to whether an automobile will work properly, and arcs 236-268 connecting the variables. A few examples of the relationships between the variables follow: For the radio 214 to work properly, there must be battery power 212 (arc 246). Battery power 212, in turn, depends upon the battery working properly 208 and a charge 210 (arcs 242 and 244). The battery working properly 208 depends upon the battery age 202 (arc 236), and the charge 210 of the battery depends upon both the alternator 204 working properly (arc 238) and the fan belt 206 being intact (arc 240).
The automobile troubleshooting belief network also provides a number of examples of conditional independence and conditional dependence. Specifically, the node for the operation of the lights 216 and the node for battery power 212 are dependent, and the node for the operation of the lights 216 and the node for the operation of the radio 214 are conditionally independent given battery power 212. The concepts of conditional dependence and conditional independence can be expressed using conditional probability notation. For example, the operation of the lights 216 is conditionally independent of the radio 214 given battery power. Therefore, the probability of the lights 216 working properly given both the battery power 212 and the radio 214 is equal to the probability of the lights working properly given the battery power alone: P(Lights | Battery Power, Radio) = P(Lights | Battery Power). An example of a conditional dependence relationship is that the operation of the lights 216 is conditionally dependent on the radio 214 given the battery 208. Therefore, the probability of the lights 216 working properly given both the radio 214 and the battery 208 is not equal to the probability of the lights given the battery alone: P(Lights | Radio, Battery) ≠ P(Lights | Battery).
There are two conventional approaches for constructing belief networks. Using the first approach ("the knowledge-based approach"), a person known as a knowledge engineer first interviews an expert in a given field to obtain the knowledge of the expert about the expert's field of expertise. During this interview, the knowledge engineer and the expert first determine the distinctions of the world that are important for decision making in the expert's field of expertise. These distinctions correspond to the variables of the domain of the belief network. The "domain" of a belief network is the set of all variables in the belief network. Then, to complete the belief network, the knowledge engineer and the expert determine the dependencies among the variables (the arcs) and the probability distributions that quantify the strengths of the dependencies.
In the second conventional approach for constructing a belief network ("the data-based approach"), the knowledge engineer and the expert first determine the variables of the domain. Next, data is accumulated for these variables, and an algorithm is applied that creates a belief network from this data. The accumulated data comes from real-world instances of the domain, which are real-world instances of decision making in a given field.
A method for generating a belief network that is an improvement over these conventional approaches is described in pending U.S. patent application Ser. No. 08/240,019 (now U.S. Pat. No. 5,704,018 issued Dec. 30, 1997), entitled "Generating Improved Belief Networks," assigned to a common assignee, which is hereby incorporated by reference. This improved method uses both expert knowledge and accumulated data to generate a belief network.
Regardless of the approach used for constructing a belief network, after the belief network has been constructed, the belief network becomes the engine for a decision-support system. The belief network is converted into a computer-readable form, such as a file, and input into a computer system. Then, the computer system uses the belief network to perform probabilistic inference by determining the probabilities of variable states given observations, to determine the benefits of performing tests, and ultimately to recommend or render a decision. Consider an example where a decision-support system uses the belief network of FIG. 2 to troubleshoot automobile problems. If the engine for an automobile did not start, the decision-support system may request an observation of whether there was gas 224, whether the fuel pump 226 was in working order by performing a test, whether the fuel line 228 was obstructed, whether the distributor 230 was working, and whether the spark plugs 232 were working. While the observations and tests are being performed, the belief network assists in determining which variable should be observed next.
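As a rough illustration of determining the probabilities of variable states given observations, the sketch below applies Bayes' rule to a three-node fragment of FIG. 2. It is not the inference procedure of any particular decision-support system. The arc from battery power to the radio is described in the text; the arc from battery power to the lights is assumed here, and every number and name (`P_POWER`, `likelihood`, `posterior_no_power`) is hypothetical.

```python
# Hypothetical numbers. The arc Battery Power -> Radio is described for FIG. 2;
# the arc Battery Power -> Lights is an assumption made for this sketch.
P_POWER = 0.95                                # P(battery power present)
P_RADIO_WORKS = {True: 0.98, False: 0.0}      # P(radio works | power state)
P_LIGHTS_WORK = {True: 0.97, False: 0.0}      # P(lights work | power state)

def likelihood(power, radio=None, lights=None):
    """P(observations | power); an unobserved variable contributes a factor of 1."""
    result = 1.0
    for works, table in ((radio, P_RADIO_WORKS), (lights, P_LIGHTS_WORK)):
        if works is not None:
            result *= table[power] if works else 1.0 - table[power]
    return result

def posterior_no_power(radio=None, lights=None):
    """P(no battery power | observations) by Bayes' rule over the two power states."""
    with_power = P_POWER * likelihood(True, radio, lights)
    without_power = (1.0 - P_POWER) * likelihood(False, radio, lights)
    return without_power / (with_power + without_power)

prior = posterior_no_power()                               # no observations yet
after_radio = posterior_no_power(radio=False)              # radio seen not working
after_both = posterior_no_power(radio=False, lights=False) # lights also not working
```

Under these made-up numbers, each additional failed observation raises the probability that battery power is absent, which is the kind of update a decision-support system would use when choosing what to observe next.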
Although belief networks are quite useful in decision-support systems, belief networks require a significant amount of storage. For example, in the belief network 300 of FIG. 3A, the values of nodes x and y causally influence the value of node z. In this example, nodes x, y, and z have binary values of either 0 or 1. As such, node z maintains a set of four probabilities, one probability for each combination of the values of x and y, and stores these probabilities in a table 320 as shown in FIG. 3B. When performing probabilistic inference, it is the probabilities in table 320 that are accessed. As can be seen from table 320, only the probabilities for z equaling 0 are stored; the probabilities for z equaling 1 need not be stored as they are easily derived by subtracting the probability that z equals 0 from 1. As the number of parents of a node increases, the table in the node that stores the probabilities grows multiplicatively and requires a significant amount of storage. For example, a node having binary values with 10 parents that also have binary values requires a table consisting of 2^10, or 1,024, entries. And, if either the node or one of its parents has more values than a binary variable, the number of probabilities in the table increases multiplicatively.
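The growth in table size can be made concrete with a short calculation. The helper below is a sketch, and its name and signature are hypothetical; it assumes, as in table 320, that the probability of a node's last state is derived by subtraction and so is not stored.

```python
from math import prod

def table_entries(node_states, parent_states):
    """Probabilities stored for one node when the last state is derived by subtraction.

    node_states: number of values the node can take.
    parent_states: list with the number of values each parent can take.
    """
    return (node_states - 1) * prod(parent_states)

# Table 320: binary z with binary parents x and y.
fig_3b = table_entries(2, [2, 2])
# Binary node with 10 binary parents.
ten_parents = table_entries(2, [2] * 10)
# Same node, but one of the parents now has 3 values instead of 2.
one_ternary = table_entries(2, [3] + [2] * 9)
```

The three results are 4, 1,024, and 1,536 entries respectively; changing a single parent from two values to three multiplies the table size by 1.5.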
In addition to requiring a significant amount of storage, the table 320 must be accessed during probabilistic inference, and a look-up performed to obtain the appropriate probability for the given values of the parent nodes. Since the table containing the probabilities may be quite large, the look-up in such a table requires a significant amount of processing time. That is, if the probability is stored in the nth entry, n operations need to be performed to locate the entry.
To improve the storage of probabilities in a belief network node, some conventional systems use a tree data structure. A tree data structure is an acyclic, undirected graph where each vertex is connected to each other vertex via a single path. The graph is acyclic in that there is no path that both emanates from a vertex and returns to the same vertex, where each edge in the path is traversed only once. FIG. 3C depicts an example tree data structure 330 that stores in its leaf vertices 336-342 the probabilities shown in table 320 of FIG. 3B. Assuming that a decision-support system performs probabilistic inference with x's value being 0 and y's value being 1, the following steps occur to access the appropriate probability in the tree data structure 330: First, the root vertex 332, vertex x, is accessed, and its value determines the edge or branch to be traversed. In this example, x's value is 0, so edge 344 is traversed to vertex 334, which is vertex y. Second, after reaching vertex y, the value for this vertex determines which edge is traversed to the next vertex. In this example, the value for vertex y is 1, so edge 346 is traversed to vertex 338, which is a leaf vertex. Finally, after reaching the leaf vertex 338, which stores the probability for z equaling 0 when x=0 and y=1, the appropriate probability can be accessed.
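The traversal just described can be sketched as follows. The representation (nested tuples and dictionaries), the name `lookup`, and all four leaf probabilities are assumptions made for illustration; the actual contents of table 320 are not given in the text.

```python
# Hypothetical probabilities; the contents of table 320 are not given in the text.
# An internal vertex is (variable, {value: subtree}); a leaf is P(z = 0).
tree_330 = ("x", {
    0: ("y", {0: 0.10, 1: 0.25}),
    1: ("y", {0: 0.40, 1: 0.70}),
})

def lookup(tree, values):
    """Walk from the root to a leaf, following the edge for each variable's value."""
    vertex = tree
    steps = 0
    while isinstance(vertex, tuple):
        variable, children = vertex
        vertex = children[values[variable]]
        steps += 1
    return vertex, steps

# x = 0 selects the first branch, then y = 1 selects the leaf.
p_z0, edges_traversed = lookup(tree_330, {"x": 0, "y": 1})
# The probability for z = 1 is derived rather than stored.
p_z1 = 1.0 - p_z0
```

In this sketch the look-up for x=0, y=1 traverses two edges, one per parent variable, before reaching the leaf.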
As compared to a table, a tree is a more efficient way of storing probabilities in a node of a belief network, because it requires less space. However, tree data structures are inflexible and do not adequately represent all of the relationships between nodes. For example, because of the acyclic nature of tree data structures, a tree cannot be used to indicate some types of equality relationships where multiple combinations of the values of the parent vertices have the same probability (i.e., refer to the same leaf vertex). Because of this inflexibility, multiple vertices must sometimes store the same probability, which is wasteful. It is thus desirable to improve belief networks.
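The duplication can be seen in a small example. The sketch below assumes invented probabilities in which the combinations (x=0, y=1) and (x=1, y=0) happen to share one value; the tree representation and the helper name `leaves` are hypothetical.

```python
# Hypothetical probabilities in which (x=0, y=1) and (x=1, y=0) share one value.
# An internal vertex is (variable, {value: subtree}); a leaf is P(z = 0).
tree = ("x", {
    0: ("y", {0: 0.10, 1: 0.40}),
    1: ("y", {0: 0.40, 1: 0.70}),
})

def leaves(vertex):
    """Return every leaf probability stored in the tree, in traversal order."""
    if not isinstance(vertex, tuple):
        return [vertex]
    _, children = vertex
    return [leaf for child in children.values() for leaf in leaves(child)]

stored = leaves(tree)      # every leaf the tree must hold
distinct = set(stored)     # the probabilities that are actually different
```

Here the tree holds four leaves although only three distinct probabilities exist, because the two equal values sit under different branches and each leaf can be reached by only one path.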