This invention relates generally to modeling probabilistic systems, and more particularly, to modeling probabilistic systems using belief propagation in a Markov network.
Computer models are frequently used to study the behavior of complex probabilistic systems. When the systems contain many inter-dependent random variables, Markov networks are often used. In a Markov network, nodes of the network represent the possible states of a part of the system, and links between the nodes represent statistical dependencies between the possible states of those nodes.
By the Hammersley-Clifford theorem, from the study of Markov networks, the probability of any set of states at the nodes of the network can be written as the product of compatibility functions between clusters of nodes.
FIG. 1 shows a simple network with four nodes labeled a, b, c, and d. The links between the nodes represent the statistical dependencies between the possible states of the nodes. For the case of pairwise probabilistic interactions between nodes of the network, the overall joint probability of the system can be expressed as the product of compatibility functions for each linked pair of nodes:
$$P(s_a, s_b, s_c, s_d) = \phi_{ab}(s_a, s_b)\,\phi_{bc}(s_b, s_c)\,\phi_{ca}(s_c, s_a)\,\phi_{bd}(s_b, s_d), \qquad [1]$$
where $\phi_{ab}$ is the compatibility matrix between nodes a and b, $s_a$ is a random variable describing the state at node a, and similarly for the other nodes and variables.
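As a minimal sketch of equation [1], the product of compatibility functions can be evaluated directly for the four-node network of FIG. 1. The binary states, the numeric entries of the compatibility matrices, and the names `PHI`, `compatibility_product`, and `joint_distribution` are all assumptions made for illustration. The explicit division by the total is also an assumption, added so that the 16 values sum to one.

```python
from itertools import product

# Hypothetical 2x2 compatibility matrices for the four links of FIG. 1.
# PHI[(u, v)][s_u][s_v]; the numeric values are illustrative assumptions.
PHI = {
    ("a", "b"): [[2.0, 1.0], [1.0, 3.0]],
    ("b", "c"): [[1.0, 2.0], [2.0, 1.0]],
    ("c", "a"): [[3.0, 1.0], [1.0, 3.0]],
    ("b", "d"): [[1.0, 2.0], [3.0, 1.0]],
}
NODES = ("a", "b", "c", "d")


def compatibility_product(state):
    """Right-hand side of equation [1] for one assignment of states."""
    value = 1.0
    for (u, v), phi in PHI.items():
        value *= phi[state[u]][state[v]]
    return value


def joint_distribution():
    """Evaluate equation [1] for all 16 assignments and normalize."""
    weights = {}
    for values in product(range(2), repeat=len(NODES)):
        weights[values] = compatibility_product(dict(zip(NODES, values)))
    total = sum(weights.values())  # explicit normalization (an assumption)
    return {values: w / total for values, w in weights.items()}
```

Calling `joint_distribution()` returns a dictionary keyed by the tuple of states at nodes a, b, c, and d.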
Often, Markov networks for practical applications are very large. For example, an image acquired from a scene by a camera may be represented by a Markov network between all small neighboring patches, or even pixels, of the acquired image. Similarly, the well-known "travelling salesman problem" can map onto a Markov network where the maximum probability state corresponds to the shortest route for the salesman. This network has as many nodes as cities to be visited. In some Markov networks, the nodes can represent measured input signals, such as visual input data. Markov models are also extensively used in speech recognition systems.
To analyze the probabilistic system modeled by a Markov network, one typically wants to find the marginal probabilities of certain network variables of interest. (The "marginal probability" of a variable signifies the probability of that variable, ignoring the state of any other network variable.) For example, it may be useful to examine the probability of a variable that represents an underlying explanation for some measured data, such as the probability of particular words used to vocalize particular speech sounds. To find those probabilities, the Markov network is marginalized over all the other variables in the network. This gives the probability of the variable representing the explanation, given the measured input data values. This marginalization is thus a form of inference.
One may also want to find the states of the nodes that maximize the network probability. For example, for the Markov network corresponding to the travelling salesman problem, it is desired to find the state at each node that maximizes the probability of the Markov network. These states, which minimize the length of the salesman's route, are known as the maximum a posteriori probability (MAP) states.
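For a network as small as the four-node example of FIG. 1, the MAP states can be found by exhaustive search, which makes the definition concrete. The sketch below assumes binary states and uses hypothetical compatibility values; `network_weight` and `map_states` are illustrative names, and the approach is practical only for very small networks.

```python
from itertools import product

# Hypothetical compatibility matrices for the links of FIG. 1 (binary states).
PHI_AB = [[2.0, 1.0], [1.0, 3.0]]  # indexed [s_a][s_b]
PHI_BC = [[1.0, 2.0], [2.0, 1.0]]  # indexed [s_b][s_c]
PHI_CA = [[3.0, 1.0], [1.0, 3.0]]  # indexed [s_c][s_a]
PHI_BD = [[1.0, 2.0], [3.0, 1.0]]  # indexed [s_b][s_d]


def network_weight(sa, sb, sc, sd):
    """Unnormalized probability of one joint assignment of states."""
    return PHI_AB[sa][sb] * PHI_BC[sb][sc] * PHI_CA[sc][sa] * PHI_BD[sb][sd]


def map_states():
    """Exhaustively search all 2**4 assignments for the most probable one."""
    return max(product(range(2), repeat=4), key=lambda s: network_weight(*s))
```

Because normalization does not change which assignment is largest, the search can compare unnormalized weights.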
In the example of FIG. 1, it is possible to determine the marginal probability $P(s_a)$ of the variable at node a by summing the random values at nodes b, c, and d:
$$P(s_a) = \sum_{s_b, s_c, s_d} \phi_{ab}(s_a, s_b)\,\phi_{bc}(s_b, s_c)\,\phi_{ca}(s_c, s_a)\,\phi_{bd}(s_b, s_d). \qquad [2]$$
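The sum in equation [2] can be carried out by direct enumeration when the network is small. The following sketch assumes binary states and hypothetical compatibility values. The name `marginal_a` and the final normalization step are assumptions added for illustration.

```python
from itertools import product

# Hypothetical compatibility matrices for the links of FIG. 1 (binary states).
PHI_AB = [[2.0, 1.0], [1.0, 3.0]]  # indexed [s_a][s_b]
PHI_BC = [[1.0, 2.0], [2.0, 1.0]]  # indexed [s_b][s_c]
PHI_CA = [[3.0, 1.0], [1.0, 3.0]]  # indexed [s_c][s_a]
PHI_BD = [[1.0, 2.0], [3.0, 1.0]]  # indexed [s_b][s_d]


def marginal_a():
    """Sum the product of compatibilities over s_b, s_c, s_d (equation [2])."""
    totals = [0.0, 0.0]
    for sa, sb, sc, sd in product(range(2), repeat=4):
        totals[sa] += (
            PHI_AB[sa][sb] * PHI_BC[sb][sc] * PHI_CA[sc][sa] * PHI_BD[sb][sd]
        )
    z = sum(totals)  # normalize so the two entries sum to one (an assumption)
    return [t / z for t in totals]
```

The loop visits every joint assignment, so its cost grows exponentially with the number of nodes.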
In general, especially for large networks, these marginal probabilities are infeasible to determine directly. The joint sum over all possible states of all the nodes can be of too high a dimension to sum numerically, particularly when the network has closed loops.
FIGS. 2a-b show examples of Markov networks with many loops for which it is difficult to find either the marginal probability at a node, or the state of the node which maximizes the overall probability of the Markov network. Both networks are in the form of lattices, which are commonly used to describe the joint probabilities of variables distributed spatially over two dimensions. FIG. 2a shows a rectangular lattice, and FIG. 2b shows a triangular lattice. These types of lattice networks are used to model many systems.
Techniques to approximate the marginal probabilities for such structures are known, but these techniques are typically very slow. Simulated annealing or Gibbs sampling can be used, see Geman et al., "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Analysis and Machine Intelligence, 6:721-741, 1984. Another class of approximation techniques is variational methods, see Jordan, "Learning in graphical models," MIT Press, 1998. However, these methods require an appropriate class of variational approximation functions for a particular problem. It is not obvious which functions, out of all possible ones, to use for the approximations.
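To illustrate why sampling-based approximation is slow, here is a minimal Gibbs sampling sketch for the four-node network of FIG. 1. It is not drawn from the cited references. The binary states, compatibility values, sweep counts, and the names `local_weight` and `gibbs_marginal` are assumptions. Many sweeps are needed to estimate a single marginal to within a few percent.

```python
import random

# Hypothetical compatibility matrices, PHI[(u, v)][s_u][s_v], binary states.
PHI = {
    ("a", "b"): [[2.0, 1.0], [1.0, 3.0]],
    ("b", "c"): [[1.0, 2.0], [2.0, 1.0]],
    ("c", "a"): [[3.0, 1.0], [1.0, 3.0]],
    ("b", "d"): [[1.0, 2.0], [3.0, 1.0]],
}
NODES = ("a", "b", "c", "d")


def local_weight(node, value, state):
    """Product of the compatibilities on the links that touch `node`."""
    weight = 1.0
    for (u, v), phi in PHI.items():
        if u == node:
            weight *= phi[value][state[v]]
        elif v == node:
            weight *= phi[state[u]][value]
    return weight


def gibbs_marginal(node, sweeps=20000, burn_in=1000, seed=0):
    """Estimate the marginal at `node` from the visited states."""
    rng = random.Random(seed)
    state = {n: 0 for n in NODES}
    counts = [0, 0]
    for sweep in range(sweeps + burn_in):
        for n in NODES:
            # Resample node n from its conditional given its neighbors.
            w0 = local_weight(n, 0, state)
            w1 = local_weight(n, 1, state)
            state[n] = 1 if rng.random() < w1 / (w0 + w1) else 0
        if sweep >= burn_in:
            counts[state[node]] += 1
    return [c / sweeps for c in counts]
```

The estimate from `gibbs_marginal("a")` is random and only approaches the exact marginal as the number of sweeps grows.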
For the special case of Markov networks that form chains or trees, there is a local message-passing method that calculates the marginal probabilities at each node, see Pearl, "Probabilistic reasoning in intelligent systems: networks of plausible inference," Morgan Kaufmann, 1988. This method is now in widespread use, and is equivalent to the "forward-backward" and Viterbi methods for solving one-dimensional Hidden Markov Models (HMM), and to Kalman filters and their generalization to trees, see Luettgen et al., "Efficient multiscale regularization with applications to the computation of optical flow," IEEE Trans. Image Processing, 3(1):41-64, 1994. This message-passing method gives the exact marginal probabilities for any Markov network that does not have loops. It is referred to below as the "standard" belief propagation, or message-passing, method.
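The exactness of message passing on a loop-free network can be checked numerically. The sketch below is a simplified sum-product implementation and is not taken from the cited references. It assumes a hypothetical tree obtained by dropping the c-a link of FIG. 1, binary states, and illustrative compatibility values. The names `message`, `bp_marginal`, and `brute_marginal` are assumptions.

```python
from itertools import product

# Hypothetical tree: FIG. 1 without the c-a link. EDGES[(u, v)][s_u][s_v].
EDGES = {
    ("a", "b"): [[2.0, 1.0], [1.0, 3.0]],
    ("b", "c"): [[1.0, 2.0], [2.0, 1.0]],
    ("b", "d"): [[1.0, 2.0], [3.0, 1.0]],
}
NODES = ("a", "b", "c", "d")


def phi(u, v, su, sv):
    """Compatibility of states su, sv on link u-v, in either orientation."""
    if (u, v) in EDGES:
        return EDGES[(u, v)][su][sv]
    return EDGES[(v, u)][sv][su]


def neighbors(node):
    return [v if u == node else u for (u, v) in EDGES if node in (u, v)]


def message(src, dst):
    """Message from src to dst, built recursively from src's other neighbors."""
    incoming = [message(n, src) for n in neighbors(src) if n != dst]
    out = []
    for s_dst in range(2):
        total = 0.0
        for s_src in range(2):
            term = phi(src, dst, s_src, s_dst)
            for m in incoming:
                term *= m[s_src]
            total += term
        out.append(total)
    return out


def bp_marginal(node):
    """Belief at a node: normalized product of all incoming messages."""
    belief = [1.0, 1.0]
    for n in neighbors(node):
        m = message(n, node)
        belief = [belief[s] * m[s] for s in range(2)]
    z = sum(belief)
    return [b / z for b in belief]


def brute_marginal(node):
    """Reference marginal computed by summing over all joint assignments."""
    totals = [0.0, 0.0]
    for values in product(range(2), repeat=len(NODES)):
        state = dict(zip(NODES, values))
        weight = 1.0
        for (u, v) in EDGES:
            weight *= phi(u, v, state[u], state[v])
        totals[state[node]] += weight
    z = sum(totals)
    return [t / z for t in totals]
```

The recursion in `message` terminates only because the network has no loops. On a network with loops it would recurse without end.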
Unfortunately, many Markov networks of practical interest do contain loops. For example, an image, modeled as a Markov network of local image patches connected to their nearest neighbors, gives a lattice-structured Markov network, also called a Markov random field, as shown in FIGS. 2a-b. This type of network contains many loops.
Another method for inference in Markov networks applies the local message-passing rules derived for trees and chains in a network, even though the network may contain loops, see Weiss, "Belief propagation and revision in networks with loops," Technical Report 1616, MIT AI Lab, 1997. This is referred to as the "loopy" belief propagation method in the description below, although it should be clearly understood that the "loopy" method is nothing more than the "standard" belief propagation method applied to a network with loops. When such a procedure converges, it can yield an approximate determination of the marginal probabilities. However, the loopy method sometimes gives too poor an approximation to the marginal probabilities, and often does not even converge. In the latter case, the approximation gives no single answer for the desired marginal probabilities.
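The loopy method can be sketched by iterating the same local message update on the four-node network of FIG. 1, which contains the loop a-b-c. This is an illustrative sketch and not the procedure of the cited report. The binary states, compatibility values, synchronous update schedule, fixed iteration count, and the name `loopy_bp` are assumptions.

```python
# Hypothetical compatibility matrices, PHI[(u, v)][s_u][s_v], binary states.
PHI = {
    ("a", "b"): [[2.0, 1.0], [1.0, 3.0]],
    ("b", "c"): [[1.0, 2.0], [2.0, 1.0]],
    ("c", "a"): [[3.0, 1.0], [1.0, 3.0]],
    ("b", "d"): [[1.0, 2.0], [3.0, 1.0]],
}
NODES = ("a", "b", "c", "d")


def phi(u, v, su, sv):
    """Compatibility of states su, sv on link u-v, in either orientation."""
    if (u, v) in PHI:
        return PHI[(u, v)][su][sv]
    return PHI[(v, u)][sv][su]


def neighbors(node):
    return [v if u == node else u for (u, v) in PHI if node in (u, v)]


def loopy_bp(iterations=50):
    """Iterate the tree message rule on a network with a loop."""
    messages = {}
    for u, v in PHI:
        messages[(u, v)] = [0.5, 0.5]  # uniform initial messages
        messages[(v, u)] = [0.5, 0.5]
    for _ in range(iterations):
        updated = {}
        for (src, dst) in messages:
            out = []
            for s_dst in range(2):
                total = 0.0
                for s_src in range(2):
                    term = phi(src, dst, s_src, s_dst)
                    for n in neighbors(src):
                        if n != dst:
                            term *= messages[(n, src)][s_src]
                    total += term
                out.append(total)
            norm = sum(out)  # rescale to keep the numbers bounded
            updated[(src, dst)] = [x / norm for x in out]
        messages = updated  # synchronous update of all messages
    beliefs = {}
    for node in NODES:
        belief = [1.0, 1.0]
        for n in neighbors(node):
            belief = [belief[s] * messages[(n, node)][s] for s in range(2)]
        z = sum(belief)
        beliefs[node] = [b / z for b in belief]
    return beliefs
```

For these particular values the iteration settles, and the belief at node a is close to, but not equal to, the exact marginal obtained by summing over all joint states.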
Therefore, it is desired to provide a method for determining marginal probabilities in Markov networks that is both relatively fast and more accurate than the loopy method. Furthermore, it is desired to provide a method for networks with loops that converges more reliably than the prior art loopy belief propagation method.
The present invention provides a method for determining the probabilities of nodes in a network model of a complex probabilistic system. More particularly, the method determines desired marginal or maximum a posteriori probabilities in networks with loops. The method uses a message-passing scheme, appropriate for networks with loops, which is more accurate and typically converges more reliably and in fewer iterations than prior art loopy belief propagation methods.
The invention describes a class of procedures in which computational cost and accuracy can be traded off against each other, allowing a user of the invention to select more computation for more accuracy in a particular application of the invention.
The invention has two major advantages over the prior art loopy belief propagation method. First, the invented method normally gives much more accurate answers for the desired marginal probabilities. Second, the invented method can converge to a single answer in cases where the loopy method does not.
Instead of finding the marginal probability at a node, one embodiment of the invention finds the states at each node which approximately maximize the probability of the entire network. Thus, the invention provides a novel way to approximate both the marginal probabilities and MAP states in Markov networks.
Many Markov network problems of interest are known to be NP-hard problems. A problem is NP-hard when it is at least as hard as every problem that can be solved by a nondeterministic Turing machine in polynomial time. When a decision version of a combinatorial optimization problem belongs to the class of NP-complete problems, which includes the travelling salesman problem described above, an optimization version is NP-hard. The invented method yields fast, approximate solutions for some of these very difficult optimization problems.
More particularly, the invention provides a method that determines the probabilities of states of a system represented by a model. The model includes nodes connected by links. Each node represents possible states of a corresponding part of the system, and each link represents statistical dependencies between possible states of related nodes. The nodes are grouped into arbitrarily sized clusters such that every node is included in at least one cluster. A minimal number of marginalization constraints to be satisfied between the clusters are determined.
A super-node network is constructed so that each cluster of nodes is represented by exactly one super-node. Super-nodes that share one of the marginalization constraints are connected by super-links. The super-node network is searched to locate closed loops of super-nodes containing at least one common node. A normalization operator for each closed loop is determined, and messages between the super-nodes are defined.
Initial values are assigned to the messages, and the messages between super-nodes are updated using standard belief propagation. The messages are replaced by associated normalized values using the corresponding normalization operator, and approximate probabilities of the states of the system are determined from the messages when a termination condition is reached.
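The first steps summarized above, grouping nodes into clusters and identifying which clusters share nodes, can be sketched as follows. This is an illustrative reading and not the method as claimed. The 3x3 lattice, the choice of four overlapping 2x2 clusters, and the names `covers_all_nodes`, `cluster_overlaps`, and `nodes_common_to_all` are assumptions. The sketch lists every pairwise overlap as a candidate for a marginalization constraint. It does not select a minimal set of constraints, search for closed loops, or build the normalization operators.

```python
from itertools import combinations

# Hypothetical 3x3 lattice with nodes numbered 1..9, row by row.
NODES = set(range(1, 10))

# Hypothetical clustering: one cluster (future super-node) per 2x2 block.
CLUSTERS = {
    "A": {1, 2, 4, 5},
    "B": {2, 3, 5, 6},
    "C": {4, 5, 7, 8},
    "D": {5, 6, 8, 9},
}


def covers_all_nodes(nodes, clusters):
    """Check that every node is included in at least one cluster."""
    return set().union(*clusters.values()) == set(nodes)


def cluster_overlaps(clusters):
    """Map each pair of clusters to the nodes they share, if any."""
    overlaps = {}
    for (n1, c1), (n2, c2) in combinations(sorted(clusters.items()), 2):
        shared = c1 & c2
        if shared:
            overlaps[(n1, n2)] = shared
    return overlaps


def nodes_common_to_all(clusters):
    """Nodes that appear in every cluster of the given collection."""
    return set.intersection(*clusters.values())
```

In this hypothetical clustering, every pair of clusters shares at least node 5, so the four clusters form a closed arrangement around one common node.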