Computer models are frequently used to study the behavior of complex probabilistic systems. When the systems contain many inter-dependent random variables, Markov networks are often used. In a Markov network, nodes of the network represent the possible states of a part of the system, and links between the nodes represent statistical dependencies between the possible states of those nodes.
By the Hammersley-Clifford theorem, from the study of Markov networks, the probability of any set of states at the nodes of the network can be written as the product of compatibility functions between clusters of nodes.
FIG. 1 shows a simple network with four nodes labeled a, b, c, and d. The links between the nodes represent the statistical dependencies between the possible states of the nodes. For the case of pairwise probabilistic interactions between nodes of the network, the overall joint probability of the system can be expressed as the product of compatibility functions for each linked pair of nodes:

$$P(s_a, s_b, s_c, s_d) = \phi_{ab}(s_a, s_b)\,\phi_{bc}(s_b, s_c)\,\phi_{ca}(s_c, s_a)\,\phi_{bd}(s_b, s_d), \qquad [1]$$

where φab is the compatibility matrix between nodes a and b, sa is a random variable describing the state at node a, and similarly for the other nodes and variables.
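As a minimal sketch of equation [1], the following Python fragment evaluates the product of pairwise compatibility functions for the four-node network of FIG. 1. The two-state nodes, the numeric matrix entries, and the function names are illustrative assumptions, not values from the text. The explicit division by the sum `Z` is also an assumption, added so that arbitrary positive matrices yield a properly normalized probability.

```python
import itertools
import numpy as np

# Hypothetical 2-state compatibility matrices for the four links of FIG. 1.
# The entries are illustrative assumptions only.
phi_ab = np.array([[2.0, 1.0], [1.0, 2.0]])  # indexed [sa, sb]
phi_bc = np.array([[1.0, 3.0], [3.0, 1.0]])  # indexed [sb, sc]
phi_ca = np.array([[2.0, 1.0], [1.0, 1.0]])  # indexed [sc, sa]
phi_bd = np.array([[1.0, 2.0], [2.0, 1.0]])  # indexed [sb, sd]


def unnormalized_joint(sa, sb, sc, sd):
    """Product of the pairwise compatibilities, following equation [1]."""
    return phi_ab[sa, sb] * phi_bc[sb, sc] * phi_ca[sc, sa] * phi_bd[sb, sd]


# Enumerate all 2**4 joint states to obtain the normalizing constant.
all_states = list(itertools.product(range(2), repeat=4))
Z = sum(unnormalized_joint(*s) for s in all_states)


def joint(sa, sb, sc, sd):
    """Normalized joint probability of one configuration."""
    return unnormalized_joint(sa, sb, sc, sd) / Z
```

Calling `joint(0, 1, 1, 0)` returns the probability of one particular configuration; summing `joint` over `all_states` gives 1.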
Often, Markov networks for practical applications are very large. For example, an image acquired from a scene by a camera may be represented by a Markov network between all small neighboring patches, or even pixels, of the acquired image. Similarly, the well-known “travelling salesman problem” can be mapped onto a Markov network where the maximum probability state corresponds to the shortest path of the salesman's route. This network has as many nodes as cities to be visited. In some Markov networks, the nodes can represent measured input signals, such as visual input data. Markov models are also extensively used in speech recognition systems.
To analyze the probabilistic system modeled by a Markov network, one typically wants to find the marginal probabilities of certain network variables of interest. (The “marginal probability” of a variable signifies the probability of that variable, ignoring the state of any other network variable.) For example, it may be useful to examine the probability of a variable that represents an underlying explanation for some measured data, such as the probability of particular words used to vocalize particular speech sounds. To find those probabilities, the Markov network is marginalized over all the other variables in the network. This gives the probability of the variable representing the explanation, given the measured input data values. This marginalization is thus a form of inference.
One may also want to find the states of the nodes that maximize the network probability. For example, for the Markov network corresponding to the travelling salesman problem, it is desired to find the state at each node that maximizes the probability of the Markov network. These states, which minimize the length of the salesman's route, are known as the maximum a posteriori probability (MAP) states.
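For a network as small as the four-node example of FIG. 1, the MAP states can be found by exhaustive search. The sketch below is an illustration under assumed values: the two-state nodes, the matrix entries, and the function name `map_state_brute_force` are hypothetical. It tabulates the product of compatibility functions for every joint state and picks the largest entry.

```python
import numpy as np

# Hypothetical 2-state compatibility matrices (illustrative assumptions).
phi_ab = np.array([[2.0, 1.0], [1.0, 2.0]])  # indexed [sa, sb]
phi_bc = np.array([[1.0, 3.0], [3.0, 1.0]])  # indexed [sb, sc]
phi_ca = np.array([[2.0, 1.0], [1.0, 1.0]])  # indexed [sc, sa]
phi_bd = np.array([[1.0, 2.0], [2.0, 1.0]])  # indexed [sb, sd]


def map_state_brute_force(phi_ab, phi_bc, phi_ca, phi_bd):
    """Return the joint state (sa, sb, sc, sd) with the largest product."""
    # table[sa, sb, sc, sd] holds the unnormalized joint value of each state.
    table = np.einsum("ab,bc,ca,bd->abcd", phi_ab, phi_bc, phi_ca, phi_bd)
    # argmax works on the flattened table; unravel_index recovers (sa, sb, sc, sd).
    best = np.unravel_index(np.argmax(table), table.shape)
    return tuple(int(s) for s in best), table


map_state, table = map_state_brute_force(phi_ab, phi_bc, phi_ca, phi_bd)
```

Normalization is not needed here, because dividing every entry by the same constant does not change which state is largest. The table has one entry per joint state, so this approach is only practical for very small networks.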
In the example of FIG. 1, it is possible to determine the marginal probability P(sa) of the variable at node a by summing over the possible states of nodes b, c, and d:

$$P(s_a) = \sum_{s_b, s_c, s_d} \phi_{ab}(s_a, s_b)\,\phi_{bc}(s_b, s_c)\,\phi_{ca}(s_c, s_a)\,\phi_{bd}(s_b, s_d). \qquad [2]$$
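The direct summation of equation [2] can be sketched in a few lines of Python. The two-state nodes, the matrix entries, and the function name `marginal_a` are assumptions made for illustration. The final renormalization is likewise an assumption, added so that arbitrary positive matrices produce a distribution that sums to 1.

```python
import numpy as np

# Hypothetical 2-state compatibility matrices (illustrative assumptions).
phi_ab = np.array([[2.0, 1.0], [1.0, 2.0]])  # indexed [sa, sb]
phi_bc = np.array([[1.0, 3.0], [3.0, 1.0]])  # indexed [sb, sc]
phi_ca = np.array([[2.0, 1.0], [1.0, 1.0]])  # indexed [sc, sa]
phi_bd = np.array([[1.0, 2.0], [2.0, 1.0]])  # indexed [sb, sd]


def marginal_a(phi_ab, phi_bc, phi_ca, phi_bd):
    """Marginal distribution of node a, by summing over sb, sc, and sd."""
    # The einsum string names one index per node; indices absent from the
    # output ("b", "c", "d") are summed out, as in equation [2].
    unnormalized = np.einsum("ab,bc,ca,bd->a", phi_ab, phi_bc, phi_ca, phi_bd)
    return unnormalized / unnormalized.sum()


p_a = marginal_a(phi_ab, phi_bc, phi_ca, phi_bd)
```

With four binary nodes the sum has only eight terms per value of sa. The same direct approach on a network of n nodes with k states each involves on the order of k to the power n terms.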
In general, especially for large networks, these marginal probabilities are infeasible to determine directly. The joint sum ranges over all possible states of all the nodes, so the number of terms grows exponentially with the number of nodes and can be of too high a dimension to sum numerically, particularly when the network has closed loops.
FIGS. 2a-b show examples of Markov networks with many loops for which it is difficult to find either the marginal probability at a node, or the state of the node which maximizes the overall probability of the Markov network. Both networks are in the form of lattices, which are commonly used to describe the joint probabilities of variables spatially distributed over two dimensions. FIG. 2a shows a rectangular lattice, and FIG. 2b shows a triangular lattice. These types of lattice networks are used to model many systems.
Techniques to approximate the marginal probabilities for such structures are known, but these techniques are typically very slow. Simulated annealing or Gibbs sampling can be used, see Geman et al., “Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images,” IEEE Trans. Pattern Analysis and Machine Intelligence, 6:721-741, 1984. Another class of approximation techniques is variational methods, see Jordan, “Learning in graphical models,” MIT Press, 1998. However, these methods require an appropriate class of variational approximation functions for a particular problem. It is not obvious which functions, out of all possible ones, to use for the approximations.
For the special case of Markov networks that form chains or trees, there is a local message-passing method that calculates the marginal probabilities at each node, see Pearl, “Probabilistic reasoning in intelligent systems: networks of plausible inference,” Morgan Kaufmann, 1988. This method is now in widespread use, and is equivalent to the “forward-backward” and Viterbi methods for solving one-dimensional Hidden Markov Models (HMM), and to Kalman filters and their generalization to trees, see Luettgen et al., “Efficient multiscale regularization with applications to the computation of optical flow,” IEEE Trans. Image Processing, 3(1):41-64, 1994. This message-passing method gives the exact marginal probabilities for any Markov network that does not have loops. This is referred to as the “standard” belief propagation, or message-passing method below.
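To illustrate why message passing is exact on a loop-free network, the following sketch runs a forward and a backward sweep of sum-product messages along a four-node chain and compares the result with brute-force marginalization. The chain length, the number of states, the randomly drawn compatibility matrices, and the function names are assumptions chosen for the example; this is not a reproduction of any cited implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 3
# psi[i] is a hypothetical compatibility matrix between node i and node i+1,
# indexed [s_i, s_{i+1}]; the random entries are illustrative only.
psi = [rng.uniform(0.5, 2.0, size=(n_states, n_states)) for _ in range(3)]


def chain_marginals(psi):
    """Exact single-node marginals on a chain via forward/backward messages."""
    n = len(psi) + 1
    k = psi[0].shape[0]
    fwd = [np.ones(k) for _ in range(n)]  # message reaching node i from the left
    bwd = [np.ones(k) for _ in range(n)]  # message reaching node i from the right
    for i in range(1, n):
        m = psi[i - 1].T @ fwd[i - 1]     # sum over the state of node i-1
        fwd[i] = m / m.sum()              # rescale for numerical stability
    for i in range(n - 2, -1, -1):
        m = psi[i] @ bwd[i + 1]           # sum over the state of node i+1
        bwd[i] = m / m.sum()
    beliefs = [fwd[i] * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]


def brute_force_marginals(psi):
    """Reference marginals from the full joint table (4-node chain only)."""
    joint = np.einsum("ab,bc,cd->abcd", *psi)
    joint = joint / joint.sum()
    return [joint.sum(axis=tuple(j for j in range(4) if j != i)) for i in range(4)]


bp = chain_marginals(psi)
exact = brute_force_marginals(psi)
```

Each sweep costs a number of operations proportional to the number of links times the square of the number of states, whereas the brute-force table grows exponentially with the number of nodes.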
Unfortunately, many Markov networks of practical interest do contain loops. For example, an image, modeled as a Markov network of local image patches connected to their nearest neighbors, gives a lattice-structured Markov network as shown in FIGS. 2a-b, also called a Markov random field. This type of network contains many loops.
Another method for inference in Markov networks applies the local message-passing rules derived for trees and chains in a network, even though the network may contain loops, see Weiss, “Belief propagation and revision in networks with loops,” Technical Report 1616, MIT AI Lab, 1997. This is referred to as the “loopy” belief propagation method in the description below, although it should be clearly understood that the “loopy” method is nothing more than the “standard” belief propagation method applied to a network with loops. When such a procedure converges, it can yield an approximate determination of the marginal probabilities. However, the loopy method sometimes gives too poor an approximation to the marginal probabilities, and often does not even converge. In the latter case, the approximation gives no single answer for the desired marginal probabilities.
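The loopy procedure can be sketched by applying the same local message update to every directed link of the FIG. 1 network, which contains the loop a-b-c, and iterating until the messages stop changing. The two-state nodes, the matrix entries, the parallel update schedule, the stopping tolerance, and all names in the code are assumptions made for illustration. The exact marginals are computed by brute force only so that the two results can be compared.

```python
import numpy as np

nodes = ["a", "b", "c", "d"]
# Hypothetical 2-state compatibility matrices for the links of FIG. 1,
# indexed [s_i, s_j] for the link (i, j); entries are illustrative only.
edges = {
    ("a", "b"): np.array([[2.0, 1.0], [1.0, 2.0]]),
    ("b", "c"): np.array([[1.0, 3.0], [3.0, 1.0]]),
    ("c", "a"): np.array([[2.0, 1.0], [1.0, 1.0]]),
    ("b", "d"): np.array([[1.0, 2.0], [2.0, 1.0]]),
}

# Directed lookup: psi[(i, j)][s_i, s_j] for both directions of every link.
psi = {}
for (i, j), m in edges.items():
    psi[(i, j)] = m
    psi[(j, i)] = m.T
neighbors = {n: [j for (i, j) in psi if i == n] for n in nodes}


def loopy_bp(max_iters=200, tol=1e-10):
    """Parallel sum-product updates on a graph that may contain loops."""
    msgs = {e: np.ones(2) / 2 for e in psi}  # msgs[(i, j)]: from i to j, over s_j
    converged = False
    for _ in range(max_iters):
        new = {}
        for (i, j) in psi:
            incoming = np.ones(2)
            for k in neighbors[i]:
                if k != j:                    # exclude the message coming from j
                    incoming = incoming * msgs[(k, i)]
            m = psi[(i, j)].T @ incoming      # sum over the state of node i
            new[(i, j)] = m / m.sum()
        delta = max(np.abs(new[e] - msgs[e]).max() for e in psi)
        msgs = new
        if delta < tol:
            converged = True
            break
    beliefs = {}
    for n in nodes:
        b = np.ones(2)
        for k in neighbors[n]:
            b = b * msgs[(k, n)]
        beliefs[n] = b / b.sum()
    return beliefs, converged


# Brute-force reference marginals for comparison.
joint = np.einsum("ab,bc,ca,bd->abcd", edges[("a", "b")], edges[("b", "c")],
                  edges[("c", "a")], edges[("b", "d")])
joint = joint / joint.sum()
exact = {n: joint.sum(axis=tuple(j for j in range(4) if j != i))
         for i, n in enumerate(nodes)}

beliefs, converged = loopy_bp()
```

Comparing `beliefs[n]` with `exact[n]` for each node shows how close the approximation is for this particular choice of matrices. The `converged` flag matters because, as noted above, the iteration is not guaranteed to settle on networks with loops.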
Therefore, it is desired to provide a method for determining marginal probabilities in Markov networks that is both relatively fast and more accurate than the loopy method. Furthermore, it is desired to provide a method for networks with loops that converges more reliably than the prior art loopy belief propagation method.