This invention relates generally to data modeling and analysis such as probabilistic graphical models, and more particularly to variational inference engines for such models.
Data modeling has become an important tool in solving complex and large real-world computerizable problems. Applications of data modeling include data compression, density estimation and data visualization. A data modeling technique used for these and other applications is probabilistic modeling. It has proven to be a popular technique for data modeling applications such as speech recognition, vision, handwriting recognition, information retrieval and intelligent interfaces. One framework for developing such applications involves the representation of probability distributions as directed acyclic graphs, which are also known as Bayesian networks, belief networks, and probabilistic independence networks, among other terms.
In a probabilistic model, there are a number of variables. Each variable is represented as a node in a directed acyclic graph. An example of such a graph is shown in the diagram of FIG. 2. The graph 200 includes nodes 202, 204, 206, 208 and 210, labeled as X1, X2, X3, X4 and X5, respectively. Each node corresponds to a variable, which may correspond to either observed or observable data, or unobserved or unobservable data. For example, if the graph 200 were to correspond to a model for factors involved in starting an automobile, the observed variables might include whether or not the engine starts, while the unobserved variables could include the presence or absence of fuel in the tank and the state of the battery.
The joint distribution over the variables is expressed as the product of a conditional distribution for each node, conditioned on the states of its parents in the graph,
$$P(X_1, \ldots, X_M) = \prod_{i=1}^{M} P(X_i \mid \mathrm{pa}_i)$$
where pa_i denotes the parents of X_i. A specific model is determined by the structure of the graph, as well as the choice of the conditional distributions P(X_i | pa_i). For example, given the graph 200 of FIG. 2, the factorization is
P(X1, X2, X3, X4, X5)=P(X1)P(X2)P(X3|X1, X2)P(X4|X2)P(X5|X3, X4).
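The factorization for the graph 200 can be illustrated with a minimal Python sketch. It assumes that all five variables are binary, and every probability value in the tables is hypothetical, chosen only for illustration. The helper names `bernoulli` and `joint` are likewise not taken from the specification. The sketch evaluates the joint distribution as the product of the five conditional distributions and checks that it sums to one over all states.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables (0 or 1).
# Each entry is P(X = 1 | parents); the numbers are illustrative only.
P_X1 = 0.6
P_X2 = 0.3
P_X3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}    # keyed by (x1, x2)
P_X4 = {0: 0.2, 1: 0.8}                                         # keyed by x2
P_X5 = {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.95}  # keyed by (x3, x4)


def bernoulli(p_one, value):
    """Return P(X = value) for a binary variable with P(X = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x1, x2, x3, x4, x5):
    """P(X1..X5) = P(X1) P(X2) P(X3|X1,X2) P(X4|X2) P(X5|X3,X4)."""
    return (bernoulli(P_X1, x1)
            * bernoulli(P_X2, x2)
            * bernoulli(P_X3[(x1, x2)], x3)
            * bernoulli(P_X4[x2], x4)
            * bernoulli(P_X5[(x3, x4)], x5))


# A valid factorization must sum to one over all 2**5 joint states.
total = sum(joint(*state) for state in product((0, 1), repeat=5))
print(f"sum over all 32 states = {total:.6f}")
```

Enumerating all states in this way is feasible only because the example has five binary variables; the number of states doubles with every additional binary variable.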
Because some of the variables are unobservable, to effectively use the model represented by the graph, it is necessary to infer the corresponding posterior distribution of at least a subset of the unobservable variables. After this is accomplished, the posterior distribution can then be used to make predictions based on the model. However, exact solutions of probabilistic models are generally intractable for all but the simplest examples. Therefore, approximation schemes are used to approximate the posterior distributions. Such approximation schemes generally fall into one of three classes: (1) Laplace's method and similar semi-analytic approximations; (2) Markov chain Monte Carlo methods, such as Gibbs sampling; and, (3) variational methods.
The last of these approximation schemes, variational methods, generally involves the introduction of a distribution that provides an approximation to the true posterior distribution. However, for each model that is to be approximated, researchers must painstakingly work out the mathematics necessary to apply variational inference, and then develop special-purpose computer code to implement the resulting variational algorithm. This is costly in both time and money, and thus limits the usefulness of variational inference as a means of developing usable probabilistic models. For this and other reasons, there is a need for the present invention.
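The role of the approximating distribution can be made concrete with the standard decomposition used in variational methods. The notation is an assumption introduced here for illustration and does not appear in this section: H denotes the unobserved variables, D the observed data, and Q(H) the approximating distribution. The log evidence splits into a lower bound plus a Kullback-Leibler divergence, so that maximizing the bound over Q drives Q toward the true posterior P(H | D).

```latex
\begin{align}
\ln P(D) &= \mathcal{L}(Q) + \mathrm{KL}(Q \,\|\, P), \\
\mathcal{L}(Q) &= \sum_{H} Q(H) \ln \frac{P(H, D)}{Q(H)}, \\
\mathrm{KL}(Q \,\|\, P) &= -\sum_{H} Q(H) \ln \frac{P(H \mid D)}{Q(H)} \;\ge\; 0 .
\end{align}
```

Because the divergence is non-negative, the bound never exceeds ln P(D), and the two are equal exactly when Q(H) equals P(H | D).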
The invention relates to a variational inference engine for probabilistic graphical models. The engine allows a user to design, implement and solve broad classes of models without recourse to mathematical analysis or computer coding. A model, for example, can be specified using a scripting language, or by the user drawing a graph of the probability distribution using a graphical user interface. The engine determines the posterior distribution, and thus allows the resulting probabilistic model to be used for prediction purposes.
In one embodiment, a computer-implemented method includes inputting a specification for a model that has observable variables and unobservable variables. The specification includes a functional form for the conditional distributions of the model, and a structure for a graph of the model that has nodes for each of the variables. The model is usually such that an exact posterior distribution is intractable. The method determines a distribution for the unobservable variables that approximates the exact posterior distribution, based on the structure for the graph of the model, as well as the functional form for the conditional distributions of the model. This distribution is then output by the method.
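The flow of such a method can be sketched in Python on a very small model. The sketch rests on several assumptions that are not stated in this section: the model is a hypothetical three-node graph H1 -> D <- H2 with binary variables and invented probability values, the approximating distribution is taken to be fully factorized (a mean-field choice), and it is fitted by coordinate ascent. It is one common way to obtain a variational approximation, not necessarily the procedure the engine uses. The model here is small enough that the exact log evidence can be computed for comparison.

```python
import math

# Hypothetical model: H1 -> D <- H2, all binary; D is observed to be 1.
PRIOR_H1 = 0.3  # P(H1 = 1), illustrative value
PRIOR_H2 = 0.4  # P(H2 = 1), illustrative value
LIK = {(0, 0): 0.05, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.95}  # P(D=1 | h1, h2)


def log_joint(h1, h2):
    """ln P(H1=h1, H2=h2, D=1), following the graph's factorization."""
    p1 = PRIOR_H1 if h1 else 1.0 - PRIOR_H1
    p2 = PRIOR_H2 if h2 else 1.0 - PRIOR_H2
    return math.log(p1) + math.log(p2) + math.log(LIK[(h1, h2)])


def normalize(log_weights):
    """Convert unnormalized log weights into probabilities that sum to one."""
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    s = sum(weights)
    return [w / s for w in weights]


def elbo(q1, q2):
    """Lower bound L(Q) = E_Q[ln P(H, D)] - E_Q[ln Q(H)] for Q = q1 * q2."""
    value = 0.0
    for h1 in (0, 1):
        for h2 in (0, 1):
            q = q1[h1] * q2[h2]
            if q > 0.0:
                value += q * (log_joint(h1, h2) - math.log(q))
    return value


def mean_field(iterations=50):
    """Coordinate ascent on the factorized distribution Q = q1(h1) q2(h2)."""
    q1, q2 = [0.5, 0.5], [0.5, 0.5]
    for _ in range(iterations):
        # ln q1(h1) = E_{q2}[ln P(h1, H2, D)] + const
        q1 = normalize([sum(q2[h2] * log_joint(h1, h2) for h2 in (0, 1))
                        for h1 in (0, 1)])
        # ln q2(h2) = E_{q1}[ln P(H1, h2, D)] + const
        q2 = normalize([sum(q1[h1] * log_joint(h1, h2) for h1 in (0, 1))
                        for h2 in (0, 1)])
    return q1, q2


q1, q2 = mean_field()
log_evidence = math.log(sum(math.exp(log_joint(a, b))
                            for a in (0, 1) for b in (0, 1)))
print("q1(H1=1) =", round(q1[1], 4), " q2(H2=1) =", round(q2[1], 4))
print("ELBO =", round(elbo(q1, q2), 4), " ln P(D=1) =", round(log_evidence, 4))
```

The inputs mirror the two parts of the specification: the graph structure fixes which factors appear in `log_joint`, and the functional forms fix how each factor is evaluated. The output is the pair of factors `q1` and `q2`, whose product stands in for the exact posterior over the unobserved variables.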
As can be appreciated by those of ordinary skill within the art, the approach outlined herein can be extended to include the possibility of combining sampling methods, such as Markov chain Monte Carlo (e.g., Gibbs sampling) and exact methods along with variational methods, so that the engine could employ a combination of two or three different approaches to solve a particular model.
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.