Bayesian networks have a variety of applications and are used for modeling knowledge in domains such as medicine, image processing, and decision support systems. For example, a Bayesian network can be used to calculate the probability of a patient having a specific disease, given the absence or presence of certain symptoms.
A Bayesian network is a representation of the probabilistic relationships among distinctions about the world. Each distinction, sometimes called a variable, can take on one of a mutually exclusive and exhaustive set of possible states. A Bayesian network can be expressed as a directed acyclic graph (DAG) where the variables correspond to nodes and the relationships (e.g., dependencies) between the nodes correspond to arcs or edges connecting various nodes. When there is an edge between two nodes, the probability distribution of the first node depends upon the value of the second node when the direction of the edge points from the second node to the first node. The absence of edges in a Bayesian network conveys conditional independencies. The DAG is acyclic in that there is no directed path that both emanates from a node and returns to the same node, where each edge in the path is traversed only once.
The variables in a Bayesian network can be discrete or continuous. A discrete variable is a variable that has a finite or countable number of states, whereas a continuous variable is a variable that has an infinite number of states. An example of a discrete variable is a Boolean variable. Such a variable can assume only one of two states: (e.g., “true” or “false”). An example of a continuous variable is a variable that may assume any real value between −1 and 1. Discrete variables have an associated probability distribution. Continuous variables have an associated probability density function (“density”).
Methods for learning Bayesian networks from data have been developed. The learning problem can be considered a classic heuristic-search problem: given a Bayesian network structure, there is a “score” that measures how well that structure fits with the data. The task is to utilize a search algorithm to find a good network structure. Typically, once a good network structure has been identified, it is straight-forward to estimate the corresponding conditional probability distributions. Traditional approaches to learning Bayesian networks typically perform a greedy search though DAG space.
For example, in a conventional approach, the structure of a Bayesian network is a DAG, and at each step of the learning process one considers adding an edge in the DAG. One may also consider deleting or reversing an edge at each step. Typically, the process begins with a DAG containing no edges, and greedily performs these three operators (adding, deleting, or reversing) until a local maximum is reached.
Typically, Bayesian networks are learned or generated using a sequential algorithm. Such sequential algorithms are typically executed with a single processor or thread, which may be slower and less efficient than a parallel algorithm that can be executed concurrently by multiple processors or threads.