A fundamental characteristic of graphical modeling is modularity. A complex system can be constructed by the combination of simpler parts, such as sub-graphs, conditional probability tables (CPTs), arcs and nodes. Probability theory can be used to provide the link for the combination of parts, affording a mathematical interface between models, data and information. Graphical modeling further provides an intuitively appealing formalism and interface for the construction and manipulation of sets of variables or data structures acting within a set of interactive models. It also provides an intuitive visual representation of an asset monitor for a user through the use of data, graphical representations, probability theory and visual cueing.
Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of a general graphical model formalism. Some examples include mixture models, directed graphical models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way for a human to view all of these systems as instances of a common underlying formalism, and it provides a natural framework for the design of new systems. The formalism allows a joint distribution to factor into a product of conditional distributions. The graphical model structure may indicate direct dependencies among random variables and may indicate conditional independencies among variables given the values of influencing variables.
Through computer science, the advent of artificial intelligence (AI) has presented many decision support systems to aid users in the understanding and control of complex systems. Decision support through the use of computers and models has greatly increased a human's ability to control and evaluate real world systems. Decisions and recommendations can be rendered by a computer, by a human, or by both. Thus the human can become better informed, more aware and more in control of situations surrounding multivariate systems that would otherwise overwhelm the human capacity for understanding and judgment.
During the creation of decision support systems, computer scientists seek to provide a framework to supply decisions or decision recommendations with the greatest possible accuracy and speed. The decision support system is constructed in an effort to allow for increased awareness, increased accuracy, faster analysis, and increased control in the overall operation of a situation or system under observation. Applications of decision support systems include medical diagnosis, troubleshooting of computer networks, asset monitoring, automotive troubleshooting and monitoring, electro-mechanical troubleshooting, the monitoring and troubleshooting of complex systems, and gene regulatory networks. Decision support systems calculate decisions based upon recognizable, obvious, non-obvious, observable, unobservable, hidden, latent or unrecognizable criteria while incorporating their probabilistic, logistical and mathematical relationships.
Why Use a Bayesian Network?
As an example of an application of graphical models, consider the case of a Bayesian network used in manufacturing. By employing an observer, an asset monitor, or an automated data collection system, data can be collected from sources observing the situation to be modeled with a graphical model. This data can be organized and provided to a domain expert for use in the generation of a Bayesian network. Data may be collected and used in real time and may be stored and used for historical analysis. Useful predictions can thus be made on the output of the system being modeled, including yield characteristics, failure predictions and the ability to control quality. This increased ability of production control and monitoring, afforded through the use of the Bayesian network, can be realized as increased productivity, decreased costs, decreased waste of material, decreased use of energy and faster production.
Mathematical Background
If an experiment is performed or observations are made on a situation, an acceptable probability of an event may be calculated. If this event, event A, occurs with a given frequency, the value of the probability of event A's occurrence becomes acceptable and usable with a degree of confidence. We can make similar observations of other events within the same situation or within a set of observables. Suppose during our observations that a set of other events occur, including an event B. This leads to a question regarding the occurrence of the event B. How does the knowledge of the probability of event A revise the probability of event B?
For example, given a set of one million bicycles painted blue or red, we have no idea as to the specific number of blue bicycles. Each bicycle is equipped with either an apple seat or a banana seat. Knowing how likely it is that a bicycle selected from the set will be blue, and what type of seat it will have, would be helpful. Counting all of the bicycles in the set would take an extraordinary amount of time and effort. To lessen our burden, we take a random sample of substantial size, say one hundred bicycles, and count how many are blue and how many are red. After repeating the sampling 19 times, we calculate that 35% of the sampled bicycles are red, while 65% are blue. We also calculate that 25% of the sampled bicycles have apple seats while 75% have banana seats. We can evaluate our accuracy and thus establish confidence in our estimates by calculating the standard deviation of the samples. We can further refine our confidence by calculating the sampling error and comparing a z-test value against expected values. This will allow us to gauge whether or not the sample size was sufficient for our purposes based upon the set size.
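The sampling estimate described above can be sketched in Python. This is a minimal illustration under stated assumptions and is not part of the described system: the function name estimate_proportion, the simulated 65% blue fraction and the random seed are all hypothetical choices made for the example, and the standard error uses the usual binomial approximation for a sample proportion.

```python
import math
import random

def estimate_proportion(sample):
    """Return (proportion, standard_error) for a list of booleans."""
    n = len(sample)
    p_hat = sum(sample) / n
    # Standard error of a sample proportion (binomial approximation).
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err

# Hypothetical population in which 65% of bicycles are blue (assumed value).
rng = random.Random(0)
true_blue_fraction = 0.65

# Draw 100 bicycles at random; True means "blue".
sample = [rng.random() < true_blue_fraction for _ in range(100)]
p_blue, se_blue = estimate_proportion(sample)

# Approximate 95% confidence interval using the normal approximation.
low, high = p_blue - 1.96 * se_blue, p_blue + 1.96 * se_blue
print(f"estimated P(blue) = {p_blue:.2f}, interval = ({low:.2f}, {high:.2f})")
```

A narrower interval indicates that the sample size is more likely to be sufficient; a larger sample shrinks the standard error in proportion to the square root of the sample size.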
At times, these events may have been observed as occurring simultaneously, independently or dependently. Suppose that N observations have been made, and that event A occurred na times, event B occurred nb times, and the combination of event A and event B occurred na∩b times. Consequently, event A occurred na∩b times among the nb times event B occurred, a relative frequency of na∩b/nb. If the experiment is conducted a large number of times, N, then the probability of event A occurring given the knowledge that event B is occurring can be expressed as (na∩b/N)/(nb/N), which is equivalent to P(A∩B)/P(B). The conditional probability of event A occurring given that event B is occurring is expressed as P(A|B)=P(A∩B)/P(B). By the same reasoning, P(B|A)=P(A∩B)/P(A). Consequently P(B|A)=P(A|B)P(B)/P(A), which is read as: the probability of event B occurring given that event A is occurring is equal to the probability of event A occurring given event B, times the ratio of the probability of event B to the probability of event A.
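The relationship between observed counts, conditional probability and Bayes' theorem can be checked numerically. The following Python sketch is illustrative only: the counts are hypothetical values chosen for the example, and the function name conditional_probability is an assumption, not a name used elsewhere in this description.

```python
def conditional_probability(n_joint, n_given):
    """Estimate P(A|B) as (count of A and B together) / (count of B)."""
    if n_given == 0:
        raise ValueError("conditioning event was never observed")
    return n_joint / n_given

# Hypothetical counts (assumed for illustration only).
N = 1000     # total observations
n_a = 300    # times event A occurred
n_b = 200    # times event B occurred
n_ab = 120   # times A and B occurred together

p_a, p_b, p_ab = n_a / N, n_b / N, n_ab / N

p_a_given_b = conditional_probability(n_ab, n_b)  # equals P(A∩B)/P(B)
p_b_given_a = conditional_probability(n_ab, n_a)  # equals P(A∩B)/P(A)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
bayes_rhs = p_a_given_b * p_b / p_a
```

With these counts, the direct estimate of P(B|A) and the value obtained through Bayes' theorem agree, as the derivation requires.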
These mathematical relationships between events can be extended and generalized into mathematical relationships for more than two events. Given event A, event B and event C, the probability of all three occurring is written as P(A∩B∩C)=P(A|B∩C)P(B|C)P(C). It is evident that the joint probability of any number of events can be evaluated in the same way. These mathematical relationships among variables were formalized by Thomas Bayes, and are generally known as Bayes' theorem.
As with any experiment or observation involving a degree of randomness or chaos, such as those involving weather, games of chance or manufacturing systems, there is an underlying set of assumptions, known variations, and constraints. Given a discrete set of variations, such as the value of a card from a standard set of fifty-two playing cards, or the permutations of a pair of thrown dice, or the meteorological event of rain, or the acceptability of a product from an assembly line, a deterministic background may coexist with a randomly varying counterpart. If the events of the situation are limited to a specific or discrete outcome, or a finite set of outcomes, there may be a fixed or acceptable value of expectation of each of those events, yet we are limited in our control and actual realization of those outcomes.
Bayesian Networks
Bayesian networks are an excellent choice of graphical model for decision support modeling. A Bayesian network can be described as a graphical model for probabilistic and deterministic representations of real world situations. Graphically, a Bayesian network is generally constructed using nodes (vertices) and directional arcs (edges). Mathematically, nodes are used to represent events, states of events or variables of the event. Nodes may have probabilities, conditional probabilities or marginal probabilities associated with the variables of the events and can be used to represent any kind of variable. Arcs may be used to indicate logical, mathematical and probabilistic influences and dependencies between nodes. The lack of an arc between nodes indicates conditional independence between the events of the associated nodes. Graphical models unite mathematical theory, probability theory, logical theory and graph theory, and are a natural tool for solving problems that occur throughout mathematics and engineering. They address uncertainty and complexity in a user-friendly environment which may be used to construct an immediately recognizable visual representation of a complex system. Bayesian networks can also be applied to the design and analysis of learning algorithms, among other applications.
Within a Bayesian network, an arc emanating from a first node to a second node indicates that the second node is mathematically and logically dependent on the first node. This notion extends to multiple nodes connected via a path composed of a series of consecutive directional arcs. For example, if Node C is dependent upon Node B, and Node B is dependent on Node A, then Node C is in general also dependent upon Node A, through Node B. Node A, Node B and Node C can be described as being conditionally dependent. The lack of an arc between two nodes is indicative of conditional independence. Examples of Bayesian networks are illustrated in FIGS. 2, 3, 4, 6, 7, 8 and 9.
A Bayesian network may be used to represent a joint probability distribution over all of the variables represented by the nodes of the graphical model. Let the variables X(1), . . . , X(n) represent events 1, . . . , n, and let Parents(X(i)) denote the parental nodes of the node for X(i). The joint distribution for variables X(1) through X(n) is then represented as the mathematical product of the conditional distributions P(X(i)|Parents(X(i))) for i=1 to n. That is, P(X(1), . . . , X(n))=P(X(1)|Parents(X(1)))*P(X(2)|Parents(X(2)))* . . . *P(X(n)|Parents(X(n))). This is a special case of the general chain rule P(X(1), . . . , X(n))=P(X(1)|X(2), . . . , X(n))*P(X(2)|X(3), . . . , X(n))* . . . *P(X(n−1)|X(n))*P(X(n)), in which the variables are ordered so that the parents of each node have higher indices and each conditioning set is reduced to those parents. If the variable X has no parents, or is represented by a root node, then the probability distribution of X is unconditional; otherwise the distribution of X is conditional.
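The factorization of the joint distribution into per-node conditional distributions can be sketched in Python. The three-node chain, the node names and every probability value below are hypothetical and chosen only for illustration; the code is a minimal sketch of the product over P(X(i)|Parents(X(i))), not the implementation of any particular system.

```python
from itertools import product

# Hypothetical three-node chain A -> B -> C with Boolean variables.
parents = {"A": [], "B": ["A"], "C": ["B"]}

# cpt[node][parent_values] = P(node=True | parents); values are assumed.
cpt = {
    "A": {(): 0.3},                      # root node: unconditional
    "B": {(True,): 0.8, (False,): 0.1},
    "C": {(True,): 0.6, (False,): 0.2},
}

def joint_probability(assignment):
    """P(X1..Xn) as the product of P(Xi | Parents(Xi))."""
    prob = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[node])
        p_true = cpt[node][parent_values]
        prob *= p_true if value else 1.0 - p_true
    return prob

# Summing the joint over every assignment should give 1.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
```

The root node A is looked up with an empty tuple of parent values, which mirrors the statement that a node without parents has an unconditional distribution.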
By visually studying the graphical model of the Bayesian network, the user can answer questions concerning the dependencies between variables. The graphical notion of d-separation corresponds to the probabilistic notion of conditional independence. If Node A and Node B are d-separated by a set of evidence nodes, then variable A and variable B are independent given the evidence variables. The set of nodes upon which Node X is directly dependent constitutes Node X's Markov blanket 2001, FIG. 2. The Markov blanket of a node X is the set of nodes consisting of X's parents, X's children, and the parents of X's children.
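The definition of the Markov blanket translates directly into a short graph routine. The Python sketch below assumes the network is stored as a mapping from each node to a list of its parents; the function name markov_blanket and the example graph are hypothetical and do not reproduce the network of FIG. 2.

```python
def markov_blanket(node, parents):
    """Return the parents, children and children's other parents of node."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket

# Hypothetical DAG given as node -> list of parents.
example_parents = {
    "A": [],
    "B": [],
    "X": ["A"],
    "C": ["X", "B"],
    "D": ["C"],
}
blanket_of_x = markov_blanket("X", example_parents)
```

In this example the blanket of X contains its parent A, its child C and C's other parent B, while the grandchild D is excluded.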
To perform numerical calculations on the entire Bayesian network, such as performing inference, it may be necessary to specify the probability distribution for each node X with respect to the parents of that node. This specification may become numerically large and consume processor resources. By using discrete distributions, Boolean distributions or Gaussian distributions, limitations may be imposed on the model based on knowledge of the distribution. Algorithms are introduced to circumvent this situation by applying the principle of maximum entropy to the specification of the distribution given known constraints. These known constraints may be coalesced from streaming data or databases through various methods. To maximize accuracy, assumptions need to be minimized. The introduction of assumptions into the model contributes to an increase of entropy, which is preferably avoided. Typically, conditional distributions rely mathematically on known constraints and parameters which are traditionally coalesced from data, often employing a maximum likelihood algorithm, iterative approximation algorithms, or expectation-maximization algorithms.
The purpose of calculating inference is to determine the conditional distribution of a subset of variables, given known values for a distinct subset of evidence variables. This specific conditional distribution is known as the posterior distribution of the variable subset given the evidence variables. The posterior distribution allows a user to select values for the variable subset. The Bayesian network can be used as a mechanism for automatically constructing an extension of Bayes' theorem to calculate distributions for increasingly complex problems. The prior art methodology is limited by inexact methods such as iterative variable elimination (either discrete or continuous) of the variables via distribution of the sum over the product, clique tree propagation which caches computations to query variables iteratively, and recursive conditioning which trades accuracy for time and processing speeds. The mathematical complexity of these methodologies grows exponentially with Bayesian network width, making them unsuitable for many applications. Other approximate inference algorithms incorporate stochastic simulation, mini-bucket elimination and variational methods, all of which suffer from inherent limitations of either accuracy or time.
Many situations can be modeled using the knowledge-based approach. In one such situation, we have two neighbors beside our vacation home, which we never visit. Our neighbors, John and Mary, call when our alarm sounds. The alarm will sound in the event of a flood or a burglary. From historical records and data, we have concluded that the probability of our vacation home experiencing a flood is P(F)=0.02 and that homes in that area have a probability of being burglarized of P(B)=0.01. John is our more diligent neighbor and will call 0.90 (P(J|A)=0.90) of the times the alarm sounds, while Mary will call 0.70 (P(M|A)=0.70) of the times the alarm sounds. Either one or both of them may call when our house alarm sounds. Suppose that a flood occurred, and that P(A|B,F)=0.29 wherein burglary equals false and flood equals true. What is the probability that John is going to call? Assuming that John calls only when the alarm sounds, we take the product P(A|B,F)*P(J|A)=(0.29)(0.90) to get 0.261. In the event that our alarm sounds and we receive a call, our Bayesian network can also be used to calculate the probability that it is John calling. This calculation practice is known as inference. Bayesian inference allows for a prediction to be made based on knowledge or experience.
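The calculation in this example can be sketched in Python by marginalizing over the state of the alarm. The values 0.29 and 0.90 come from the example; the probability that John calls when the alarm is silent is not given, so the sketch assumes it is zero, which reproduces the product above. The function name prob_call and the 5% false-call rate used for comparison are hypothetical.

```python
def prob_call(p_alarm, p_call_given_alarm, p_call_given_no_alarm=0.0):
    """Marginalize over the alarm state: P(call) = sum over a of P(call|a) P(a)."""
    return (p_alarm * p_call_given_alarm
            + (1.0 - p_alarm) * p_call_given_no_alarm)

p_alarm_given_flood_only = 0.29  # P(A | burglary=false, flood=true), from the text
p_john_given_alarm = 0.90        # P(J | A), from the text

# Assumption: John never calls when the alarm is silent, P(J | not A) = 0.
p_john = prob_call(p_alarm_given_flood_only, p_john_given_alarm)

# For comparison only: a hypothetical 5% rate of calls without an alarm.
p_john_with_false_calls = prob_call(
    p_alarm_given_flood_only, p_john_given_alarm, p_call_given_no_alarm=0.05
)
```

If John sometimes calls without an alarm, the second term of the sum is no longer zero and the probability of a call rises accordingly, which is why the simple product holds only under the stated assumption.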
Bayesian networks are a powerful alternative to a rules-based approach for inference. They allow both a black and white approach as well as a very gray approach to solving problems. Instead of an outcome being “yes” or “no”, it now can be “80% yes” and “20% no.” The ability to formulate a decision such as “yes” or “no” is helpful, yet that ability coupled with a confidence metric is more informative and explanatory. A rendered decision based on possible actions is a better informed decision if coupled with an ability to assess the confidence of the decision variables.
Flexibility of rules-based networks coupled with expert software objects and modules allows for proactive monitoring of equipment and equipment conditions for the prediction of problems at an earlier point in time, thus allowing alarm operations to take action. Early detection of alarm conditions allows system monitors and custodians to detect problems or potential problems before they reach the typical alarm limits of traditional process control systems. Problems detected comprise problems related to efficiency, equipment failure, environmental regulations, production yields, resource consumption, quality control, unsafe conditions and other factors.
There are several inference techniques available for Bayesian networks which often perform better than traditional rules-based inference approaches. However, one of the main drawbacks to Bayesian networks is the large amount of information that must be entered, either by hand or by some machine learning technique, for every single node in the network. Often, in the design of a decision support system, a vast amount of time can be required to enter enough information to build an accurate representation of the domain, or useful inferences for that representation. This is particularly true for Bayesian network nodes whose purpose is to emulate a well-understood rule, such as that defined by an “AND” or an “OR” node.
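The data entry burden for rule-like nodes can be made concrete with a short Python sketch. A Boolean node with n Boolean parents needs one table row per combination of parent values, or 2^n rows, even when the node only encodes an “AND” or an “OR” rule. The function name deterministic_cpt and the choice of three parents are hypothetical; this illustrates the size of the table and is not a technique taken from this description.

```python
from itertools import product

def deterministic_cpt(num_parents, rule):
    """Build P(node=True | parents) for every combination of parent values."""
    return {
        values: 1.0 if rule(values) else 0.0
        for values in product([False, True], repeat=num_parents)
    }

and_cpt = deterministic_cpt(3, all)  # true only if every parent is true
or_cpt = deterministic_cpt(3, any)   # true if at least one parent is true

# Each table has 2**3 = 8 rows that would otherwise be entered by hand.
rows_per_table = len(and_cpt)
```

Generating the table from the rule avoids entering each row individually, and the row count doubles with every additional parent.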
Bayesian Network Construction
The construction of Bayesian networks can be approached in many ways. Two of the most popular and conventional approaches are the “knowledge-based approach” and the “data-driven approach.” With the knowledge-based approach, a domain expert is employed to identify the distinctions of the world which are important to represent within our model. Distinctions made are translated into domain variables to be used within the Bayesian network. The domain is considered to be the set of all variables in the Bayesian network. At times some of the pertinent domain knowledge is unavailable, or is not specifically identifiable. Domain knowledge can be learned by modeling the system with known parameters. Dependencies among the variables are proposed, identified and verified, as are the probability distributions of the variables. The dependencies and probability distributions are used to quantify the strengths of the influences between variables or the strengths of the dependencies. Dependencies can be graphically illustrated as directional arcs within the graphical representation of the model system or situation. The variables and dependencies thus represented manifest themselves as a Bayesian network.
Using the data-driven approach, the expert identifies and determines the variables of the domain. Data is collected for the variables and is used to drive an applicable algorithm for the generation of a Bayesian network. The data is collected from the real world and from instances of decisions made in the domain. Traditionally, the data-driven approach is used when variables are discrete.
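One step of the data-driven approach, estimating a conditional probability table for discrete variables from collected records, can be sketched in Python. The sketch assumes the network structure is already fixed and estimates probabilities by relative frequency, a simple maximum likelihood estimate; learning the structure itself is not shown. The function name learn_cpt, the variable names Fault and Alarm, and the records are hypothetical.

```python
from collections import Counter

def learn_cpt(records, child, parent_names):
    """Estimate P(child=True | parents) by relative frequency."""
    totals, trues = Counter(), Counter()
    for row in records:
        key = tuple(row[p] for p in parent_names)
        totals[key] += 1
        if row[child]:
            trues[key] += 1
    # Only parent combinations that were actually observed get an entry.
    return {key: trues[key] / totals[key] for key in totals}

# Hypothetical observations of two Boolean variables (assumed data).
records = [
    {"Fault": True, "Alarm": True},
    {"Fault": True, "Alarm": True},
    {"Fault": True, "Alarm": False},
    {"Fault": False, "Alarm": False},
    {"Fault": False, "Alarm": False},
    {"Fault": False, "Alarm": True},
    {"Fault": False, "Alarm": False},
]
alarm_cpt = learn_cpt(records, "Alarm", ["Fault"])
```

Parent combinations that never appear in the data receive no estimate, which is one reason a sufficiently large and representative data set is needed for this approach.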