As more and more data are produced and accumulated, it becomes increasingly necessary to discover the rules hidden in them. The discovery and analysis of rules in data have been widely used in computer vision, speech recognition, biological computing, risk analysis, therapeutic treatment solution finding, forecasting, information retrieval, and intelligent systems, to support knowledge discovery, decision optimization, and forecasting. The Probabilistic Graphical Model is one of the most important and widely used models for discovering dependencies among multiple variables; it is also referred to as a Bayesian Network, Belief Network, or Probabilistic Independence Network.
A Bayesian Network is a Directed Acyclic Graph (DAG), in which the nodes represent the variables in the domain, and the edges represent direct conditional dependencies between variables. FIG. 1 is a schematic diagram of a Bayesian Network, where the nodes X1, X2, X3, X4 and X5 represent 5 variables, and edges exist, for example, between X1 and X2, between X1 and X3, and between X2 and X4. The parent-child relationship between nodes is determined by the direction of the edge: if the edge points from Xi to Xj, then Xi is referred to as a parent of Xj, and Xj is referred to as a child of Xi. It can be seen that X1 is the parent of X2, and X2 is a child of X1. If a directed path exists from node Xi to node Xj, then Xi is referred to as an ancestor of Xj, and Xj is referred to as a descendant of Xi. It can be seen that a path exists from X1 to X5; therefore X1 is an ancestor of X5, and X5 is a descendant of X1.
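The parent, child, ancestor and descendant relationships can be illustrated with a short Python sketch. The edge list below is an assumption reconstructed from the factorization given for FIG. 1 (it includes X3→X4 and X4→X5, which the stated path from X1 to X5 requires); the names `PARENTS`, `children`, `ancestors` and `descendants` are hypothetical helpers, not part of the described method.

```python
# Parent sets for a network shaped like FIG. 1 (edge list is an assumption).
PARENTS = {
    "X1": [],
    "X2": ["X1"],
    "X3": ["X1"],
    "X4": ["X2", "X3"],
    "X5": ["X4"],
}

def children(node):
    """Nodes that have `node` among their parents."""
    return sorted(c for c, ps in PARENTS.items() if node in ps)

def ancestors(node):
    """All nodes from which a directed path leads to `node`."""
    seen = set()
    stack = list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen

def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    return {n for n in PARENTS if node in ancestors(n)}
```

With this edge list, `ancestors("X5")` contains X1, which matches the statement that X1 is an ancestor of X5.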
Because each variable in a Bayesian Network is conditionally independent of its non-descendants given its parents, the joint probability in the Bayesian Network can be factored into the product of the conditional probabilities of all variables given their parent sets, i.e.:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}_B(X_i)\right) \qquad (1)$$

where P(X1, . . . , Xn) is the joint probability in Bayesian Network B, X1, . . . , Xn are n nodes in Bayesian Network B, and PaB(Xi) is the parent set of node Xi in Bayesian Network B. For example, the joint probability in the Bayesian Network shown in FIG. 1 is:

P(X1,X2,X3,X4,X5)=P(X5|X4)*P(X4|X2,X3)*P(X3|X1)*P(X2|X1)*P(X1)
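As a rough illustration of equation (1), the following Python sketch evaluates the joint probability of the FIG. 1 factorization for binary variables. The conditional probability values in `CPT` are made up purely for illustration, and the names `PARENTS`, `CPT` and `joint` are hypothetical.

```python
from itertools import product

# Parent sets taken from the FIG. 1 factorization.
PARENTS = {
    "X1": [],
    "X2": ["X1"],
    "X3": ["X1"],
    "X4": ["X2", "X3"],
    "X5": ["X4"],
}

# P(node = 1 | parent values). All numbers are invented for illustration.
CPT = {
    "X1": {(): 0.6},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.5, (1,): 0.1},
    "X4": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9},
    "X5": {(0,): 0.3, (1,): 0.8},
}

def joint(assignment):
    """Product over all nodes of P(Xi | Pa(Xi)), as in equation (1)."""
    prob = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[p] for p in parents)
        p_one = CPT[node][key]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Summing the joint over all 2^5 assignments should give 1.
nodes = list(PARENTS)
total = sum(
    joint(dict(zip(nodes, values)))
    for values in product((0, 1), repeat=len(nodes))
)
```

Because every factor is a proper conditional distribution, `total` equals 1 up to floating-point rounding, which is a quick sanity check on the factorization.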
A Bayesian Network that represents variable dependency can be constructed from the records of variables mainly through two steps: producing Bayesian Networks, and then either selecting a Bayesian Network or discovering a feature. Producing Bayesian Networks is a mechanism that produces Bayesian Networks from the records of existing variables. Selecting a Bayesian Network is to select, from the produced Bayesian Networks, the optimal Bayesian Network or an equivalence class of the optimal Bayesian Network as the Bayesian Network to be used finally. Discovering a feature is to find the commonality or feature shared among the produced Bayesian Networks, according to certain criteria. Feature discovery is usually used when the records of variables are insufficient. In such cases, multiple different Bayesian Networks usually have high joint probabilities, and it is unreasonable to select only one of them to describe the dependency among the variables; instead, the dependency among the variables should be described with the commonality or feature shared among the Bayesian Networks that have high joint probabilities, i.e.:
$$P(f \mid D) = \frac{\sum_{B} P(B \mid D)\,\delta(B_f)}{\sum_{B} P(B \mid D)} \qquad (2)$$

where P(f|D) represents the result-feature probability over all Bayesian Networks that are produced; P(B|D) represents the conditional probability of B given D; D represents the records of variables; B represents the Bayesian Networks that are produced; and f represents the feature, such as a specific edge, a specified path, or a Markov Blanket feature, etc. If f exists in B, then δ(Bf)=1; if f does not exist in B, then δ(Bf)=0.
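Equation (2) can be sketched in Python as a weighted vote over the produced networks. In this sketch each network is represented, by assumption, as a set of directed edges paired with a weight proportional to P(B|D); the function name `feature_probability`, the example networks and their weights are all hypothetical.

```python
def feature_probability(networks, has_feature):
    """Equation (2): weighted share of networks that contain the feature.

    networks    -- list of (edges, weight), weight proportional to P(B|D)
    has_feature -- function returning True if the feature exists in `edges`
                   (plays the role of the indicator delta(B_f))
    """
    total = sum(weight for _, weight in networks)
    hit = sum(weight for edges, weight in networks if has_feature(edges))
    return hit / total

# Three illustrative networks with invented weights.
networks = [
    (frozenset({("X1", "X2"), ("X2", "X3")}), 0.5),
    (frozenset({("X1", "X2"), ("X1", "X3")}), 0.3),
    (frozenset({("X2", "X1"), ("X2", "X3")}), 0.2),
]

# Feature f: the directed edge X1 -> X2 is present.
p_edge = feature_probability(networks, lambda edges: ("X1", "X2") in edges)
```

Here the edge X1→X2 appears in the first two networks, so `p_edge` is their combined weight divided by the total weight.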
FIG. 2 is a flow diagram of producing a Bayesian Network in the prior art. As shown in FIG. 2, the steps are as follows:
Step 201: The records of the variables are obtained.
Step 202: Supposing that each variable maps to a node in the Bayesian Network, a sequential relationship for the nodes is arbitrarily determined.
For a node in the node sequence, all nodes before it are referred to as its preceding nodes. For example, in Bayesian Network A, because node X5 points to node X2 and node X2 points to node X3, nodes X5 and X2 are preceding nodes of node X3.
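The notion of preceding nodes can be expressed as a one-line helper. This is a minimal sketch assuming the node sequence is stored as a Python list; `preceding_nodes` and the example sequence are hypothetical.

```python
def preceding_nodes(sequence, node):
    """Return the nodes that come before `node` in the node sequence."""
    return sequence[:sequence.index(node)]

# Illustrative node sequence; the first node has no preceding nodes.
sequence = ["X2", "X4", "X1", "X3"]
before_x1 = preceding_nodes(sequence, "X1")
before_x2 = preceding_nodes(sequence, "X2")
```

Only the preceding nodes of a node are candidates for its parent set, which guarantees that the resulting graph is acyclic.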
Step 203: The parent set with higher probabilities is selected for each node, in accordance with the determined node sequence and the records of the variables.
The steps for selecting the parent set with higher probabilities for each node are as follows:
First, because the first node in the node sequence has no preceding node, the parent set of the first node can be set to a null set φ.
Next, the parent set can be selected for the second node and subsequent nodes in the node sequence.
The steps for selecting the parent set for each node are as follows:
A. The probability of the node is calculated for each candidate first parent node, i.e. for each of its preceding nodes and for the null set, and the preceding node or null set corresponding to the highest probability is selected as the first parent node of the node.
If the first parent node is the null set, the parent set of the node is the null set, and therefore it is unnecessary to perform the subsequent steps.
B. The probability of the node given the first parent node together with each remaining preceding node as a candidate second parent node is calculated; the probabilities that are higher than the probability of the node given the first parent node alone are selected from the calculated probabilities, and the preceding node corresponding to the highest probability among the selected probabilities is taken as the second parent node of the node.
If all the probabilities obtained in the calculation are lower than the probability of the node given the selected first parent node, the parent set of the node contains only one parent node, i.e. the first parent node selected in step A; in that case, it is unnecessary to perform the subsequent steps.
C. The probability of the node given the first and second parent nodes together with each remaining preceding node as a candidate third parent node is calculated; the probabilities that are higher than the probability of the node given the selected first and second parent nodes are selected from the calculated probabilities, and the preceding node corresponding to the highest probability among the selected probabilities is taken as the third parent node of the node.
If all the probabilities obtained in the calculation are lower than the probability of the node given the selected first and second parent nodes, the parent set of the node contains only two parent nodes, i.e. the first parent node selected in step A and the second parent node selected in step B; in that case, it is unnecessary to perform the subsequent steps.
Subsequent parent nodes for the node are selected in the same way.
For example, supposing that there are four nodes: X1, X2, X3 and X4 and the node sequence determined in step 202 is X2, X4, X1, X3, the parent set of node X1 can be selected as follows:
Step 1: The first parent node is added for node X1, i.e. preceding node X2 or X4, or a null set.
Step 2: The probability that node X1 takes node X2 as its parent node, the probability that node X1 takes node X4 as its parent node, and the probability that node X1 takes the null set as its parent set are calculated, and the candidate with the highest probability is taken as the first parent node of node X1.
Here, node X4 is selected as the first parent node of node X1.
Step 3: The second parent node is added for node X1, i.e. the preceding node X2.
Step 4: The probability that node X1 takes node X4 and node X2 as its parent nodes is calculated.
Step 5: It is judged whether the probability that node X1 takes node X4 and node X2 as its parent nodes is higher than the probability that node X1 takes node X4 as its parent node; if so, node X4 and node X2 are selected as the parent nodes of node X1, i.e. the parent set of node X1 is Pa(X1)={X4, X2}; otherwise, node X4 is selected as the parent node of node X1, i.e. the parent set of node X1 is Pa(X1)={X4}.
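The greedy parent selection of steps A to C can be sketched in Python as follows. The text does not specify how the "probability" of a node given a candidate parent set is computed; this sketch swaps in the log of the Cooper-Herskovits (K2) marginal likelihood for discrete data as one plausible choice, and that choice is an assumption. The names `log_k2_score` and `select_parents` and the sample records are hypothetical.

```python
from collections import Counter
from math import lgamma

def log_k2_score(records, node, parents, arity=2):
    """Log K2 marginal likelihood of `node` given `parents` (assumed score)."""
    # Count each (parent configuration, node value) pair in the records.
    counts = Counter((tuple(r[p] for p in parents), r[node]) for r in records)
    per_config = Counter()
    for (config, _), n in counts.items():
        per_config[config] += n
    score = sum(lgamma(arity) - lgamma(n + arity) for n in per_config.values())
    score += sum(lgamma(n + 1) for n in counts.values())
    return score

def select_parents(records, node, preceding):
    """Add one parent at a time while the score keeps improving."""
    parents = []
    best = log_k2_score(records, node, parents)  # score of the null set
    candidates = list(preceding)
    while candidates:
        scored = [(log_k2_score(records, node, parents + [c]), c)
                  for c in candidates]
        top_score, top = max(scored)
        if top_score <= best:
            break  # no candidate improves on the current parent set
        parents.append(top)
        candidates.remove(top)
        best = top_score
    return parents

# Invented records in which X1 always equals X4 and X2 is unrelated.
records = [
    {"X2": x2, "X4": x4, "X1": x4}
    for _ in range(4) for x2 in (0, 1) for x4 in (0, 1)
]
chosen = select_parents(records, "X1", ["X2", "X4"])
```

On these records, X4 alone explains X1 and adding X2 lowers the score, so the sketch stops after one parent, mirroring the "otherwise" branch of Step 5.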
Step 204: A Bayesian Network is constructed in accordance with the parent sets selected for the variables.
Step 205: It is judged whether the criterion for stopping the loop is met; if the criterion is met, Step 207 is performed; otherwise, Step 206 is performed.
The criterion for stopping the loop can be: the Bayesian Network has been established for a duration longer than the predefined duration, or the joint probability of the current Bayesian Network is equal to the predefined joint probability, or the result-feature probability of the current Bayesian Network is lower than the predefined result-feature probability, etc.
Step 206: The weights of the edges in all of the retained Bayesian Networks are determined, and thereby the new node sequence is determined, and then the process returns to Step 203.
Step 207: The conditional probability distribution of each node in the retained Bayesian Networks is determined, i.e. the probability of each node given its current preceding nodes is calculated.
However, the method described above has a drawback: it is very difficult to produce the optimal Bayesian Network on the basis of the node sequence. If a produced Bayesian Network contains wrong edge information, the resulting new node sequence tends to be misled by that wrong edge information, and it therefore takes much longer to find the globally optimal Bayesian Network.