1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to a Boolean network rule engine providing improved analysis and processing of business data and rules.
2. Description of the Background Art
Business applications often analyze changes in business data in order to suggest (or determine) a course of action. During the analysis process, the business application typically evaluates a large number of logical expressions or rules. Frequently, these rules are defined by entities (e.g., people or computer programs) that are not aware of each other. Rules defined in this manner are likely to be numerous, redundant, and ad hoc in nature. Accordingly, naively evaluating these rules one by one can impose prohibitive performance costs on the application performing the analysis.
Applications that analyze changes in business data frequently employ a rule engine (or inference engine) component for evaluating logical expressions or rules. A rule engine may be viewed as a sophisticated interpreter of conditional (or “if/then”) statements. The if/then statements that are interpreted are called rules. The “if” portions of rules contain conditions such as “shoppingCart.totalAmount > $100”. The “then” portions of rules contain actions that occur based on evaluation of the condition, such as “recommendDiscount(5%)”. The inputs to a rule engine are a ruleset and some data objects. The outputs from a rule engine are determined by the inputs and may include the original input data objects with possible modifications, new data objects, and/or side effects such as instructing the application to send a mail message saying “Thank you for shopping.”
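The if/then view of a rule engine described above can be illustrated with a minimal sketch. The class and function names below (ShoppingCart, Rule, run_rules, recommend_discount) are hypothetical and chosen only to mirror the shopping-cart example; the loop shown is the naive one-by-one evaluation, not an optimized engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ShoppingCart:
    # Hypothetical data object handed to the rule engine.
    total_amount: float
    recommended_discount: float = 0.0


@dataclass
class Rule:
    name: str
    condition: Callable[[ShoppingCart], bool]  # the "if" portion
    action: Callable[[ShoppingCart], None]     # the "then" portion


def recommend_discount(percent):
    """Build an action that records a discount on the data object."""
    def action(cart):
        cart.recommended_discount = percent
    return action


def run_rules(rules, cart):
    """Naively test every rule once; return the names of the rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(cart):
            rule.action(cart)
            fired.append(rule.name)
    return fired


rules = [
    Rule("big-cart-discount",
         condition=lambda c: c.total_amount > 100,
         action=recommend_discount(5.0)),
]
cart = ShoppingCart(total_amount=120.0)
fired = run_rules(rules, cart)
```

Here the output of the engine is the modified input object (the cart now carries a recommended discount) together with the list of rules that fired. Every condition is re-tested on every call, which is the cost that the techniques discussed in this background aim to reduce.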
The expert systems branch of computer science has traditionally used the Rete algorithm (“Rete”) to impose order on large collections of rules and to evaluate them efficiently. For further description of the Rete algorithm, see e.g., Forgy, C. L., “Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem”, Artificial Intelligence, 19 (1982) pp. 17-37, the disclosure of which is hereby incorporated by reference. See also, U.S. Pat. No. 5,276,776 issued to Grady et al., titled “System and method for building a computer-based Rete pattern matching network”, the disclosure of which is hereby incorporated by reference. A salient feature of the Rete methodology is that it operates directly on the parts of a rule rather than on the rule in its entirety. Rete exploits the fact that rules often share common subexpressions. Rete evaluates these common subexpressions only once and then shares the result. A Rete network (system) remembers the values of subexpressions as it computes them, so it leverages the fact that large data sets tend to change in small increments rather than all at once. Subsequent rule evaluations involve only those subexpressions which have changed.
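The two ideas in the preceding paragraph, evaluating a shared subexpression once and re-evaluating only what has changed, can be sketched as follows. This is not the Rete algorithm itself; it is a simplified illustration, and the SharedConditions class, its methods, and the sample rules are assumptions made for this example.

```python
class SharedConditions:
    """Caches each named subexpression and recomputes it only when a field it reads changes."""

    def __init__(self):
        self._tests = {}      # name -> (fields the test reads, predicate)
        self._cache = {}      # name -> last computed truth value
        self.evaluations = 0  # number of predicate calls, kept for illustration

    def define(self, name, fields, predicate):
        self._tests[name] = (frozenset(fields), predicate)

    def update(self, data, changed_fields):
        changed = set(changed_fields)
        for name, (fields, predicate) in self._tests.items():
            # Re-evaluate only tests never computed or touching a changed field.
            if name not in self._cache or fields & changed:
                self._cache[name] = predicate(data)
                self.evaluations += 1

    def value(self, name):
        return self._cache[name]


conds = SharedConditions()
conds.define("big_cart", {"total"}, lambda d: d["total"] > 100)
conds.define("member", {"is_member"}, lambda d: d["is_member"])

# Two rules share the "big_cart" subexpression; each rule is a conjunction.
rules = {
    "discount": ["big_cart", "member"],
    "free_shipping": ["big_cart"],
}


def satisfied(rule_name):
    return all(conds.value(c) for c in rules[rule_name])


data = {"total": 120, "is_member": False}
conds.update(data, changed_fields=data.keys())    # first pass evaluates both tests
first_pass = conds.evaluations

data["is_member"] = True
conds.update(data, changed_fields=["is_member"])  # only "member" is re-evaluated
second_pass = conds.evaluations - first_pass
```

Although three subexpression references appear across the two rules, only two tests are evaluated on the first pass, and only one on the second pass after a single field changes.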
The Rete was originally designed to improve the speed of forward-chained rule systems by limiting the effort required to recompute the conflict set after a rule is fired. The Rete takes advantage of two empirical observations: “temporal redundancy” and “structural similarity”. Temporal redundancy occurs because the firing of a rule usually changes only a few facts, and only a few rules are affected by each of those changes. Structural similarity exists because the same pattern often appears in the left-hand side of more than one rule. Facts are variable-free tuples, patterns are tuples that contain some variables, and the left-hand side of a rule is a list of patterns.
The Rete algorithm uses a rooted acyclic directed graph, referred to as the Rete or Rete network, where the nodes, with the exception of the root, represent patterns; and paths from the root to the leaves represent the left-hand sides of rules. At each node is stored information about the facts satisfied by the patterns of the nodes in the paths from the root up to and including this node. This information is a relation representing the possible values of the variables occurring in the patterns in the path.
The Rete methodology provides for keeping the information associated with the nodes in the graph up to date. When a fact is added to or removed from working memory, a token representing that fact and operation is entered at the root of the graph and propagated to its leaves, modifying, as appropriate, the information associated with the nodes. A modification of a fact (for example, changing the height of David from 6 feet to 6 feet, 2 inches) is expressed as the deletion of the old fact (the height of David is 6 feet) and the addition of a new fact (the height of David is 6 feet, 2 inches).
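The treatment of a modification as a deletion plus an addition can be sketched as below. The Token type, the "+" and "-" operation markers, and the tuple layout of the height fact are assumptions made for illustration; the sketch applies tokens directly to a working-memory set rather than propagating them through a network.

```python
from typing import NamedTuple, Tuple


class Token(NamedTuple):
    op: str      # "+" for an added fact, "-" for a removed fact (assumed encoding)
    fact: Tuple  # a variable-free tuple, e.g. ("height", "David", "6 ft")


def modify(old_fact, new_fact):
    """Express a modification as a removal token followed by an addition token."""
    return [Token("-", old_fact), Token("+", new_fact)]


def apply_tokens(working_memory, tokens):
    """Apply tokens in order to a set of facts standing in for working memory."""
    for token in tokens:
        if token.op == "+":
            working_memory.add(token.fact)
        else:
            working_memory.discard(token.fact)
    return working_memory


working_memory = {("height", "David", "6 ft")}
tokens = modify(("height", "David", "6 ft"), ("height", "David", "6 ft 2 in"))
apply_tokens(working_memory, tokens)
```

After the two tokens are applied, working memory holds only the new height fact.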
The Rete graph includes the root node, one-input “pattern nodes”, and two-input “join nodes”. The root node has as successors one-input “kind” nodes, one for each possible kind of fact (the kind of a fact is its first component). In other words, the root node is the entry point for objects to be tested, and it broadcasts those objects to its successor nodes in the network based on their object type. When a token arrives at the root, a copy of that token is sent to each “kind” node, where a “SELECT” operation is carried out that selects only the tokens of its kind.
Then, for each rule and each of its patterns, a one-input alpha node is created. Each “kind” node is connected to all the alpha nodes of its kind and delivers to them copies of the tokens it receives. Associated with each alpha node is a relation, the “alpha memory”, whose columns are named by the variables appearing in the node's pattern. For example, if the pattern for the node is (is-a-parent-of ?x ?y), then the relation has columns named “X” and “Y”. When a token arrives at the alpha node, a “project” operation extracts from the token's tuple the components that match the variables of the pattern. The resulting tuple is added to the alpha memory of the node.
For each rule Ri, if Ai,1, Ai,2, . . . , Ai,n are, in order, the alpha nodes of the rule, then two-input nodes called “beta nodes”, Bi,2, Bi,3, . . . , Bi,n, are constructed, where:
Bi,2 has its left input from Ai,1 and its right input from Ai,2, and
Bi,j, for j greater than 2, has its left input from Bi,j−1 and its right input from Ai,j.
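The alpha-node “project” operation and the chaining of beta nodes can be sketched as follows. Both functions are hypothetical illustrations: project combines the pattern test and the extraction of variable components in one step, and beta_wiring merely lists which inputs feed each beta node of a rule, using the node names from the text as plain strings.

```python
def project(pattern, fact):
    """Return the variable bindings of fact under pattern, or None if they do not match.

    Pattern components starting with "?" are variables; column names are the
    upper-cased variable names, as in the (is-a-parent-of ?x ?y) example.
    """
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            column = p[1:].upper()
            # A variable that occurs twice must bind to the same value.
            if column in bindings and bindings[column] != f:
                return None
            bindings[column] = f
        elif p != f:
            return None  # constant component differs
    return bindings


def beta_wiring(i, n):
    """List (beta node, left input, right input) for rule i with n alpha nodes."""
    wiring = []
    for j in range(2, n + 1):
        left = f"A{i},1" if j == 2 else f"B{i},{j - 1}"
        wiring.append((f"B{i},{j}", left, f"A{i},{j}"))
    return wiring


row = project(("is-a-parent-of", "?x", "?y"), ("is-a-parent-of", "ann", "tom"))
no_match = project(("is-a-parent-of", "?x", "?y"), ("height", "David", "6 ft"))
wiring = beta_wiring(1, 4)
```

For a rule with four patterns, the wiring is a left-deep chain: the first beta node joins the first two alpha nodes, and each later beta node joins the previous beta node with the next alpha node.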
At each beta node Bi,j a relation is stored, the “beta memory”, which is the “join” of the relations associated with its left and right inputs, joined on the columns named by variables that occur in both relations. For example, if the left input relation and right input relation are as follows:
X     Y
ann   4
sam   22
X     Z
ann   tom
ann   sue
tom   jane
then the resulting beta memory relation is
X     Y     Z
ann   4     tom
ann   4     sue
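The join performed at a beta node can be sketched as a natural join over the shared columns. The natural_join function and the representation of a relation as a list of dictionaries are assumptions for this example, and the input rows are illustrative values chosen so that the join yields the two rows (ann, 4, tom) and (ann, 4, sue).

```python
def natural_join(left, right):
    """Join two relations (lists of dicts) on the columns that occur in both."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined


# Left input relation with columns X and Y (illustrative rows).
left = [{"X": "ann", "Y": 4}, {"X": "sam", "Y": 22}]
# Right input relation with columns X and Z (illustrative rows).
right = [
    {"X": "ann", "Z": "tom"},
    {"X": "ann", "Z": "sue"},
    {"X": "tom", "Z": "jane"},
]

beta_memory = natural_join(left, right)
```

Only the column X occurs in both relations, so rows are paired when their X values agree. The row for sam on the left and the row for tom on the right find no partner and do not appear in the beta memory.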
Finally, the last beta node of each rule is connected to a new alpha node where a “project” operation takes place to select all and only the variables that occur on the right-hand side of the rule.
The Rete methodology described above alleviates much of the inefficiency in evaluation of rules by remembering past test results across iterations of the rule loop. Only new facts are tested against any “if” conditions of rules and the same rules are not tested repeatedly. As a result, the Rete methodology is widely used for reducing the computational complexity and inefficiency that would result from the naive evaluation of rules, particularly in situations involving the evaluation of a large number of rules.
Although the Rete is widely used and is good at handling evaluation of large numbers of rules, it is less efficient when it is called upon to handle large amounts of data and/or rapidly changing data. Another limitation of the traditional Rete is that it does not provide support for use of the OR operator. As rules are often written using the OR operator, it is desirable to provide direct support for this operator.
What is needed is an improved solution that provides better performance in the evaluation of rules (i.e., improved system responsiveness), particularly in environments involving large amounts of data and/or rapidly changing data. Ideally the solution should also support the use of the OR operator. The present invention provides a solution for these and other needs.