Datasets are used in a wide variety of practical applications. Object models can be used to describe how datasets are organized. An exemplary object model in the e-business domain would include a group of classes, which may correspond to a group of tables containing data in a database. Each class contains a different set of data pertaining to, for example, customer personal information, invoices, or product inventory. These classes are related by common fields shared by the classes, such as a customer number used in both the customer class and the invoice class, or an item number used in both the invoice class and the product inventory class.
In this example, a customer class would contain fields, corresponding to columns in a table, such as customer name, customer number, address and telephone number. The set of the aforementioned fields and the field types contained within each customer object would define the customer class. The individual objects correspond to rows of data in the table, generally with values for each field associated with the object. For instance, there would be an object for customer Allen, and Allen's personal information would be stored as values corresponding to each of the name, customer number, address, and telephone number fields.
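The object model described above can be sketched in a few lines. This is only an illustration: the `Invoice` fields, the variable names, and all values are assumptions made for the example and do not come from any particular database.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """A class in the object model; each field maps to a table column."""
    name: str
    customer_number: int
    address: str
    telephone: str


@dataclass
class Invoice:
    """A second class, related to Customer through customer_number."""
    invoice_number: int
    customer_number: int
    item_number: int


# Each object corresponds to one row of its table (all values are invented).
allen = Customer("Allen", 1001, "12 Example Street", "555-0100")
invoices = [Invoice(1, 1001, 77), Invoice(2, 2002, 78), Invoice(3, 1001, 79)]

# The field names and types are what define the class.
customer_columns = [f.name for f in fields(Customer)]

# The shared customer_number field relates the two classes.
allens_invoices = [
    inv for inv in invoices if inv.customer_number == allen.customer_number
]
```

Here `customer_columns` lists the columns of the customer table, and `allens_invoices` holds the invoice objects that share Allen's customer number.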
Rules are used to organize objects. Rules act upon tuples, each of which may comprise a single object or a combination of objects. In many practical applications, rules are used to segregate groups of data within a dataset for use by the owner of the dataset. For example, a retail business owner may want to select a set of customers to whom to send marketing materials or to offer sales or promotions. To do so, the retailer would use a ruleset to identify a subset of customers within the entire dataset.
Rules are composed of conditions, which include one or more tests, and one or more actions that are associated with the conditions. If the conditions of the rule are met, then the action is performed.
There are several types of conditions commonly used in rules. These include “simple” conditions, the “exists” condition, the “not” condition, and the “collect” condition. In order to process a given object, a “simple” condition first determines if the object belongs to the desired class, and then determines if the object satisfies the tests within the condition. The “exists” condition and the “not” condition operate upon a set of objects instead of a single object: the “exists” condition determines whether the desired object is present within the set, and the “not” condition determines whether it is absent. The “collect” condition functions by first constructing a set of objects from the dataset that satisfy an inner “simple” condition, and then performing subsequent testing on the assembled set of objects.
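The four condition types can be sketched as small functions. This is a minimal illustration under assumed names (`simple`, `exists`, `not_exists`, `collect`, and the two example classes are all hypothetical); it does not reproduce the syntax of any rule language.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Customer:
    name: str
    total_spent: float


@dataclass
class Invoice:
    customer_name: str
    amount: float


Condition = Callable[[object], bool]


def simple(cls: type, test: Condition) -> Condition:
    """Simple condition: check the object's class first, then its tests."""
    return lambda obj: isinstance(obj, cls) and test(obj)


def exists(condition: Condition, objects: Iterable[object]) -> bool:
    """Exists condition: true if at least one object in the set matches."""
    return any(condition(obj) for obj in objects)


def not_exists(condition: Condition, objects: Iterable[object]) -> bool:
    """Not condition: true if no object in the set matches."""
    return not any(condition(obj) for obj in objects)


def collect(condition: Condition, objects: Iterable[object]) -> List[object]:
    """Collect condition, first step: gather the matching objects."""
    return [obj for obj in objects if condition(obj)]


dataset = [Customer("Allen", 1200.0), Customer("Baker", 300.0), Invoice("Allen", 250.0)]

big_spender = simple(Customer, lambda c: c.total_spent >= 1000)
huge_invoice = simple(Invoice, lambda i: i.amount > 500)

has_big_spender = exists(big_spender, dataset)
no_huge_invoice = not_exists(huge_invoice, dataset)

# Collect condition, second step: test the assembled set as a whole.
all_customers = collect(simple(Customer, lambda c: True), dataset)
at_least_two_customers = len(all_customers) >= 2
```

Because the class check comes first, `big_spender` can safely be applied to the `Invoice` object, which has no `total_spent` field.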
Examples of rules include “if the age of a person is not between 0 and 120, return an error” or “if the price value of a shopping cart exceeds $200, reduce the price value by 10%.” In each of the preceding examples, the “if” clause is the condition, which must be met for the action to take place. Each condition is a “simple” condition because it acts upon a single object, rather than a set of objects. Likewise, the “return” and the “reduce” clauses are the actions, which are executed when the conditions are true. Software platforms are currently used to build systems to work with sets of rules. One example of such a software platform is sold by ILOG, Inc. of Mountain View, Calif. under the title JRules.
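The two example rules given above can be written as condition/action pairs. The sketch below assumes a hypothetical `Rule` class and treats “return an error” as raising an exception; neither choice reflects any specific rule platform.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Person:
    age: int


@dataclass
class ShoppingCart:
    price: float


@dataclass
class Rule:
    condition: Callable[[Any], bool]
    action: Callable[[Any], None]

    def apply(self, obj: Any) -> bool:
        """Run the action only if the condition holds; report whether it fired."""
        if self.condition(obj):
            self.action(obj)
            return True
        return False


def report_invalid_age(person: Person) -> None:
    # "Return an error" is modeled here as raising an exception.
    raise ValueError(f"invalid age: {person.age}")


def apply_discount(cart: ShoppingCart) -> None:
    # Reduce the price by 10%, rounded to whole cents.
    cart.price = round(cart.price * 0.90, 2)


age_rule = Rule(lambda p: not (0 <= p.age <= 120), report_invalid_age)
discount_rule = Rule(lambda c: c.price > 200, apply_discount)

cart = ShoppingCart(price=250.0)
discount_fired = discount_rule.apply(cart)
```

A cart priced at exactly $200 does not trigger the discount, since the rule requires the price to exceed $200.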
To work with rules, a software platform generally utilizes a rule language and a rule engine. A rule language is a computer language which allows description of the rules using formal syntax. A rule engine is the software that actually executes the rules in the order determined by a rule execution algorithm.
A rule execution algorithm generally converts the given rules written in the syntax prescribed by the rule language into an internal representation form, reorganizes the rules through analysis and optimization, and then executes them. Two examples of rule execution algorithms are known as the RETE algorithm and the Sequential algorithm.
It is believed that the RETE algorithm was originally described in Charles Forgy, “Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem,” Artificial Intelligence 19 (1982), pp. 17-37. Modifications of the original RETE algorithm are known in the art. The RETE algorithm generally works by first building a rule execution graph from the rules, called the RETE network. In the RETE network, each object is classified through a series of discrimination tests, which organize the objects based on one or more fields, such as age, and store the results in a “node,” or location in the RETE network. The discrimination tests in the RETE network are inferred from the conditions of the ruleset. The network is arranged such that the application of one rule may cause other rules to become executable, a technique known as “rule chaining.” When a rule requires testing against multiple objects, instead of a single object, a join test is utilized. Join tests are more complex than discrimination tests, but as indicated above are only used when a rule requires testing against multiple objects. When the rule conditions are met, the corresponding prescribed action or actions are executed.
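The roles of discrimination tests, nodes, and join tests can be illustrated with a deliberately tiny sketch. It is not the RETE algorithm: rule chaining, removal of objects, and sharing of tests between rules are all omitted, and the `TinyNetwork` class and its method names are hypothetical.

```python
from collections import defaultdict


class TinyNetwork:
    """Greatly simplified illustration of nodes fed by discrimination tests."""

    def __init__(self):
        self.tests = {}                   # node name -> discrimination test
        self.memory = defaultdict(list)   # node name -> objects that passed

    def add_test(self, node, test):
        self.tests[node] = test

    def add_object(self, obj):
        """Incremental step: only the newly added object is tested."""
        for node, test in self.tests.items():
            if test(obj):
                self.memory[node].append(obj)

    def join(self, left, right, join_test):
        """Join test: pair objects stored in two nodes."""
        return [
            (a, b)
            for a in self.memory[left]
            for b in self.memory[right]
            if join_test(a, b)
        ]


net = TinyNetwork()
net.add_test("adult_customer", lambda o: o["type"] == "customer" and o["age"] >= 18)
net.add_test("large_invoice", lambda o: o["type"] == "invoice" and o["amount"] > 200)

for obj in [
    {"type": "customer", "number": 1, "age": 34},
    {"type": "customer", "number": 2, "age": 16},
    {"type": "invoice", "customer_number": 1, "amount": 250.0},
    {"type": "invoice", "customer_number": 2, "amount": 900.0},
]:
    net.add_object(obj)

# A rule over two objects: an adult customer with a large invoice of their own.
matches = net.join(
    "adult_customer",
    "large_invoice",
    lambda c, i: c["number"] == i["customer_number"],
)
```

Because each node remembers which objects passed its test, adding a further object later requires testing only that object, which is the stateful, incremental behavior attributed to RETE.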
The RETE algorithm is generally well suited for production rule sets and computation rule sets, because it is stateful and maintains incremental behavior, meaning objects can be added to, changed in, or removed from the dataset while the algorithm is running.
A disadvantage of the RETE algorithm, however, is that it generally does not provide any efficiency advantages when the ruleset consists of simple compliance/validation rules.
Compliance/validation rules simply determine whether or not an object has certain characteristics or properties. Based upon the outcome of the condition, a value is associated with the object. For example, a set of compliance/validation rules for a retailer could say: if the customer has spent a total of $1000 or more, his membership level value is “gold;” if the customer has spent at least $500 but less than $1000, his membership level value is “silver;” and if the customer has spent less than $500, his membership level value is “bronze.” In this example, the customer objects may or may not have membership level values stored before the rule is executed. If there are pre-stored values when the rule is executed, then the membership level value is overwritten for each customer object. A compliance/validation rule example for the e-business context is “if the description of an item contains the terms ‘sex’ or ‘gun,’ then the item is marked invalid.” This rule could be used, for example, to screen classified advertisements or online auction listings before they are accepted. Because compliance/validation rules generally operate upon a single given object, the RETE algorithm does not generally provide increased efficiency.
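The membership and listing rules above can be sketched as follows. The function names are hypothetical. The sketch places $500 and $1000 themselves in the higher tier, which is the reading that leaves no gap between tiers, and it matches banned terms as whole words, which is an assumption about what “contains the terms” means.

```python
import re


def membership_level(total_spent: float) -> str:
    """Map total spending to a membership level."""
    if total_spent >= 1000:
        return "gold"
    if total_spent >= 500:
        return "silver"
    return "bronze"


def is_valid_listing(description: str, banned_terms=("sex", "gun")) -> bool:
    """A listing is invalid if any banned term appears as a whole word."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return words.isdisjoint(banned_terms)


customers = [
    {"name": "Allen", "total_spent": 1200.0, "level": "bronze"},  # pre-stored value
    {"name": "Baker", "total_spent": 500.0},                      # no stored value
    {"name": "Clark", "total_spent": 120.0},
]

# Each rule looks at one object at a time; any pre-stored level is overwritten.
for customer in customers:
    customer["level"] = membership_level(customer["total_spent"])
```

Each customer is evaluated in isolation, with no reference to any other object, which is why rules of this kind gain little from a network built for multi-object matching.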
Another disadvantage of the RETE algorithm is that it lacks scalability, because in practice it can handle only several thousand rules. The RETE algorithm therefore cannot be used at all in some compliance/validation applications, which may involve hundreds of thousands of rules.
Another disadvantage of the RETE algorithm is that it cannot deeply optimize the ruleset because it is required to maintain consistency when objects are added, deleted, or changed. This in turn entails repeated execution of the ruleset when these changes are made, resulting generally in further inefficiency.
A second example of a known rule execution algorithm is a “sequential” algorithm. In the sequential algorithm, generally each of the given rules is converted to an “if-then” function. Each object is then evaluated one by one against every rule. If, for the first rule, the condition is met, then the corresponding actions are executed. Regardless of whether or not the actions for the first rule are executed, the algorithm tests the object against the next rule. The process is repeated for every rule in sequence until all of the rules have been evaluated. The next object is then loaded, and the rules are executed again until every object has been processed. The number of rule evaluations performed is therefore the product of the number of rules and the number of objects.
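The loop structure just described can be sketched directly. The function `run_sequential` and the helper `set_level` are hypothetical names, and the two rules are invented for the example.

```python
def run_sequential(rules, objects):
    """Evaluate every rule against every object, in order.

    Returns the number of rule evaluations performed.
    """
    evaluations = 0
    for obj in objects:                    # load the next object
        for condition, action in rules:    # try each rule in sequence
            evaluations += 1
            if condition(obj):
                action(obj)
            # Whether or not the rule fired, move on to the next rule.
    return evaluations


def set_level(level):
    """Build an action that stores a membership level on a customer."""
    def action(customer):
        customer["level"] = level
    return action


rules = [
    (lambda c: c["total_spent"] >= 1000, set_level("gold")),
    (lambda c: c["total_spent"] < 500, set_level("bronze")),
]
customers = [{"total_spent": 1200.0}, {"total_spent": 300.0}, {"total_spent": 700.0}]

evaluations = run_sequential(rules, customers)
```

With two rules and three objects the engine performs six evaluations, even though the third customer matches neither rule.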
The sequential algorithm generally can handle hundreds of thousands of rules in compliance/validation applications because it does not build a RETE network, instead building loops and tests corresponding to each rule. The sequential algorithm is also generally transparent, meaning that it conceals its execution from the user and in doing so shields the user from the algorithmic and engineering complexity. This engineering complexity includes dividing the generated code into segments that comply with the size restrictions of the programming language, which would be difficult if the code were drafted manually. Furthermore, the sequential algorithm can generally be configured for dynamic selection of the rules as active or inactive for each object, resulting in more accurate rule application and therefore greater efficiency.
The sequential algorithm has several drawbacks. First, the algorithm's runtime can be significantly longer than that of other rule execution algorithms because every rule is applied to every object unless some dynamic selection is utilized. The sequential algorithm is also generally less efficient than other rule execution algorithms because it must evaluate every test individually; tests are not shared among rules to reduce the number of tests performed. The sequential algorithm is thus best suited for compliance/validation applications.
Another drawback of the sequential algorithm is that it is restricted by the size limitations of the programming language it utilizes. For example, in the Java programming language, created by Sun Microsystems Inc., of Santa Clara, Calif., code is stored in .class files, which are restricted in size. If the code for the ruleset exceeds this size, the sequential algorithm must decompose the code into multiple class files and chain them together, a level of complexity that generally causes a loss of efficiency.
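The decomposition step can be illustrated with a rough sketch. A real limit would be measured in compiled code size; here a hypothetical maximum number of rules per segment stands in for it, and all function names are assumptions made for the example.

```python
def split_into_segments(rules, max_rules_per_segment):
    """Divide the ruleset into segments that each respect the size limit."""
    return [
        rules[i:i + max_rules_per_segment]
        for i in range(0, len(rules), max_rules_per_segment)
    ]


def make_segment_runner(segment):
    """Stand-in for one generated code unit holding part of the ruleset."""
    def run_segment(obj):
        fired = 0
        for condition, action in segment:
            if condition(obj):
                action(obj)
                fired += 1
        return fired
    return run_segment


def run_chained(runners, obj):
    """Chain the segments: the object passes through each one in turn."""
    return sum(runner(obj) for runner in runners)


# Five invented rules: tag the object with every threshold its value exceeds.
rules = [
    (lambda o, t=t: o["value"] > t, lambda o, t=t: o["tags"].append(t))
    for t in (10, 20, 30, 40, 50)
]

segments = split_into_segments(rules, max_rules_per_segment=2)
runners = [make_segment_runner(segment) for segment in segments]

item = {"value": 35, "tags": []}
rules_fired = run_chained(runners, item)
```

The chained result is the same as running all five rules in one unit; the extra layer of dispatch between segments is the added complexity the text refers to.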
A further drawback of the sequential algorithm is that it does not support conditions that operate on a set of objects. As noted above, examples of these conditions include the “exists,” “not,” and “collect” conditions. In order to process these types of conditions, the sequential algorithm would need to be extended, which would increase the algorithm's complexity and reduce efficiency.
The sequential algorithm does not manage loops on the dataset, and generally requires a tuple to be passed to it. The tuples are generally generated by loops that are outside of the sequential algorithm. The tuples generally reflect every possible combination of objects in the dataset. Because the loops exist outside of the algorithm, there is no way to share the loops among rules that operate on the same subset of objects.
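The separation between the tuple generator and the engine can be sketched as below. The names `generate_tuples` and `engine_step`, the two rules, and the data are all hypothetical.

```python
from itertools import product

customers = [{"number": 1, "name": "Allen"}, {"number": 2, "name": "Baker"}]
invoices = [
    {"customer_number": 1, "amount": 250.0},
    {"customer_number": 1, "amount": 40.0},
    {"customer_number": 2, "amount": 90.0},
]


def generate_tuples(*object_sets):
    """External tuple generator: every combination of objects."""
    return list(product(*object_sets))


def engine_step(rules, object_tuple):
    """The engine receives one tuple and knows nothing about the loops."""
    return [name for name, condition in rules if condition(*object_tuple)]


rules = [
    ("own_invoice", lambda c, i: c["number"] == i["customer_number"]),
    (
        "large_own_invoice",
        lambda c, i: c["number"] == i["customer_number"] and i["amount"] > 200,
    ),
]

tuples = generate_tuples(customers, invoices)
results = [engine_step(rules, t) for t in tuples]
fired = [name for names in results for name in names]
```

All six customer/invoice combinations are generated even though only three pairs belong together, and both rules repeat the same customer-number comparison on every tuple because neither the loops nor the test is shared.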
In view of the foregoing, it would be desirable to have a rule execution algorithm that does not evaluate tests unnecessarily.
It would also be desirable to have a rule execution algorithm that is scalable to handle large quantities of rules without having to chain files together.
It further would be desirable to have a rule execution algorithm that is portable to different software languages.
It would also be desirable to have a rule execution algorithm with a shorter runtime when processing data than currently available rule execution algorithms.
It further would be desirable to have a rule execution algorithm that can directly operate upon a rule engine's working memory without use of a tuple generator.