1. Field of the Invention
The present invention relates generally to the analysis ("mining") of large computer databases and the discovery of generalized association rules between significant transactions recorded in such a database. More particularly, the invention concerns integrating constraints into a database rule discovery method during its execution rather than at a post-execution stage.
2. Description of the Related Art
Customer purchasing habits can provide invaluable marketing information for a wide variety of applications. If a retailer knows with some certainty that a consumer who purchases a first set of items, or "itemset", can be expected to purchase a particular second itemset along with the first itemset, the retailer can create more effective store displays and inventory controls. Simply, it would be helpful from a marketing standpoint to know the purchasing habits of the store's customers.
Advertisers may also benefit from a thorough knowledge of consumer purchasing tendencies. Better-informed business decisions, such as what item to put on sale, how to design coupons, and how to place items on shelves to maximize profit, are just a few of the benefits that can be realized. Further, catalogue companies can conduct more effective mass mailings if they know the tendencies of consumers to purchase particular sets of items with other sets of items. Database mining and association principles are useful in many other areas throughout business and science.
In the past, building large detailed databases that could chronicle thousands or millions of consumer transactions, much less deriving useful information from the databases (i.e., mining the databases), was highly impractical. Consequently, marketing and advertising strategies have been based upon anecdotal evidence of purchasing habits and thus have been susceptible to inefficiencies and inaccuracies in consumer targeting that have been difficult, if not impossible, to overcome.
With the advent of modern technology, building large databases of consumer transactions is possible. By using a bar-code reader, a retailer can almost instantaneously read "basket data." Basket data tells the retailer, among other things, when a particular item from a particular lot was purchased by a consumer, or how many items the consumer purchased. This data is automatically stored in electronic storage. Further, when the purchase is made with a debit or credit card, the identity of the purchaser can be immediately determined, recorded, and stored along with the basket data. Still further, vastly improved data storage media have made it possible to electronically store huge amounts of such information for future use.
However, building a transaction database alone is of little use without a fast and efficient way of analyzing the database for useful information. Such database analysis becomes increasingly problematic as the size of a database expands into the gigabyte or terabyte range, or beyond.
Traditionally, database analysis regimes have been classified in many ways. For example, effective systems are known for quickly discovering association rules that indicate purchasing habits during single transactions. In such cases, the association rules may indicate, with user-defined degrees of confidence, which frequently-recurring itemsets are likely to be purchased along with other frequently-recurring itemsets in a transaction. An itemset "frequently occurs" in a database and is referred to as being "large" if it appears in the database with at least a user-defined regularity, referred to herein as "minimum support".
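By way of illustration only, the notions of support, confidence, and "large" itemsets described above can be sketched in Python as follows. The basket data, the function names, and the brute-force enumeration are assumptions made for this example; they are not taken from any particular prior system, and an efficient discovery system would not enumerate every candidate in this way.

```python
from itertools import combinations

# Hypothetical basket data: each transaction is a set of item names.
transactions = [
    {"jacket", "hiking boots"},
    {"jacket", "hiking boots", "socks"},
    {"shirt", "socks"},
    {"jacket", "socks"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also
    contain `consequent`."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)

def large_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets meeting the minimum support.

    Every candidate up to `max_size` items is tested, which is only
    practical for a toy example such as this one.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

large = large_itemsets(transactions, min_support=0.5)
rule_confidence = confidence({"jacket"}, {"hiking boots"}, transactions)
```

In this toy data, the itemset of jacket and hiking boots appears in two of four transactions, so it is "large" at a minimum support of 0.5, and the rule "jacket implies hiking boots" holds in two of the three transactions that contain a jacket.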
Previous database discovery systems, however, do not consider discovering association rules across different levels of a taxonomy. Instead, the systems restrict the items in the discovered rules to leaf nodes in databases. Thus, for example, in the case of an itemset taxonomy in which the item "jacket" hierarchically depends from the item "outerwear", which in turn hierarchically depends from the item "clothes", the prior inventions might generate an association rule that indicates that people who purchase jackets tend to purchase hiking boots at the same time. However, prior inventions are unable to generate more generalized rules, e.g., that people who purchase outerwear or clothing tend to purchase hiking boots.
Unfortunately, when association rules are restricted to just the leaves of a taxonomy, many significant associations might escape detection. For example, few consumers might purchase hiking boots with jackets, but many people might purchase hiking boots with outerwear in general, an association that previous discovery systems would fail to discover. Consequently, by not considering associations across levels of taxonomies, previous systems are unable to prune out non-interesting and redundant rules.
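The effect described above can be sketched in Python by adding the taxonomy ancestors of each purchased item to its transaction before support is measured. This is only one simple way to expose higher taxonomy levels and is offered as an assumption for illustration; the taxonomy entries "ski pants", "shirt", and "footwear", the basket data, and all function names are hypothetical.

```python
# Hypothetical taxonomy: each item maps to its immediate parent.
parent = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "hiking boots": "footwear",
}

def ancestors(item, parent):
    """All items above `item` in the taxonomy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result

def extend(transaction, parent):
    """The transaction plus every ancestor of every item in it."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return extended

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Hypothetical basket data recorded at the leaf level only.
transactions = [
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shirt"},
]
extended = [extend(t, parent) for t in transactions]

# Leaf-level support is low, generalized support is high.
leaf_support = support({"jacket", "hiking boots"}, transactions)
general_support = support({"outerwear", "hiking boots"}, extended)
```

With this data, only one of four transactions pairs a jacket with hiking boots, whereas three of four pair some kind of outerwear with hiking boots, so a leaf-only analysis with a minimum support of 0.5 would miss the generalized association.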
In summary, the problem of discovering association rules has led to the development of the various methods discussed above. However, in practice, users are usually interested in a subset of association rules. For example, they may only want rules that contain subsets ("children") of a specific item ("parent") in a given hierarchy. While constraints for generating desired rules might be applied as a post-processing step, this late consideration of the constraints increases the time a method would spend reaching a desired subset of association rules. A new method is needed that allows user-specified constraints to be integrated into a rule discovery process itself so that the associated processing time is reduced.
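The difference between applying a constraint as a post-processing step and applying it during discovery can be illustrated with the following Python sketch. It is a simplified illustration under stated assumptions and is not the method of the invention: only two-item itemsets are considered, minimum support is expressed as an absolute transaction count, and the taxonomy, basket data, and function names are hypothetical. The constraint used is the one mentioned above, namely that an itemset must contain a child of a specific parent item.

```python
from itertools import combinations

# Hypothetical taxonomy: each item maps to its immediate parent.
parent = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}

def is_descendant(item, ancestor, parent):
    """True if `ancestor` lies above `item` in the taxonomy."""
    while item in parent:
        item = parent[item]
        if item == ancestor:
            return True
    return False

def has_outerwear_child(itemset):
    """User constraint: the itemset contains a descendant of 'outerwear'."""
    return any(is_descendant(i, "outerwear", parent) for i in itemset)

def mine_pairs(transactions, min_count, constraint=None):
    """Return (large pairs, number of candidates whose support was counted).

    When `constraint` is given, candidates that fail it are discarded
    before the data is scanned for them.
    """
    items = sorted(set().union(*transactions))
    counted = 0
    large = {}
    for pair in combinations(items, 2):
        if constraint is not None and not constraint(pair):
            continue
        counted += 1
        count = sum(set(pair) <= t for t in transactions)
        if count >= min_count:
            large[pair] = count
    return large, counted

transactions = [
    {"jacket", "hiking boots"},
    {"jacket", "hiking boots", "socks"},
    {"shirt", "socks"},
    {"shirt", "socks"},
    {"ski pants", "socks"},
]

# Post-processing: count every candidate, then filter the results.
all_large, counted_all = mine_pairs(transactions, min_count=2)
post = {p: c for p, c in all_large.items() if has_outerwear_child(p)}

# Integrated: apply the constraint before counting.
integrated, counted_constrained = mine_pairs(
    transactions, min_count=2, constraint=has_outerwear_child
)
```

Both approaches yield the same constrained result in this example, but the integrated variant counts support for 7 candidate pairs rather than 10. In a large database, each avoided candidate saves work during every scan of the data.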