1. Field of the Invention
The present invention relates generally to data processing, and more particularly to "computer database mining" in which generalized association rules between significant transactions that are recorded in a database are discovered. In particular, the invention concerns mining a large database of sales transactions.
2. Description of the Related Art
Customer purchasing habits can provide invaluable marketing information for a wide variety of applications. For example, retailers can create more effective store displays and more effectively control inventory than otherwise would be possible if they know that, given a consumer's purchase of a first set of items, the same consumer can be expected, with some degree of probability, to purchase a particular second set of items along with the first set. In other words, it would be helpful from a marketing standpoint to know association rules between itemsets in a transaction. To illustrate, it would be helpful for a retailer of automotive parts and supplies to be aware of an association rule expressing the fact that 90% of the consumers who purchase automobile batteries and battery cables also purchase battery post brushes and battery post cleanser (referred to as the "consequent" in the terminology of the present invention).
It will be appreciated that advertisers too can benefit from a thorough knowledge of such consumer purchasing tendencies. Still further, catalogue companies can conduct more effective mass mailings if they know the tendencies of consumers to purchase particular sets of items with other sets of items. It is to be understood, however, that although this discussion focuses on the marketing applications of the present invention, database mining and, hence, the principles of the present invention, are useful in many other areas, e.g., business and science.
Until recently, building large, detailed databases that could chronicle thousands, and from a statistical view preferably millions, of consumer transactions, much less deriving useful information from the databases (i.e., mining the databases), was highly impractical. Consequently, marketing and advertising strategies have been based upon anecdotal evidence of purchasing habits, if upon any evidence at all, and thus have been susceptible to inefficiencies in consumer targeting that have been difficult if not impossible to overcome.
With the advent of modern technology, however, building large databases of consumer transactions has become possible. The ubiquitous bar-code reader can almost instantaneously read so-called basket data, i.e., when a particular item from a particular lot was purchased by a consumer, how many items the consumer purchased, and so on, for automatic electronic storage of the basket data. Further, when the purchase is made with, for example, a credit card, the identity of the purchaser can be almost instantaneously known, recorded, and stored along with the basket data. Still further, vastly improved data storage media have made it possible to electronically store vast amounts of such information for future use.
As alluded to above, however, building a transaction database is only part of the marketing challenge. Another important part is the mining of the database for useful information. Such database mining becomes increasingly problematic as the size of databases expands into the gigabyte and indeed the terabyte range.
Not surprisingly, purchasing tendencies, and, hence, particular regimes of database mining, can be classified many ways. In the above-referenced U.S. patent application Ser. No. 08/415,006, for "SYSTEM AND METHOD FOR QUICKLY MINING ASSOCIATION RULES IN A DATABASE", for example, an effective system is disclosed for quickly mining association rules that indicate purchasing habits during single transactions, i.e., rules that indicate, with user-defined degrees of confidence, which frequently-recurring itemsets are likely to be purchased along with other frequently-recurring itemsets in a transaction. In accordance with the present invention, an itemset "frequently occurs" in a database and is referred to as being "large" if it appears in the database with at least a user-defined regularity, referred to herein as "minimum support".
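By way of illustration only, the notions of support, minimum support, and confidence can be sketched in a few lines of Python. The sketch assumes that support is measured as the fraction of transactions containing an itemset and that confidence is the support of the combined itemset divided by the support of the antecedent; the function names (`support`, `confidence`, `is_large`) and the sample transactions are hypothetical and are not taken from the referenced application.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset` (assumed definition)."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    a = frozenset(antecedent)
    sup_a = support(a, transactions)
    if sup_a == 0:
        return 0.0
    return support(a | frozenset(consequent), transactions) / sup_a


def is_large(itemset: Iterable[str], transactions: List[Transaction],
             min_support: float) -> bool:
    """An itemset is 'large' when its support meets the user-defined minimum support."""
    return support(itemset, transactions) >= min_support


# Hypothetical sample data, for illustration only.
TRANSACTIONS: List[Transaction] = [
    frozenset({"battery", "battery cables", "post brush"}),
    frozenset({"battery", "battery cables", "post brush", "motor oil"}),
    frozenset({"battery", "battery cables"}),
    frozenset({"motor oil"}),
]

if __name__ == "__main__":
    print(support({"battery", "battery cables"}, TRANSACTIONS))
    print(confidence({"battery", "battery cables"}, {"post brush"}, TRANSACTIONS))
    print(is_large({"battery", "motor oil"}, TRANSACTIONS, min_support=0.5))
```

With this sample data, the itemset of batteries and battery cables appears in three of the four transactions, and two of those three also contain the post brush, so the rule holds with a confidence of two thirds.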
Previous database mining systems, however, including the invention disclosed in the parent application, do not consider mining association rules across different levels of a taxonomy, but instead restrict the items in the mined rules to leaf nodes in databases. Thus, for example, in the case of an itemset taxonomy in which the item "jacket" hierarchically depends from the item "outerwear", which hierarchically depends from the item "clothes", the parent invention might generate an association rule that indicates that people who purchase jackets tend to purchase hiking boots at the same time, but it is unable to generate more generalized rules indicating that, e.g., people who purchase outerwear or clothing tend to purchase hiking boots. And, because the support for an item in a taxonomy is not necessarily equal to the sum of the supports of its children, rules cannot be inferred for items at higher levels of taxonomies from rules for items at leaves.
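The point that the support of a taxonomy item need not equal the sum of the supports of its children can be seen in a small sketch. The code below is an illustration under stated assumptions, not the method of the present invention: the taxonomy is represented as a hypothetical child-to-parent mapping (`PARENT`), a transaction is taken to support an item when it contains that item or any of its descendants, and the item "ski pants" and the sample transactions are invented for the example.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical taxonomy: each item maps to its immediate parent.
PARENT: Dict[str, str] = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item: str, parent: Dict[str, str]) -> Set[str]:
    """All items above `item` in the taxonomy (empty for items with no parent)."""
    result: Set[str] = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result


def extend(transaction: Iterable[str], parent: Dict[str, str]) -> FrozenSet[str]:
    """The transaction's items together with every ancestor of each item."""
    items = set(transaction)
    extended = set(items)
    for item in items:
        extended |= ancestors(item, parent)
    return frozenset(extended)


def support_count(item: str, transactions: List[Set[str]],
                  parent: Dict[str, str]) -> int:
    """Number of transactions containing `item` or any of its descendants."""
    return sum(1 for t in transactions if item in extend(t, parent))


# Hypothetical sample data, for illustration only.
TRANSACTIONS: List[Set[str]] = [
    {"jacket", "ski pants"},     # one purchase of outerwear, not two
    {"jacket", "hiking boots"},
    {"ski pants"},
    {"shirt"},
]

if __name__ == "__main__":
    for name in ("jacket", "ski pants", "outerwear", "clothes"):
        print(name, support_count(name, TRANSACTIONS, PARENT))
```

In this sample, "jacket" and "ski pants" are each supported by two transactions, yet "outerwear" is supported by three rather than four, because the first transaction contains both children and is counted only once for the parent.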
Unfortunately, when association rules are restricted to just the leaves of a taxonomy, many significant associations might escape detection. For example, few consumers might purchase hiking boots with jackets, but many might purchase hiking boots with outerwear in general, an association that previous mining systems would not discover. Moreover, a rule stating that consumers who purchase jackets tend to purchase hiking boots might be discovered by the parent invention, but it can happen that such a rule is not nearly as interesting, from a marketing standpoint, as the fact that consumers who purchase outerwear in general tend to purchase hiking boots. Consequently, by not considering taxonomies, previous systems are unable to prune out non-interesting and redundant rules. It is therefore the focus of the present invention to consider taxonomies and thereby discover generalized association rules which also satisfy a user-defined interest criterion.
Accordingly, it is an object of the present invention to provide a system and method for mining large databases to discover generalized association rules. Another object of the present invention is to provide a system and method for discovering generalized association rules in itemsets that are stored in a transaction database, based on an item taxonomy. Still another object of the present invention is to provide a system and method for finding interesting association rules which repeat with a user-defined degree of regularity, which satisfy a user-defined degree of confidence, and which satisfy a user-defined interest criterion. Yet another object of the present invention is to provide a system and method for quickly mining large databases which is easy to use and cost-effective.