In recent years, the field of data mining, or extracting useful information from bodies of accumulated raw data, has provided a fertile new frontier for database and software technologies. While numerous types of data may make use of data mining technology, a few particularly illuminating examples have been those of (i) mining information, useful to retail merchants, from databases of customer sales transactions, and (ii) mining information from databases of commercial passenger airline travel. In the description that follows, these examples will be used illustratively.
Customer purchasing patterns over time can provide invaluable marketing information for a wide variety of applications. For example, retailers can create more effective store displays, and can more effectively control inventory, than otherwise would be possible, if they know that, given a consumer's purchase of a first set of items, the same consumer can be expected, with some degree of probability, to purchase a particular second set of items along with the first set. In other words, it would be helpful from a marketing standpoint to know association rules between itemsets (different products) in a transaction (a customer shopping transaction).
To illustrate, it would be helpful for a retailer of automotive parts and supplies to be aware of an association rule expressing the fact that 90% of the consumers who purchase automobile batteries and battery cables also purchase battery post brushes and battery post cleanser. (In the terminology of the data mining field, the latter are referred to as the "consequent.")
It will be appreciated that advertisers, too, can benefit from a thorough knowledge of such consumer purchasing tendencies. Still further, catalogue companies can conduct more effective mass mailings if they know the tendencies of consumers to purchase particular sets of items with other sets of items.
It is to be understood, however, that although this discussion focusses on the marketing applications of the present invention, database mining and, hence, the principles of the present invention, are useful in many other areas, e.g., business and science.
Until recently, it was highly impracticable to build large, detailed databases that could chronicle thousands, and from a statistical view preferably millions, of consumer transactions. Deriving useful information from the databases (i.e., mining the databases) was even more impractical.
Consequently, marketing and advertising strategies have been based upon anecdotal evidence of purchasing patterns, if any at all, and thus have been susceptible to inefficiencies in consumer targeting that have been difficult, if not impossible, to overcome.
With the advent of modern technology, however, building large databases of consumer transactions has become possible. The ubiquitous bar-code reader can almost instantaneously read so-called basket data, i.e., when a particular item from a particular lot was purchased by a consumer, how many items the consumer purchased, and so on, for automatic electronic storage of the basket data.
Further, when the purchase is made with, for example, a credit card, the identity of the purchaser can be almost instantaneously known, recorded, and stored along with the basket data.
Still further, vastly improved data storage media have made it possible to electronically store vast amounts of such information for future use.
As alluded to above, however, building a transaction database is only part of the marketing challenge. Another important part is the mining of the database for useful information. Such database mining becomes increasingly problematic as the size of databases expands into the gigabyte, and indeed the terabyte, range.
Much work in the data mining field has gone to the task of finding patterns of measurable levels of consistency or predictability in the accumulated data. For instance, where the data documents retail customer purchase transactions, purchasing tendencies, and, hence, particular regimes of data mining, can be classified many ways.
One type of purchasing tendency has been called an "association rule."
In a conventional data mining system working on a database of supermarket customer purchase records, there might be an association rule that, to a given percent certainty, a customer buying a first product (say, Brie cheese) will also buy a second product (say, Chardonnay wine). It thus may generally be stated that a conventional association rule states a condition precedent (purchase of the first product) and a condition subsequent or "consequent" (purchase of the second product), and declares that, with, say, 80% certainty, if the condition precedent is satisfied, the consequent will be satisfied also.
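The "percent certainty" of such a rule can be illustrated with a minimal sketch. The code below is not taken from any of the cited methods; it simply counts, over a small made-up list of transactions, how often the consequent appears among the transactions that contain the condition precedent. The function names `support` and `confidence` and the sample baskets are assumptions introduced for illustration only.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Among transactions containing the antecedent (condition precedent),
    the fraction that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    with_both = sum(1 for t in with_antecedent if consequent <= t)
    return with_both / len(with_antecedent)


# Hypothetical basket data: six supermarket transactions.
baskets = [
    frozenset({"brie", "chardonnay", "bread"}),
    frozenset({"brie", "chardonnay"}),
    frozenset({"brie", "chardonnay", "grapes"}),
    frozenset({"brie", "chardonnay"}),
    frozenset({"brie", "bread"}),
    frozenset({"bread", "grapes"}),
]

rule_confidence = confidence(baskets, frozenset({"brie"}), frozenset({"chardonnay"}))
rule_support = support(baskets, frozenset({"brie", "chardonnay"}))
print(f"confidence = {rule_confidence:.2f}, support = {rule_support:.2f}")
```

In this toy data, five baskets contain Brie and four of those also contain Chardonnay, so the rule "Brie implies Chardonnay" holds with 80% certainty. Scanning every transaction for each rule, as done here, is only workable for small examples; it says nothing about how a practical system would organize its passes over a large database.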
Methods for mining transaction databases to discover association rules have been disclosed in Agrawal et al., "Mining Association Rules Between Sets of Items in Large Databases", Proc. of the ACM SigMod Conf. on Management of Data, May 1993, pp. 207-216, and in Houtsma et al., "Set-Oriented Mining of Association Rules", IBM Research Report RJ 9567, October, 1993.
Early data mining approaches have had various drawbacks, which presented challenges for data mining pioneers to overcome. One such drawback was the requirement of excessive memory and of multiple data sorts and/or passes attributable to generating candidate itemsets of interest on-the-fly, that is, during a pass over the data, which resulted in unduly prolonged processing time. Further, prior methods had not specifically addressed database structure or buffer management problems. Moreover, prior methods were incapable of discovering association rules having more than a single item in the consequent (the right-hand side of a rule), and accordingly were limited in their ability to discover useful association rules.
These and other data mining objectives have been addressed in Agrawal et al., U.S. Pat. No. 5,615,341, "System and Method for Mining Generalized Association Rules in Databases," and in co-pending, co-assigned U.S. patent applications Ser. No. 08/415,006, filed Mar. 31, 1995, now U.S. Pat. No. 5,796,209; Ser. No. 08/500,717, filed Jul. 11, 1995; Ser. No. 08/577,945, filed Dec. 22, 1995, now U.S. Pat. No. 5,724,573; and Ser. No. 08/735,911, filed Oct. 25, 1996, based on a foreign priority date of Oct. 26, 1995, now U.S. Pat. No. 5,812,997.
However, association rules have been limited in scope, in the sense that the conditions precedent and subsequent fall within the same column or field of the database. In the above example, for instance, cheese and wine both fall within the category of supermarket items purchased.
The field remains ripe for new, creative approaches to data mining, to further assist data system users to extract useful information from their accumulated data.