1. Field of the Invention
The present invention relates to a method and system of data mining and, more particularly, to a method and system of data mining which determines product dynamics in market baskets in conjunction with aggregate market basket properties.
2. Description of the Related Art
Data mining is a well known technology used to discover patterns and relationships in data. Data mining involves the application of advanced statistical analysis and modeling techniques to the data to find useful patterns and relationships. The resulting patterns and relationships are used in many applications in business to guide business actions and to make predictions helpful in planning future business actions.
One of the types of data mining is called “association analysis,” often referred to as “market basket analysis.” Association analysis reveals patterns in the form of “association rules” or “affinities.” An association rule between products A and B can be expressed symbolically as A→B which translates to the statement: “Whenever product A is in a market basket, then product B tends to be in the market basket as well.” This is an example of “product dynamics,” i.e., the effect of the purchase of one product on another product.
In the folklore of data mining, one of the most repeated stories illustrating product dynamics is that of the alleged discovery that beer and diapers frequently appear together in a shopping basket. The explanation given in this tale is that when fathers are sent out on an errand to buy diapers, they often purchase a six pack of their favorite beer as a reward. Using the association rule discussed above, this example would be expressed as “diapers→beer” or, translated, whenever diapers appear in a shopping basket, beer also tends to appear in that shopping basket.
There are a number of measures that have historically been used to characterize the importance of a particular association rule. In the context of market basket analysis, these measures are calculated in relation to all market baskets under consideration. The “confidence” of a rule “A→B” is the probability that a basket containing A will also contain B. The “support” of a rule is the frequency of occurrence of the rule in the set of all transactions, i.e., the fraction of baskets that contain both A and B. The “lift” of the rule is a measure of the predictive power of the premise A: it is the ratio of the probability of B in the presence of A to the probability of B without any prior knowledge of other items in the market basket.
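By way of illustration only, the three measures described above may be written as follows. The symbols are assumptions introduced here for convenience and do not appear elsewhere in this description: N denotes the total number of market baskets, N_A the number of baskets containing A, N_B the number containing B, and N_AB the number containing both.

```latex
\mathrm{support}(A \rightarrow B) = P(A, B) = \frac{N_{AB}}{N}
\qquad
\mathrm{confidence}(A \rightarrow B) = P(B \mid A) = \frac{N_{AB}}{N_A}
\qquad
\mathrm{lift}(A \rightarrow B) = \frac{P(B \mid A)}{P(B)} = \frac{N_{AB}\, N}{N_A\, N_B}
```

A lift greater than 1 indicates that the presence of A makes B more likely than it is across all baskets; a lift of 1 indicates that A carries no predictive information about B.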
For purposes of explanation, consider the following example. Table 1 lists ten typical transactions representing the market baskets for a given day at a small store. The data show that diapers and beer appear together in some market baskets, suggesting that when a transaction contains diapers, it also tends to contain beer. Diapers appear in six transactions (1, 3, 4, 8, 9, and 10), and beer appears in conjunction with diapers in four of these transactions (1, 3, 9, and 10). Therefore, the rule “diapers→beer” has a confidence of 4/6=67%. Further, beer and diapers appear together in four of the ten transactions, which gives the rule a support of 4/10=40%. Finally, beer appears in five of the ten transactions, but in four of the six transactions containing diapers. Thus, if a basket is chosen at random without any prior information about the transactions, there is a 5/10=50% chance of finding beer in it; if the basket is instead chosen only from those known to contain diapers, the chance of finding beer rises to 4/6=67%. The lift of the rule “diapers→beer” is therefore (4/6)/(5/10)≈1.33.
TABLE 1

TRANSACTION    MARKET BASKET
1              Diapers, beer, chips, soap
2              Chips, soap
3              Diapers, beer, soap
4              Diapers, chips, soap
5              Soap
6              Chips
7              Beer, chips
8              Diapers
9              Diapers, beer, soap
10             Diapers, beer, chips, soap
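The arithmetic of this example may be checked with a short program. The following is a minimal sketch only: the basket contents are copied from Table 1, while the function name `rule_metrics` and its parameters are hypothetical names chosen for illustration. The lift is computed from unrounded fractions, so it comes out at about 1.33.

```python
# Market baskets copied from Table 1 (transaction number -> set of items).
baskets = {
    1: {"diapers", "beer", "chips", "soap"},
    2: {"chips", "soap"},
    3: {"diapers", "beer", "soap"},
    4: {"diapers", "chips", "soap"},
    5: {"soap"},
    6: {"chips"},
    7: {"beer", "chips"},
    8: {"diapers"},
    9: {"diapers", "beer", "soap"},
    10: {"diapers", "beer", "chips", "soap"},
}


def rule_metrics(baskets, premise, consequent):
    """Return (support, confidence, lift) for the rule premise -> consequent.

    Hypothetical helper for illustration; it assumes both items occur in at
    least one basket, otherwise a division by zero would result.
    """
    n = len(baskets)
    n_premise = sum(1 for b in baskets.values() if premise in b)
    n_consequent = sum(1 for b in baskets.values() if consequent in b)
    n_both = sum(1 for b in baskets.values()
                 if premise in b and consequent in b)

    support = n_both / n                   # fraction of all baskets with both
    confidence = n_both / n_premise        # P(consequent | premise)
    lift = confidence / (n_consequent / n) # P(consequent | premise) / P(consequent)
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.40 confidence=0.67 lift=1.33
```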
Association analysis techniques discover all association rules that exceed set support and confidence thresholds. They also discover all sets of items that tend to occur in the same basket with a frequency that exceeds the support threshold; such sets are termed “frequent itemsets.”
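The discovery of frequent itemsets and rules against thresholds may be sketched as follows. This is an illustrative brute-force enumeration over the Table 1 data, not the algorithm of any particular tool; practical implementations prune the search rather than testing every combination. The function names `frequent_itemsets` and `rules`, and the threshold values used, are assumptions made for this example.

```python
from itertools import combinations

# Market baskets copied from Table 1, in transaction order.
baskets = [
    {"diapers", "beer", "chips", "soap"},
    {"chips", "soap"},
    {"diapers", "beer", "soap"},
    {"diapers", "chips", "soap"},
    {"soap"},
    {"chips"},
    {"beer", "chips"},
    {"diapers"},
    {"diapers", "beer", "soap"},
    {"diapers", "beer", "chips", "soap"},
]


def frequent_itemsets(baskets, min_support):
    """Map each itemset (a sorted tuple) meeting min_support to its support."""
    items = sorted(set().union(*baskets))
    n = len(baskets)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for b in baskets if set(candidate) <= b)
            if count / n >= min_support:
                result[candidate] = count / n
    return result


def rules(frequent, min_confidence):
    """List (premise, consequent, support, confidence) for rules with a
    single-item consequent drawn from each frequent itemset of size >= 2."""
    found = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            premise = tuple(i for i in itemset if i != consequent)
            # Every subset of a frequent itemset is itself frequent,
            # so the premise is always present in the mapping.
            confidence = support / frequent[premise]
            if confidence >= min_confidence:
                found.append((premise, consequent, support, confidence))
    return found


frequent = frequent_itemsets(baskets, min_support=0.4)
found = rules(frequent, min_confidence=0.6)
for premise, consequent, support, confidence in found:
    print(f"{premise} -> {consequent}: "
          f"support={support:.2f} confidence={confidence:.2f}")
```

With these assumed thresholds, the itemset containing beer and diapers qualifies as frequent (support 40%), and the rule “diapers→beer” is among the rules reported.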
In recognition of the importance of data mining, tools have been developed to perform the various data mining and modeling techniques. One such tool is Intelligent Miner™ sold by IBM. Intelligent Miner has an outstanding algorithm for association analysis as part of its tool suite. Being general purpose tools, Intelligent Miner and other data mining tools for association analysis reach the point of inferring frequent itemsets and rules with their corresponding metrics of interest, such as support, confidence, and lift, but go no further.
Association rules express facts deduced from data. They are true statements about the relationships observed in the data. These rules, along with their measures of confidence, support, and lift, can and should be used to generate theories or hypotheses about the effects of future actions that change the conditions under which the original observations were made. These hypotheses need to be posed in the complex and dynamic retail environment where potentially thousands of stores and tens of thousands of items must be considered against the backdrop of pricing actions, promotions, campaigns, seasonality, and product availability. Furthermore, all actions and results should be measured against a matrix of revenue and profit rather than the abstract notions of support, confidence, and lift.
Existing tools for association analysis do not factor in information about advertising and promotion and thus do not assist in developing theories or hypotheses about their effects on product sales and product dynamics in market baskets. Moreover, association analysis as commonly employed focuses on product dynamics and does not analyze the aggregate properties of individual baskets, referred to herein as “market basket dynamics.” If such analysis were conducted, it would yield data which would allow an understanding of overall buying behavior, measured at the level of market baskets, and what drives the overall buying behavior. Currently implemented association rules and frequent itemsets do not assist in determining information about the overall buying habits of the owner of a particular type of market basket; for example, what kinds of products would be found in “high-gross margin” baskets or which products may drive such “high-gross margin” baskets.
Analysis of market basket data using data mining techniques, such as association analysis, is a recent development. Traditional methods for evaluating the effects of advertising and promotions on sales for a particular item of interest focus on aggregate financial measures. For example, traditional approaches would measure the overall value of shopping baskets that include or do not include the item of interest and compute how these measures change as a function of promotion-related actions. These methods do not consider the overall content of shopping baskets (i.e., they focus only on items of interest) and thus do not explain what these baskets tend to contain, nor do they reveal data which allows analysis of market basket dynamics. Without information about the relationships between the sale of various items and their promotion status it is not possible to explain any observed changes in the aggregate measures. Moreover, by lumping together all baskets that contain the item of interest to compute an aggregate value, these methods do not allow for the possibility of having various types of baskets all containing the item of interest but with different dynamics and thus different aggregate values.
Accordingly, a need exists for a method and system for utilizing data mining techniques to obtain and analyze data which allows individual market baskets to be characterized based on all of the items in the basket, so that, for example, an understanding of the purchasing habits of persons who possess baskets having such characteristics can be gained.