In the past several years, there has been a rise in the number of mass-merchandise retailers that have many geographically distributed stores, typically spanning cities, states, and countries. With this increase in mass retailers, there has been an increase in the use by these retailers of networked computer systems for continuously collecting transaction data related to purchases. In this regard, merchants currently face the problem of data overload and are seeking ways to convert this storehouse of data into intelligent information about their customers that can be used for business decisions.
Since transaction data stems from daily purchases, returns, and exchanges that are entered at all the cash registers of all the stores, millions of records can accumulate very quickly. There is an ever-increasing need for merchants to interpret the data, garner information regarding their customers' buying habits, and use this information to improve their own business decisions. For example, one use of this vast storehouse of data is to generate rules that describe past buying trends. Once these past buying trends are determined, business plans related to inventory and promotion can be devised accordingly.
Ways to use this data for business planning are also becoming increasingly important in the world of electronic commerce, where transactions are done on-line (e.g., through the Internet) and involve customers that are located worldwide.
It is desirable to have rules that summarize the buying patterns of the customers. One type of rule that reflects customers' buying habits is referred to as a cross-sale association rule. Cross-sale association rules describe the relationship of the sales of one item to the sales of another item. These cross-sale association rules are quite beneficial to both merchants and customers. For the merchant, such a cross-sale association rule can help in planning which products to offer together to better meet historical demand. For the customer, it is less likely that a desired product will be out of stock or that a separate trip to another store is needed to purchase a related product because the current merchant does not carry that particular product.
There have been attempts to generate association rules that reflect customer behavior. Unfortunately, the resulting systems are very limited in several respects. First, these systems narrowly define what is considered a transaction. For example, a single transaction is defined as those products that are purchased and listed on a single receipt. Only products that are listed on the same receipt are considered to be “associated,” and only these products are tallied and counted for the generation of association rules.
As can be appreciated, there are many transactions that should contribute to the association rules, but instead are not captured by these prior art systems because of this narrow definition. Consider an association rule that is designed to answer the question: how many customers who bought a TV also bought a VCR? The prior art system would count only those customers who bought both a TV and a VCR in a single transaction that used a single receipt. If the same customer went to the same store one day after purchasing a TV to purchase a VCR, this customer would not be counted, since the purchases did not occur in a single transaction. Even if a customer actually purchased the TV and VCR at the same time, but utilized two different credit cards or requested two separate receipts for some reason, those purchases would not contribute to the association rule because the purchases are on two separate receipts. Accordingly, it would be desirable to have a mechanism that captures more of the transactions that contribute to association rules, thereby generating association rules that more accurately reflect reality.
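The difference between the two counting approaches can be illustrated with a minimal Python sketch. The records, customer names, and the function `tv_buyers_with_vcr` are hypothetical and chosen only for illustration; they are not taken from any particular prior art system. Grouping by receipt misses the customer who bought the VCR on a separate receipt, while grouping by customer captures that purchase.

```python
from collections import defaultdict

# Hypothetical purchase records: (receipt_id, customer_id, item).
purchases = [
    ("r1", "alice", "TV"), ("r1", "alice", "VCR"),  # both items on one receipt
    ("r2", "bob", "TV"), ("r3", "bob", "VCR"),      # same customer, two receipts
    ("r4", "carol", "TV"),                          # TV only
]

def tv_buyers_with_vcr(records, key_index):
    """Return (groups with TV and VCR, groups with TV).

    key_index 0 groups items by receipt (the narrow definition);
    key_index 1 groups items by customer (an assumed broader definition).
    """
    baskets = defaultdict(set)
    for record in records:
        baskets[record[key_index]].add(record[2])
    with_tv = [items for items in baskets.values() if "TV" in items]
    with_both = [items for items in with_tv if "VCR" in items]
    return len(with_both), len(with_tv)

by_receipt = tv_buyers_with_vcr(purchases, 0)
by_customer = tv_buyers_with_vcr(purchases, 1)
print("per receipt: ", by_receipt)
print("per customer:", by_customer)
```

In this toy data, the receipt-based count finds one TV-and-VCR basket out of three baskets containing a TV, whereas the customer-based count finds two such customers out of three TV buyers.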
It can be appreciated that the accuracy of such association rules depends on access to all the available information and the capture of all transactions relevant to that rule. Accordingly, it is desirable to generate these association rules based on as large a collection of transaction data as possible, including data that may be gathered at multiple distributed sites. Unfortunately, doing so requires that hundreds of millions of transaction records be processed daily. Understandably, processing association rules at this scale poses substantial challenges to current systems and approaches.
One challenge is to provide continuous rather than one-time value to e-commerce. Prior art data mining efforts are focused on analyzing historical data. In reality, however, data is being continuously collected, and it is preferable to have a mechanism that mines data continuously to dynamically detect trends and changes in real time. For instance, prior art methods are limited to generating a cross-sale association rule that describes the relationship of the past sales of one item to the sales of another item. While such relationships are helpful in making planning and promotion decisions, the changes in cross-sale associations may be even more significant, since such changes usually reflect real-time trends, the reaction to a promotion, or the cause of sales drops or rises. Unfortunately, the prior art systems are unable to reflect such changes in cross-sale associations.
For example, suppose the sales of VCRs had been strongly associated with the sales of TVs, but this association has recently weakened as TV buyers turned to buying DVDs instead of VCRs. Such a change in the association helps explain or predict the slowdown of VCR sales. As another example, a weakening association between the sales of PCs and a specific brand of printer may imply that customers have turned to buying another brand of printer. Accordingly, it is desirable to have a mechanism to catch such dynamic association relationships.
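A weakening association of this kind can be sketched in Python by comparing the confidence of the rule "TV implies VCR" across two time periods. The baskets, the period labels, and the function `confidence` are assumptions made for illustration only; confidence (the fraction of TV baskets that also contain a VCR) is one common measure of association strength and is not necessarily the measure used by any given system.

```python
def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)

# Hypothetical baskets for an earlier and a more recent period.
earlier = [{"TV", "VCR"}, {"TV", "VCR"}, {"TV", "VCR"}, {"TV"}]
recent = [{"TV", "DVD"}, {"TV", "DVD"}, {"TV", "VCR"}, {"TV"}]

old_conf = confidence(earlier, "TV", "VCR")
new_conf = confidence(recent, "TV", "VCR")
change = new_conf - old_conf  # a negative value indicates a weakening association

print(f"earlier: {old_conf:.2f}, recent: {new_conf:.2f}, change: {change:+.2f}")
```

A system that reports only the rule from the earlier period would miss the drop that this comparison makes visible.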
A second challenge is how to enable a conventional system, which is configured to process small amounts of data, to process very large data sets. In a conventional shopping network, a huge volume of transaction records must be processed every day, and it is unlikely that centralized processing will yield satisfactory results. The scalability issue becomes more critical in the provision of the real-time data mining service described above. In order to scale up, a mechanism is needed to distribute data processing, reduce data volumes at each local site by summarization, and mine data incrementally at multiple levels of aggregation. Unfortunately, the prior art does not provide a way to perform these tasks on very large data sets.
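The idea of summarizing data locally and then combining the summaries can be sketched in Python as follows. This is only one possible illustration under stated assumptions: the site names, the baskets, and the function `summarize` are hypothetical, and the summary used here (counts of single items and item pairs) is merely an assumed form of summarization. Each site reduces its raw baskets to a compact table of counts, and only those counts are merged at the higher level.

```python
from collections import Counter
from itertools import combinations

def summarize(baskets):
    """Reduce raw baskets to counts of single items and of item pairs."""
    summary = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        summary.update((item,) for item in items)   # single-item counts
        summary.update(combinations(items, 2))      # pair counts, in sorted order
    return summary

# Hypothetical baskets held at two local sites.
site_a = [{"TV", "VCR"}, {"TV"}]
site_b = [{"TV", "VCR"}, {"VCR"}, {"TV", "VCR"}]

# Only the compact summaries are combined; raw records stay at each site.
global_summary = summarize(site_a) + summarize(site_b)

tv_count = global_summary[("TV",)]
pair_count = global_summary[("TV", "VCR")]
print(f"TV baskets: {tv_count}, TV and VCR baskets: {pair_count}")
print(f"confidence TV -> VCR: {pair_count / tv_count:.2f}")
```

Because the summaries are additive, a later batch of baskets could be folded in with `global_summary += summarize(new_baskets)` without reprocessing earlier records, which is one simple way to picture incremental mining.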
Accordingly, there remains a need for a system and method for generating association rules that more accurately reflect reality, that can provide more flexible and powerful information, and that overcome the challenges and disadvantages set forth previously.