1. Field of the Invention
The present invention relates generally to systems and methods for generating association rules that describe relationships among items in a database, and particularly, to a system and method that exploits skew in the data of a database of sales transactions in order to determine personalized association rules.
2. Discussion of the Prior Art
Association rules are generated to find the relationships between different items in a database of transactions, e.g., sales transactions. A sales transaction is a set of items purchased by a given consumer at one time. Such rules track the buying patterns of consumers, e.g., by finding how the presence of one item in a transaction affects the presence of another, and so forth. The problem of association rule generation has recently gained considerable prominence in the data mining community because of its potential as an important tool for knowledge discovery.
Given I={i1, i2, . . . , im} as a set of binary literals called items, each transaction T is a set of items, such that T is a subset of I. This corresponds to the set of items which a consumer may buy in a basket transaction. An association rule is a condition of the form X ==> Y, where X and Y are two sets of items. The idea of an association rule is to develop a systematic method by which a user may infer the presence of some sets of items, given the presence of other items in a transaction. Such information is useful in making decisions such as customer targeting, shelving, and sales promotions.
An important approach to the association rule problem was developed by Agrawal et al., as described in the reference by Agrawal R., Imielinski T., and Swami A., entitled “Mining Association Rules Between Sets of Items in Very Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216, 1993 (Agrawal et al.). As described therein, the SUPPORT of a rule X ==> Y is defined as the fraction of transactions which contain both X and Y. The CONFIDENCE of a rule X ==> Y is the fraction of the transactions containing X which also contain Y. Thus, if a rule has 90% confidence, then 90% of the tuples containing X also contain Y. The approach taken by Agrawal et al. is a two-phase large itemset approach implemented as follows: 1) the first step is to generate all combinations of items that have fractional transaction support above a certain user-defined threshold called MINSUPPORT; these combinations are herein referred to as LARGE ITEMSETS; and 2) the second step is to generate the rules from the large itemsets. Given a large itemset X={i1, i2, . . . , ik}, it may be used to generate at most k rules of the type [X−{ir}] ==> ir for each r in {1, . . . , k}. Once these rules have been generated, only those rules having a confidence above a certain user-defined threshold called MINCONFIDENCE are retained. The most computationally intensive part of the association rule problem is that of finding large itemsets. The second step of actually generating the rules is relatively straightforward.
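By way of illustration only, the two-phase approach described above may be sketched in Python as follows. The function names, the sample transactions, and the threshold values are hypothetical and are not taken from Agrawal et al.; the first phase is a brute-force enumeration rather than the optimized search described in that reference, and "above a threshold" is assumed here to mean greater than or equal to the threshold.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def large_itemsets(transactions, minsupport):
    """Phase 1 (brute force): map each large itemset to its support."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, len(items) + 1):
        level = {}
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= minsupport:  # assumption: threshold is inclusive
                level[frozenset(candidate)] = s
        if not level:
            break  # no large k-itemset, so no larger itemset can be large
        found.update(level)
    return found

def generate_rules(large, minconfidence):
    """Phase 2: for each large itemset X, test the rules [X - {i}] ==> i."""
    rules = []
    for x, sup_x in large.items():
        if len(x) < 2:
            continue
        for i in sorted(x):
            antecedent = x - {i}
            # Every subset of a large itemset is itself large, so the
            # antecedent's support is already available in `large`.
            confidence = sup_x / large[antecedent]
            if confidence >= minconfidence:
                rules.append((tuple(sorted(antecedent)), i, sup_x, confidence))
    return rules

# Hypothetical basket data, used only to exercise the functions above.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
large = large_itemsets(transactions, minsupport=0.5)
found_rules = generate_rules(large, minconfidence=0.9)
```

In this sample data, the only rule that survives both thresholds is butter ==> bread, since every transaction containing butter also contains bread. Each returned tuple holds the antecedent, the consequent, the support, and the confidence, in that order.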
Initially, the method was proposed only for the case of transaction data; however, further research has been devoted to speeding up the algorithm and extending the approach to other scenarios, such as described in the following references: Agrawal R., Imielinski T., and Swami A., “Mining Association Rules Between Sets of Items in Very Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216, 1993; Agrawal R., Mannila H., Srikant R., Toivonen H., and Verkamo A. I., “Fast Discovery of Association Rules,” Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Chapter 12, pages 307-328, and, Proceedings of the 20th International Conference on Very Large Data Bases, pages 487-499, 1994; Brin S., Motwani R., Ullman J. D., and Tsur S., “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proceedings of the ACM SIGMOD, 1997, pages 255-264; Han J. and Fu Y., “Discovery of Multi-level Association Rules From Large Databases,” Proceedings of the International Conference on Very Large Databases, pages 420-431, Zurich, Switzerland, September 1995; Lent B., Swami A., and Widom J., “Clustering Association Rules,” Proceedings of the Thirteenth International Conference on Data Engineering, pages 220-231, Birmingham, U.K., April 1997; Mannila H., Toivonen H., and Verkamo A. I., “Efficient Algorithms for Discovering Association Rules,” AAAI Workshop on Knowledge Discovery in Databases, 1994, pages 181-192; Park J. S., Chen M. S., and Yu P. S., “An Effective Hash-based Algorithm for Mining Association Rules,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1995; Savasere A., Omiecinski E., and Navathe S. B., “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proceedings of the 21st International Conference on Very Large Databases, 1995; Srikant R. and Agrawal R., “Mining Generalized Association Rules,” Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pages 407-419; Srikant R. and Agrawal R., “Mining Quantitative Association Rules in Large Relational Tables,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1996, pages 1-12; and, Toivonen H., “Sampling Large Databases for Association Rules,” Proceedings of the 22nd International Conference on Very Large Databases, Bombay, India, September 1996.
Another area of research to which this invention is related is referred to as clustering. The problem of clustering is that of segmenting the data into groups of similar objects. The problem of finding clusters in high dimensional data has been discussed in the following references: R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,” Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, Wash., 1998; M. Ester, H.-P. Kriegel and X. Xu, “A Database Interface for Clustering in Large Spatial Databases,” Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 1995; M. Ester, H.-P. Kriegel, J. Sander and X. Xu, “A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining, Portland, Ore., August 1996; R. Kohavi and D. Sommerfield, “Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology,” Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 1995; S. Guha, R. Rastogi and K. Shim, “CURE: An Efficient Clustering Algorithm for Large Databases,” Proceedings of the 1998 ACM SIGMOD Conference, pages 73-84, 1998; R. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining,” Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pages 144-155; and, T. Zhang, R. Ramakrishnan and M. Livny, “BIRCH: An Efficient Data Clustering Method for Very Large Databases,” Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada, June 1996.
The clustering and data segmentation techniques described in the prior art have heretofore never been applied for the purpose of generating personalized association rules for customers.
Thus, it would be highly desirable to provide a system and method for finding personalized association rules by segmenting the data into groups of similar records, and using this segmentation in order to find the personalized rules. The motivation in finding personalized association rules is that e-commerce merchants are able to track buying behavior of customers using the online sales transaction data. This data may be used to determine association rules which are specific to each individual customer and thus, may be used as a tool for performing target marketing for that customer.
The present invention is directed to a technique for finding personalized association rules that exploits the skew in data, i.e., the local characteristics of the data, in order to generate itemsets used to create personalized association rules.
According to the invention, there is provided a system and method for developing association rules which are personalized for each customer by partitioning (clustering) a set of records into discrete segments. The key motivation of this method is that different parts of the data may show different kinds of trends, and the clustering is used in order to create a segmentation of the data such that these trends are captured in each segment. Thus, a different set of association rules is relevant for each segment. That is, for a given user, the segment to which he/she belongs most closely may be readily determined, and the trends in that segment may be used for generating the association rules. The processes of finding the segmentation and finding the itemsets are interleaved in a single algorithm.
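By way of illustration only, the idea of using the segment closest to a given customer as the source of that customer's rules may be sketched in Python as follows. This sketch is a simplification and not the algorithm of the invention: it assumes that the segments have already been computed rather than interleaving segmentation with itemset generation, it assumes Jaccard similarity as the measure of closeness, and it mines only single-item rules x ==> y. All function names, sample data, and thresholds are hypothetical.

```python
from itertools import permutations

def jaccard(a, b):
    """Similarity of two item sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closest_segment(basket, segments):
    """Index of the segment whose transactions are, on average, most similar."""
    def average_similarity(segment):
        return sum(jaccard(basket, t) for t in segment) / len(segment)
    return max(range(len(segments)), key=lambda i: average_similarity(segments[i]))

def pair_rules(transactions, minsupport, minconfidence):
    """Single-item rules x ==> y computed from one segment only."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        n_x = sum(1 for t in transactions if x in t)
        n_xy = sum(1 for t in transactions if x in t and y in t)
        if n_x and n_xy / n >= minsupport and n_xy / n_x >= minconfidence:
            rules.append((x, y))
    return rules

def personalized_rules(basket, segments, minsupport, minconfidence):
    """Rules drawn only from the segment closest to the customer's basket."""
    segment = segments[closest_segment(basket, segments)]
    return pair_rules(segment, minsupport, minconfidence)

# Hypothetical, precomputed segments of a sales transaction database.
segments = [
    [frozenset({"diapers", "wipes"}),
     frozenset({"diapers", "wipes", "formula"}),
     frozenset({"diapers", "formula"})],
    [frozenset({"beer", "chips"}),
     frozenset({"beer", "chips", "salsa"}),
     frozenset({"chips", "salsa"})],
]
rules_for_customer = personalized_rules(frozenset({"diapers"}), segments, 0.5, 0.9)
```

Because support and confidence are computed within a single segment, a rule that holds strongly for one group of customers is not diluted by the unrelated transactions of another group.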
The present invention is useful in target marketing, as associations may be found in each segment of the data.