1. Field of the Invention
The present invention relates generally to the field of data mining, and more particularly, to a novel data mining system and search methodology for generating associations among items in a large database.
2. Discussion of the Prior Art
The problem of finding association rules was introduced in a reference entitled "Mining Association Rules Between Sets of Items in Very Large Databases," Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216, 1993, authored by Agrawal R., Imielinski T., and Swami A. The problem identified in the reference is that of finding relationships between different items in a large database, e.g., a database containing customer transactions. Such information can be used for many sales purposes, such as target marketing, because the buying patterns of consumers can be inferred from one another.
As described in the above-mentioned reference, there is first identified a set {I} comprising all items in the database of transactions. A transaction {T}, which is a subset of {I}, is defined to be a set of items which are bought together in one operation. An association rule between a set of items {X} which is a subset of {I} and another set {Y} which is also a subset of {I} is expressed as {X}=>{Y}, and indicates that the presence of the items X in a transaction implies a strong possibility of the presence of the items Y in the same transaction. The measures used to indicate the strength of an association rule are support and confidence. The support of the rule X=>Y is the fraction of all transactions containing both X and Y. The confidence of the rule X=>Y is the fraction of the transactions containing X which also contain Y. In the association rule problem, it is desired to find all rules above a minimum level of support and confidence. The primary concept behind most association rule algorithms is a two-phase procedure. In the first phase, all frequent itemsets (or large itemsets) are found. An itemset is "frequent" or "large" if it satisfies a user-defined minimum support requirement. The second phase uses these frequent itemsets to generate all the rules which satisfy the user-specified minimum confidence.
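By way of illustration only, the support and confidence measures and the two-phase procedure described above may be sketched in Python as follows. The sketch is a brute-force enumeration intended solely to make the definitions concrete; it is not any of the algorithms of the cited references, nor the method of the present invention. The function names, the item names, and the threshold values are hypothetical and were chosen for the example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    return support(x | y, transactions) / support(x, transactions)


def frequent_itemsets(transactions, min_support):
    """Phase 1 (brute force): map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, hence none larger
    return result


def rules(frequent, min_confidence):
    """Phase 2: split each frequent itemset into X => Y, keep confident rules."""
    out = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for x in combinations(sorted(itemset), k):
                x = frozenset(x)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                conf = s / frequent[x]
                if conf >= min_confidence:
                    out.append((x, itemset - x, s, conf))
    return out


# Hypothetical transactions, each a set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

frequent = frequent_itemsets(transactions, min_support=0.5)
for x, y, s, c in rules(frequent, min_confidence=0.9):
    print(sorted(x), "=>", sorted(y), "support", s, "confidence", c)
```

The first phase in this sketch examines a number of candidate itemsets that may grow exponentially with the number of distinct items, and each candidate requires a pass over all transactions, which is one way to see why naive enumeration is impractical for large databases.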
Since its initial formulation, considerable research effort has been devoted to the association rule problem. A number of algorithms for large itemset generation have been proposed, such as those found in Agrawal R., Mannila H., Srikant R., Toivonen H., and Verkamo A. I., "Fast Discovery of Association Rules," Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Chapter 12, pages 307-328; Proceedings of the 20th International Conference on Very Large Data Bases, pages 478-499, 1994; and Brin S., Motwani R., Ullman J. D., and Tsur S., "Dynamic Itemset Counting and Implication Rules for Market Basket Data," Proceedings of the ACM SIGMOD, 1997, pages 255-264. Variations of association rules such as generalized association rules, quantitative association rules and multilevel association rules have been studied in Srikant R. and Agrawal R., "Mining Generalized Association Rules," Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pages 407-419, and Srikant R. and Agrawal R., "Mining Quantitative Association Rules in Large Relational Tables," Proceedings of the ACM SIGMOD Conference on Management of Data, 1996, pages 1-12.
Although many methods and systems have been proposed, none can efficiently generate large itemsets for very large scale problems. For such problems, current techniques require too much time to be of practical use. Solving such large scale problems is important because most databases containing customer transaction data are quite large.