1. Field of the Invention
The present invention relates in general to the field of database analysis. In one aspect, the present invention relates to a system and method for data mining operations for identifying association rules contained in database records.
2. Description of the Related Art
The ability of modem computers to assemble, record and analyze enormous amounts of data has created a field of database analysis referred to as data mining. Data mining is used to discover association relationships in a database by identifying frequently occurring patterns in the database. These association relationships or rules may be applied to extract useful information from large databases in a variety of fields, including selective marketing, market analysis and management applications (such as target marketing, customer relation management, market basket analysis, cross selling, market segmentation), risk analysis and management applications (such as forecasting, customer retention, improved underwriting, quality control, competitive analysis), fraud detection and management applications and other applications (such as text mining (news group, email, documents), stream data mining, web mining, DNA data analysis, etc.). For example, association rules have been applied to model and emulate consumer purchasing activities. Association rules describe how often items are purchased together. For example, an association rule, “laptop  speaker (80%),” states that four out of five customers that bought a laptop computer also bought speakers.
The first step in generating association rules is to review a database of transactions to identify meaningful patterns (referred to as frequent patterns, frequent sets or frequent itemsets) in a transaction database, such as significant purchase patterns that appear as common patterns recurring among a plurality of customers. Typically, this is done by using thresholds such as support and confidence parameters, or other guides to the data mining process. These guides are used to discover frequent patterns, i.e., all sets of itemsets that have transaction support above a predetermined minimum support S and confidence C threshold. Various techniques have been proposed to assist with identifying frequent patterns in transaction databases, including using “Apriori” algorithms to generate and test candidate sets, such as described by R. Agrawal et al., “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 207-216 (1993). However, candidate set generation is costly in terms of computational resources consumed, especially when there are prolific patterns or long patterns in the database and when multiple passes through potentially large candidate sets are required. Other techniques (such as described by J. Han et al., “Mining Frequent Patterns Without Candidate Generation,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 1-12 (2000)) attempt to overcome these limitations by using a frequent pattern tree (FPTree) data structure to mine frequent patterns without candidate set generation (a process referred to as FPGrowth). With the FPGrowth approach, frequency pattern information is stored in a compact memory structure.
Once the frequent sets are identified, the association rules are generated by constructing the power set (set of all subsets) of the identified frequent sets, and then generating rules from each of the elements of the power set. For each rule, its meaningfulness (i.e., support, confidence, lift, etc.) is calculated and examined to see if it meets the required thresholds. For example, if a frequent pattern {A, B, C} is extracted—meaning that this set occurs more frequently than the minimum support S threshold in the set of transactions—then several rules can be generated from this set:
{A}{B,C}
{B}{A,C}
{C}{A,B}
{A,B}{C}
etc.
where a rule AB which indicates a “Product A is often purchased together with Product B,” meaning that there is an association between the sales of Products A and B. Such rules can be useful for decisions concerning product pricing, product placement, promotions, store layout and many other decisions.
Conventional data mining approaches use generic item descriptions, such as the SKU (stockable unit number) when identifying items or products in a transaction database. When these generic descriptions are used to identify frequent sets, the frequent sets are not large and power-set/rule generation is tractable. However, conventional implementations of FPGrowth mining techniques using item data at the SKU (stockable unit number) level do not provide sufficient information to develop meaningful association rules for complex products. For example, if there are three transactions involving the purchase of a computer identified as “Desktop-SKU” with one of the transactions also involving the purchase of DVD disks, the product level of description used to identify the computer does not reveal that two of the computers did not include DVD drives, while the third computer (which was purchased with the DVD disks) did include a DVD drive. As this example demonstrates, this lack of granularity in the item description diminishes the quality of association rules that can be generated, resulting in limited pattern correlation.
As seen from the conventional approaches, a need exists for methods and/or apparatuses for improving the extraction of frequent patterns for use in data mining. There is also a need for finer granularity in the generation of frequent sets to better discover meaningful patterns without imposing the cost of a combinatorial explosion of the data that must be examined. In addition, there is a need for methods and/or apparatuses for efficiently generating association rules without requiring unwieldy candidate set generation, without requiring multiple database passes and without requiring additional time to generate association rules as the frequent set grows. Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings and detailed description which follow.