1. Field of the Invention
The present invention relates in general to the field of database analysis. In one aspect, the present invention relates to a system and method for database pattern mining operations for generating and evaluating association rules contained in database records.
2. Description of the Related Art
The ability of modern computers to assemble, record and analyze enormous amounts of data has created a field of database analysis referred to as data mining. Data mining is used to discover association relationships in a database by identifying frequently occurring patterns in the database. These association relationships or rules may be applied to extract useful information from large databases in a variety of fields, including selective marketing, market analysis and management applications (such as target marketing, customer relation management, market basket analysis, cross selling, market segmentation), risk analysis and management applications (such as forecasting, customer retention, improved underwriting, quality control, competitive analysis), fraud detection and management applications and other applications (such as text mining (news group, email, documents), stream data mining, web mining, DNA data analysis, etc.). For example, association rules have been applied to model and emulate consumer purchasing activities. Association rules describe how often items are purchased together. For example, an association rule, “laptopspeaker (80%),” states that four out of five customers that bought a laptop computer also bought speakers.
The first step in generating association rules is to review a database of transactions to identify meaningful patterns (referred to as frequent patterns, frequent sets or frequent itemsets) in a transaction database, such as significant purchase patterns that appear as common patterns recurring among a plurality of customers. Typically, this is done by using thresholds such as support and confidence parameters, or other guides to the data mining process. These guides are used to discover frequent patterns, i.e., all sets of itemsets that have transaction support above a pre-determined minimum support S and confidence C threshold. Various techniques have been proposed to assist with identifying frequent patterns in transaction databases, including using “Apriori” algorithms to generate and test candidate sets, such as described by R. Agrawal et al., “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 207-216 (1993). However, candidate set generation is costly in terms of computational resources consumed, especially when there are prolific patterns or long patterns in the database and when multiple passes through potentially large candidate sets are required. Other techniques (such as described by J. Han et al., “Mining Frequent Patterns Without Candidate Generation,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 1-12 (2000)) attempt to overcome these limitations by using a frequent pattern tree (FPTree) data structure to mine frequent patterns without candidate set generation (a process referred to as FPGrowth). With the FPGrowth approach, frequency pattern information is stored in a compact memory structure.
Once the frequent sets are identified, the association rules are generated by constructing the power set (set of all subsets) of the identified frequent sets, and then generating rules from each of the elements of the power set. For each rule, its meaningfulness (i.e., support, confidence, lift, etc.) is calculated and examined to see if it meets the required thresholds. For example, if a frequent pattern {A, B, C} is extracted—meaning that this set occurs more frequently than the minimum support S threshold in the set of transactions—then several rules can be generated from this set:                {A}{B, C}        {B}{A, C}        {C}{A, B}        {A, B}{C}etc.where a rule AB which indicates that “Product A is often purchased together with Product B,” meaning that there is an association between the sales of Products A and B. Such rules can be useful for decisions concerning product pricing, product placement, promotions, store layout and many other decisions.        
Conventional approaches for generating frequent patterns (e.g., with a standard market basket analysis techniques) look at the frequency of item patterns in orders, but do not attempt to determine if patterns are becoming more or less frequent over time. Using shorter and more recent time periods for determining pattern frequency generally increases the weighting of recent pattern frequency, but typically lowers the amount of statistical significance to the data. Conversely, using longer time periods for determining pattern frequency yields more statistical confidence in the data, but decreases the accuracy due to the inclusion of older pattern frequency data. Accordingly, a need exists for methods and/or apparatuses for improving the generation and analysis of frequent patterns for use in data mining. There is also a need for improving pattern mining processes to better predict future demand. In addition, there is a need for methods and/or apparatuses for efficiently generating future expected pattern frequency information. Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings and detailed description which follow.