Data mining is a technique by which hidden patterns may be found in a group of data. True data mining does not merely change the presentation of data, but actually discovers previously unknown relationships among the data. Data mining is typically implemented as software in, or in association with, database systems. Data mining includes several major steps. First, data mining models are generated based on one or more data analysis algorithms. Initially, the models are “untrained”; they are then “trained” by processing training data and generating information that defines the model. Finally, the generated information is deployed for use in data mining, for example, by providing predictions of future behavior based on specific past behavior.
Association rule analysis is an important data mining technique. Association rules capture the co-occurrence of items or events in large volumes of data, such as customer transaction data. The widespread adoption of bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data. Likewise, the more recent growth of online sales generates large amounts of sales data. Collectively, such sales data is termed “basket” data. Association rules were originally defined in the context of basket data. For example, an association rule based on basket data might be stated as: 90% of customers who buy both snow boots and jackets also buy ski equipment. Finding such rules is valuable for cross-marketing and mail-order promotions. Other applications may include catalog design, add-on sales, store layout, customer segmentation, web page personalization, and target marketing.
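By way of illustration only, the percentage in a rule of this kind may be understood as the rule's confidence: of the baskets that contain the antecedent items (snow boots and jackets), the fraction that also contain the consequent item (ski equipment). The following minimal Python sketch computes that figure, together with the commonly used support measure (the fraction of all baskets containing every item in the rule). The basket contents and the function names `support` and `confidence` are hypothetical and chosen only for this example; the sketch does not represent any particular system's implementation.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def support(baskets: List[Basket], items: Iterable[str]) -> float:
    """Fraction of all baskets that contain every item in `items`."""
    wanted = frozenset(items)
    if not baskets:
        return 0.0
    return sum(1 for b in baskets if wanted <= b) / len(baskets)


def confidence(baskets: List[Basket],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    n_ante = sum(1 for b in baskets if ante <= b)
    if n_ante == 0:
        return 0.0  # rule never applies, so report zero confidence
    n_both = sum(1 for b in baskets if both <= b)
    return n_both / n_ante


# Hypothetical basket data, invented so that the rule holds in 9 of 10 cases.
baskets: List[Basket] = (
    [frozenset({"snow boots", "jacket", "ski equipment"})] * 9
    + [frozenset({"snow boots", "jacket"})] * 1
    + [frozenset({"jacket"})] * 5
    + [frozenset({"ski equipment", "gloves"})] * 5
)

rule_confidence = confidence(baskets, {"snow boots", "jacket"}, {"ski equipment"})
rule_support = support(baskets, {"snow boots", "jacket", "ski equipment"})
print(f"confidence = {rule_confidence:.0%}, support = {rule_support:.0%}")
```

On this invented data, 10 of the 20 baskets contain both snow boots and a jacket, and 9 of those also contain ski equipment, so the rule has a confidence of 90% and a support of 45%. The sketch rescans every basket for each rule, which is adequate for a small example but is not intended to address the scalability concerns of large datasets.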
Problems arise when attempts are made to utilize current data mining systems to perform enterprise data mining. Current systems that perform association rule analysis tend to provide inadequate performance for large datasets and, in particular, do not provide scalable performance. As a result, building a single model may take hours or even days. In the context of enterprise data mining, a wide variety of models must be generated to meet specific, but widely different, needs throughout the enterprise. A typical enterprise has a variety of different databases from which data is drawn in order to build the models. Current systems do not provide adequate integration with the various databases throughout the enterprise. Current systems also provide limited flexibility in terms of specifying and adjusting the model being built to meet specific needs. In addition, the various models that are built must be arranged so as to operate properly on the particular system within the enterprise for which the models were built, yet current systems provide limited model arrangement and export capability.
A need arises for a technique by which association rule analysis may be performed that provides improved performance in model building, good integration with the various databases throughout the enterprise, flexible specification and adjustment of the models being built, flexible model arrangement and export capability, and expandability to additional types of datasets.