1. Field of the Invention
This invention relates to data mining systems, and in particular, to an architecture for distributed relational data mining systems.
2. Description of Related Art
Often, computer-implemented systems are used to analyze commercial and financial transaction data. In many instances, such data is analyzed to gain a better understanding of customer behavior.
Prior art methods for analyzing customer transactions often involve one or more of the following techniques:
1. Ad hoc querying: This methodology involves the iterative analysis of transaction data by human effort, using query languages such as SQL.
2. On-Line Analytical Processing (OLAP): This methodology involves the application of software front-ends that automate the querying of relational databases storing transaction data and the production of reports therefrom.
3. Statistical packages: This methodology requires the sampling of transaction data, the extraction of the data into flat file or other proprietary formats, and the application of general purpose statistical or data mining software packages to the data.
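For illustration only, the first and third techniques can be sketched as follows. This is a minimal sketch using Python's standard sqlite3 and csv modules. The transactions table, its columns, the sample rows, and all function names are hypothetical and do not come from any particular prior art system.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical transactions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    rows = [(1, 20.0), (1, 35.0), (2, 5.0), (3, 12.5), (3, 7.5), (3, 10.0)]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    return conn


def ad_hoc_total_per_customer(conn):
    """Technique 1: an analyst hand-writes a SQL query and inspects the result."""
    sql = (
        "SELECT customer_id, SUM(amount) FROM transactions "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return conn.execute(sql).fetchall()


def extract_sample_to_flat_file(conn, every_nth=2):
    """Technique 3: sample the rows and extract them to a flat (CSV) file.

    Only every n-th row is kept; an external statistical package would then
    read the CSV text, outside the database.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["customer_id", "amount"])
    cursor = conn.execute(
        "SELECT customer_id, amount FROM transactions "
        "WHERE rowid % ? = 0 ORDER BY rowid",
        (every_nth,),
    )
    for row in cursor:
        writer.writerow(row)
    return buffer.getvalue()
```

In this sketch, the ad hoc query must be rewritten by hand for each new question, and the flat-file extraction operates on a sample that has left the database.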
However, these prior techniques have serious shortcomings that impede their use and reflect flaws in the design of analytical architectures. Of key importance is that prior art techniques do not work well with large databases, because such schemes do not account for memory limitations or for the size of the data sets involved. Thus, there is a need in the art for improved techniques for implementing data mining systems, especially architectures that handle large amounts of data.
A computer-implemented data mining system includes an Interface Tier, an Analysis Tier, and a Database Tier. The Interface Tier supports interaction with users, and includes an On-Line Analytical Processing (OLAP) Client that provides a user interface for generating SQL statements that retrieve data from a database, and an Analysis Client that displays results from a data mining algorithm. The Analysis Tier performs one or more data mining algorithms, and includes an OLAP Server that schedules and prioritizes the SQL statements received from the OLAP Client, an Analytic Server that schedules and invokes the data mining algorithm to analyze the data retrieved from the database, and a Learning Engine that performs a Learning step of the data mining algorithm. The Database Tier stores and manages the databases, and includes an Inference Engine that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) that performs the SQL statements against a Data Mining View to retrieve the data from the database, and a Model Results Table that stores the results of the data mining algorithm.
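By way of example only, the division of work among the three tiers might be sketched as follows. This is a simplified illustration and not the claimed implementation. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in RDBMS, a toy "algorithm" whose Learning step computes a mean spend threshold and whose Inference step labels customers inside the database, and a simple priority queue as the scheduling policy. All table, view, class, and function names are hypothetical.

```python
import heapq
import sqlite3


# --- Database Tier: RDBMS, Data Mining View, Model Results Table ---
def create_database_tier():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transactions (customer_id INTEGER, amount REAL);
        INSERT INTO transactions VALUES (1, 20.0), (1, 40.0), (2, 5.0), (3, 90.0);
        CREATE VIEW data_mining_view AS
            SELECT customer_id, SUM(amount) AS total_spend
            FROM transactions GROUP BY customer_id;
        CREATE TABLE model_results (customer_id INTEGER, segment TEXT);
        """
    )
    return conn


def inference_engine(conn, threshold):
    """Inference step (assumed form): one SQL statement run inside the database."""
    conn.execute("DELETE FROM model_results")
    conn.execute(
        "INSERT INTO model_results "
        "SELECT customer_id, "
        "CASE WHEN total_spend >= ? THEN 'high' ELSE 'low' END "
        "FROM data_mining_view",
        (threshold,),
    )


# --- Analysis Tier: OLAP Server, Analytic Server, Learning Engine ---
class OlapServer:
    """Queues SQL statements and runs them in priority order (lower runs first)."""

    def __init__(self, conn):
        self.conn = conn
        self._queue = []
        self._counter = 0  # tie-breaker: equal priorities run in arrival order

    def submit(self, priority, sql):
        heapq.heappush(self._queue, (priority, self._counter, sql))
        self._counter += 1

    def run_all(self):
        results = []
        while self._queue:
            _, _, sql = heapq.heappop(self._queue)
            results.append(self.conn.execute(sql).fetchall())
        return results


def learning_engine(rows):
    """Learning step (toy model): the only parameter is the mean total spend."""
    totals = [total for _, total in rows]
    return sum(totals) / len(totals)


def analytic_server(conn):
    """Invoke the algorithm: learn from the view, then trigger in-database inference."""
    rows = conn.execute(
        "SELECT customer_id, total_spend FROM data_mining_view"
    ).fetchall()
    threshold = learning_engine(rows)
    inference_engine(conn, threshold)
    return threshold


# --- Interface Tier: OLAP Client, Analysis Client ---
def olap_client_sql():
    """Generate a SQL statement against the Data Mining View."""
    return "SELECT customer_id, total_spend FROM data_mining_view ORDER BY customer_id"


def analysis_client(conn):
    """Read the stored results for display to the user."""
    return conn.execute(
        "SELECT customer_id, segment FROM model_results ORDER BY customer_id"
    ).fetchall()
```

In this sketch, the Inference step and the stored results remain in the database, so only the aggregated rows of the Data Mining View travel to the Analysis Tier for the Learning step.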