As the amount of data expands, the ability to process and comprehend that data becomes more difficult. Patterns and trends are lost in the massive quantities of data stored in databases and data warehouses. As the influx of data increases, the ability to interpret the data also becomes more difficult. Thus, there is a need for a powerful and efficient analytical tool that can process and derive interesting knowledge from the enormous amounts of data available.
Historically, the primary method for analyzing data was to construct well structured hypotheses and test the hypotheses by analyzing data. Today, a method called data mining is one of the new ways of analyzing data. Data mining is an automated process whereby previously unknown relationships among data are discovered. The two main steps of data mining are modeling and scoring. These two steps are typically performed by a data mining tool.
Generally, modeling develops rules from analyzing sets of training data. These rules are used later to examine new cases. For example, a rule could be a set of symptoms that a doctor uses to diagnose a disease. This rule can be derived from a set of patients who had the disease. Once derived, the rule is applied to a larger group of people to assist in determining whether they have that disease. The model that is generated using this training data is then used to make predictions about future patients.
Scoring involves making predictions with the generated model. The numerical values of the scores represent the certainty of each prediction. Scores allow people to access the predictions that were found using the model. Currently, there are several methods for scoring with a model. One method of scoring involves loading data from a database into a model using an Open DataBase Connectivity (ODBC) or Structured Query Language (SQL) cursor. The scoring occurs where the model is stored and the scoring results are transmitted from the model's location to the database. Another method uses a C, C++ or Java function to represent the model. The function may be wrapped in an application, which runs against the data stored in the database. However, this option also involves massive data movement and hence, is inefficient.
Performance-wise these are not efficient options because they involve a lot of data movement from the database to the mining tool's location. Further, many models are unusable because the execution time required is too large to process the data. Thus, there is a need for a more efficient data mining system and method.