1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, systems, and products for managing data mining activities in a data mining environment.
2. Description of Related Art
Data mining is an analytic technique for dynamically discovering patterns in historical data records and applying properties associated with those records to production data records that exhibit similar patterns. Based on historical data, a data mining algorithm first generates a data mining model that captures the discovered patterns; this activity is called “model training.” The data mining model so generated is then applied to production data; this activity is called “model scoring.”
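As an illustration only, the two activities can be sketched as separate functions: training consumes labeled historical records and produces a model, and scoring applies that model to unlabeled production records. The nearest-centroid classifier below is an arbitrary stand-in chosen for brevity. It is not the algorithm used by Intelligent Miner or by any other particular tool, and the function names and sample records are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def train_model(historical):
    """Model training: summarize each label as the centroid of its feature vectors."""
    sums, counts = {}, {}
    for features, label in historical:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    # The "model" is simply a mapping from label to centroid.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def score(model, production):
    """Model scoring: give each production record the label of its nearest centroid."""
    return [min(model, key=lambda label: dist(model[label], features))
            for features in production]

# Hypothetical historical records: (feature vector, known property).
historical = [((1.0, 1.0), "low_risk"), ((1.2, 0.8), "low_risk"),
              ((8.0, 9.0), "high_risk"), ((9.0, 8.5), "high_risk")]
model = train_model(historical)

# Production records carry features only; scoring supplies the property.
labels = score(model, [(1.1, 0.9), (8.5, 8.7)])  # → ['low_risk', 'high_risk']
```

Keeping training and scoring as separate steps mirrors the distinction drawn above: the model is produced once from historical data and can then be reused on any number of production batches.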
Data mining tools such as IBM's Intelligent Miner are used directly in customer environments by connecting the tool to the customer's historical and production databases. As a result, any structural change to those databases (such as renaming tables and columns or reorganizing columns across tables) is likely to disrupt the data mining activity and render it dysfunctional. Rectifying this often requires many changes to the data mining activity because it depends directly on the table and column names used in the historical and production data.
To manage data mining activities in an analytic application, one must deal with several different types of data structures, including input data for model training and model scoring, results from model training, output data from model scoring, and metadata to manage models and model results. In current practice, the definition of input data for model training and scoring is left to the end user, a tool-specific, proprietary internal representation is used for model training results, the definition of output data from model scoring is left to the end user, and there is hardly any support for control data to manage models and model results, although such support would be advantageous. It would also be advantageous to have a database model that includes a set of tables describing the tables containing historical data and production data.
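One way to picture tables that describe other tables is a level of indirection: the mining activity refers only to logical names, and metadata tables record which physical table and column currently hold each one. The following is a minimal sketch, assuming an SQLite store. The metadata tables `dm_table_map` and `dm_column_map`, the helper `fetch`, and all sample names are hypothetical and invented for illustration; they are not the database model of the invention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical metadata tables describing where the historical data physically lives.
CREATE TABLE dm_table_map (logical_table TEXT PRIMARY KEY, physical_table TEXT NOT NULL);
CREATE TABLE dm_column_map (
    logical_table   TEXT NOT NULL,
    logical_column  TEXT NOT NULL,
    physical_column TEXT NOT NULL,
    PRIMARY KEY (logical_table, logical_column)
);
-- A customer's historical table, with the customer's own naming.
CREATE TABLE cust_hist_2004 (cust_age INTEGER, churned INTEGER);
INSERT INTO cust_hist_2004 VALUES (34, 0), (61, 1);
INSERT INTO dm_table_map VALUES ('HISTORICAL', 'cust_hist_2004');
INSERT INTO dm_column_map VALUES ('HISTORICAL', 'AGE', 'cust_age'),
                                 ('HISTORICAL', 'TARGET', 'churned');
""")

def fetch(conn, logical_table, logical_columns):
    """Read rows by logical names; physical names are looked up in the metadata tables."""
    (physical_table,) = conn.execute(
        "SELECT physical_table FROM dm_table_map WHERE logical_table = ?",
        (logical_table,)).fetchone()
    physical_columns = []
    for col in logical_columns:
        (phys,) = conn.execute(
            "SELECT physical_column FROM dm_column_map "
            "WHERE logical_table = ? AND logical_column = ?",
            (logical_table, col)).fetchone()
        physical_columns.append(phys)
    # Identifiers cannot be bound as parameters, so they are quoted into the SQL text
    # (the sample names contain no quote characters).
    cols = ", ".join(f'"{c}"' for c in physical_columns)
    return conn.execute(f'SELECT {cols} FROM "{physical_table}"').fetchall()

before = fetch(conn, "HISTORICAL", ["AGE", "TARGET"])

# A structural change by the customer (RENAME COLUMN needs SQLite 3.25 or later);
# only the metadata rows change, and the call to fetch stays the same.
conn.executescript("""
ALTER TABLE cust_hist_2004 RENAME TO customer_history;
ALTER TABLE customer_history RENAME COLUMN cust_age TO age_years;
UPDATE dm_table_map SET physical_table = 'customer_history'
 WHERE logical_table = 'HISTORICAL';
UPDATE dm_column_map SET physical_column = 'age_years'
 WHERE logical_table = 'HISTORICAL' AND logical_column = 'AGE';
""")
after = fetch(conn, "HISTORICAL", ["AGE", "TARGET"])
```

The design choice illustrated is that a renaming, which would otherwise require edits throughout the data mining activity, is absorbed by updating a few metadata rows.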