1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, systems, and products for managing data mining environments.
2. Description of Related Art
Data mining tools, such as, for example, IBM's Intelligent Miner, are used directly in customer environments by connecting the tool to the customer's historical and production databases. This often requires many changes in data mining activities because of the direct dependency on the names used in historical and production data. Data mining models have to be trained using historical data and then applied to multiple production data sets. Scoring results obtained by applying different mining models to different data sets are saved in different tables with different names. In environments having many data sets, many mining models, and many mining models applied to many data sets, the relationships among the data sets, mining models, and model scoring results become complicated. In the current art, there are generally only ad hoc ways of tracking these many data mining data sets, using, for example, user-defined tables or even word processing documents to attempt to track identities and relations among data mining data sets. There is generally no systematic way of organizing and managing all the data sets used in and created by many trainings and many applications of many data mining models in a data mining environment, although it would be advantageous if there were.
Exemplary embodiments of the invention typically comprise managing a data mining environment where the data mining environment includes a data mining tool and a data mining model. In typical embodiments, the data mining tool trains the data mining model using an input data set to create model training results and store the model training results in a model training results data set, and scores scoring input data sets using the model training results to produce scoring output and store the scoring output in scoring output data sets. Exemplary embodiments typically include registering in a data set control table registered data sets, the registered data sets including the model training input data sets, model training results data sets, the scoring input data sets, and the scoring output data sets. Some embodiments typically include registering the data mining model in a mining model control table, the mining model control table being related to the data set control table through a mining model control table foreign key. Other embodiments typically include registering the scoring output data sets in a scoring control table, the scoring control table being related to the data set control table through a scoring control table data set foreign key, the scoring control table being related to the mining model control table through a scoring control table mining model foreign key.
In exemplary embodiments, the data set control table typically includes an identification number for each registered data set, a name for each registered data set, and a description for each registered data set. In some embodiments, the data set control table typically includes a type for each registered data set, a usage for each registered data set, and a location for each registered data set.
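As an illustration only, the data set control table described above could be sketched with Python's built-in sqlite3 module. The table name, the column names (data_set_control, ds_type, ds_usage, and so on), the register_data_set helper, and the sample values are all hypothetical; the description names only the kinds of fields, not a concrete schema.

```python
import sqlite3

# Hypothetical schema: one row per registered data set.
DATA_SET_CONTROL_DDL = """
CREATE TABLE data_set_control (
    data_set_id INTEGER PRIMARY KEY,  -- identification number
    name        TEXT NOT NULL,        -- data set name
    description TEXT,                 -- free-text description
    ds_type     TEXT,                 -- e.g. 'table' or 'file' (assumed values)
    ds_usage    TEXT,                 -- e.g. 'training_input' (assumed values)
    location    TEXT                  -- where the data set is stored
)
"""


def register_data_set(conn, name, description, ds_type, ds_usage, location):
    """Insert one row into the data set control table and return its id."""
    cur = conn.execute(
        "INSERT INTO data_set_control "
        "(name, description, ds_type, ds_usage, location) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, description, ds_type, ds_usage, location),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(DATA_SET_CONTROL_DDL)

# Sample registrations with made-up names and locations.
train_id = register_data_set(
    conn, "CUST_HIST", "historical customer data",
    "table", "training_input", "HISTDB",
)
score_in_id = register_data_set(
    conn, "CUST_PROD", "current production customers",
    "table", "scoring_input", "PRODDB",
)

rows = conn.execute(
    "SELECT data_set_id, name, ds_usage FROM data_set_control "
    "ORDER BY data_set_id"
).fetchall()
```

Registering every data set in one table, whatever its role, gives each a single identification number that the other control tables can refer to.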
In exemplary embodiments, the mining model control table typically includes an identification number for the data mining model, a model name for the data mining model, and a description for the data mining model. In some embodiments, the mining model control table typically includes a model type for the data mining model, an algorithm used for training the data mining model, and an identification number for a model training input data set used by the data mining tool to train the data mining model. Other embodiments typically include a date the data mining model was last trained, an identification number for a model training results data set for the data mining model, a file name for the model training results data set for the data mining model, and a file location for the model training results data set for the data mining model.
In exemplary embodiments, the scoring control table typically includes an identification number for each scoring output data set, an identification number for a model training results data set for the scoring output data set, and an identification number for a scoring input data set for the scoring output data set. Some embodiments typically include a name for the scoring output data set, a name for a scoring setting, and a scoring status indicating whether the scoring output data set is actively used.
In exemplary embodiments, the mining model control table foreign key typically includes a model training input data set identification column in the mining model control table in which content from a registered data set identification column of the data set control table is stored, and the mining model control table relates to the data set control table through the foreign key. In some embodiments, the mining model control table foreign key typically includes a model training results data set identification column in the mining model control table in which content from a registered data set identification column of the data set control table is stored, and the mining model control table relates to the data set control table through the foreign key.
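The relationship between the mining model control table and the data set control table might be sketched as follows, again using sqlite3. This is a minimal sketch under stated assumptions: all table names, column names, and sample rows are hypothetical, and the data set control table is reduced to two columns for brevity. The two REFERENCES clauses stand in for the model training input and model training results foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE data_set_control (
    data_set_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE mining_model_control (
    model_id              INTEGER PRIMARY KEY,
    model_name            TEXT NOT NULL,
    description           TEXT,
    model_type            TEXT,
    algorithm             TEXT,
    training_input_id     INTEGER REFERENCES data_set_control(data_set_id),
    last_trained          TEXT,
    training_results_id   INTEGER REFERENCES data_set_control(data_set_id),
    results_file_name     TEXT,
    results_file_location TEXT
);
""")

# Made-up registrations: a training input and a training results data set.
conn.executemany(
    "INSERT INTO data_set_control (data_set_id, name) VALUES (?, ?)",
    [(1, "CUST_HIST"), (2, "CHURN_MODEL_RESULTS")],
)
conn.execute(
    "INSERT INTO mining_model_control "
    "(model_name, model_type, algorithm, training_input_id, training_results_id) "
    "VALUES (?, ?, ?, ?, ?)",
    ("churn_tree", "classification", "decision tree", 1, 2),
)

# A model that points at an unregistered data set is rejected.
try:
    conn.execute(
        "INSERT INTO mining_model_control (model_name, training_input_id) "
        "VALUES (?, ?)",
        ("orphan_model", 999),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Resolve both foreign keys back to data set names.
model_row = conn.execute("""
    SELECT m.model_name, tin.name, tres.name
    FROM mining_model_control m
    JOIN data_set_control tin  ON tin.data_set_id  = m.training_input_id
    JOIN data_set_control tres ON tres.data_set_id = m.training_results_id
""").fetchone()
```

Because both columns reference the same identification column, a model can only be registered against data sets that are already in the data set control table.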
In exemplary embodiments of the invention, the scoring control table data set foreign key typically includes a scoring output data set identification column in the scoring control table in which content from a registered data set identification column of the data set control table is stored, and the scoring control table relates to the data set control table through the foreign key. In some embodiments, the scoring control table data set foreign key typically includes a scoring input data set identification column in the scoring control table in which content from a registered data set identification column of the data set control table is stored, and the scoring control table relates to the data set control table through the foreign key. In other embodiments, the scoring control table mining model foreign key typically includes a model identification column in the scoring control table in which content from a model identification column of the mining model control table is stored, and the scoring control table relates to the mining model control table through the foreign key.
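To show how the three foreign keys let a scoring output be traced back to its model and scoring input, here is a hedged sketch in sqlite3. The schema is a reduced, hypothetical one (scoring_control, scoring_output_id, and the other names are assumptions, as are the sample rows), and trace_scoring_output is an illustrative helper, not part of any described embodiment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE data_set_control (
    data_set_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE mining_model_control (
    model_id   INTEGER PRIMARY KEY,
    model_name TEXT NOT NULL
);
CREATE TABLE scoring_control (
    scoring_output_id INTEGER PRIMARY KEY
                      REFERENCES data_set_control(data_set_id),
    scoring_input_id  INTEGER NOT NULL
                      REFERENCES data_set_control(data_set_id),
    model_id          INTEGER NOT NULL
                      REFERENCES mining_model_control(model_id),
    scoring_status    TEXT
);
""")

# Made-up data: one model scored against two production data sets.
conn.executemany(
    "INSERT INTO data_set_control (data_set_id, name) VALUES (?, ?)",
    [(1, "CUST_2023"), (2, "CHURN_SCORES_2023"),
     (3, "CUST_2024"), (4, "CHURN_SCORES_2024")],
)
conn.execute(
    "INSERT INTO mining_model_control (model_id, model_name) VALUES (1, 'churn_tree')"
)
conn.executemany(
    "INSERT INTO scoring_control "
    "(scoring_output_id, scoring_input_id, model_id, scoring_status) "
    "VALUES (?, ?, ?, ?)",
    [(2, 1, 1, "inactive"), (4, 3, 1, "active")],
)


def trace_scoring_output(conn, output_name):
    """Return (model name, scoring input name) for a scoring output, or None."""
    return conn.execute("""
        SELECT m.model_name, i.name
        FROM scoring_control s
        JOIN data_set_control o     ON o.data_set_id = s.scoring_output_id
        JOIN data_set_control i     ON i.data_set_id = s.scoring_input_id
        JOIN mining_model_control m ON m.model_id    = s.model_id
        WHERE o.name = ?
    """, (output_name,)).fetchone()


traced = trace_scoring_output(conn, "CHURN_SCORES_2024")
```

A single join over the control tables answers the question that ad hoc tracking documents cannot answer reliably: which model and which input produced a given table of scores.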
In exemplary embodiments of the invention, the data set control table typically includes a registered data set identification column in which an identification number for each registered data set is stored. Such embodiments also include the step of indexing the data set control table registered data set identification column. In some embodiments, the mining model control table typically includes a model identification column in which an identification number for each data mining model is stored. Such embodiments also include indexing the mining model control table model identification column.
In exemplary embodiments of the invention, the scoring control table typically includes a scoring output data set identification column in which an identification number for each scoring output data set is stored, a scoring input data set identification column in which an identification number for each input data set used for scoring is stored, and a model identification column in which an identification number for each data mining model is stored. Such embodiments typically include indexing the scoring control table scoring output data set identification column, indexing the scoring control table scoring input data set identification column, and indexing the scoring control table model identification column. In some embodiments, the model training results data set is typically in Predictive Model Markup Language format.
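The indexing steps described above could look like the following in sqlite3. This is a sketch only: the table, column, and index names are hypothetical, the tables are reduced to their identification columns, and the choice of unique versus non-unique indexes is an assumption rather than something the description specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_set_control (
    data_set_id INTEGER NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE mining_model_control (
    model_id   INTEGER NOT NULL,
    model_name TEXT NOT NULL
);
CREATE TABLE scoring_control (
    scoring_output_id INTEGER NOT NULL,
    scoring_input_id  INTEGER NOT NULL,
    model_id          INTEGER NOT NULL
);

-- One index per identification column named in the description.
CREATE UNIQUE INDEX idx_data_set_id    ON data_set_control (data_set_id);
CREATE UNIQUE INDEX idx_model_id       ON mining_model_control (model_id);
CREATE UNIQUE INDEX idx_scoring_output ON scoring_control (scoring_output_id);
CREATE INDEX idx_scoring_input         ON scoring_control (scoring_input_id);
CREATE INDEX idx_scoring_model         ON scoring_control (model_id);
""")

# List the indexes that now exist in the catalog.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
```

The scoring input and model columns get non-unique indexes in this sketch because one input data set or one model may appear in many scoring rows.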
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts of exemplary embodiments of the invention.