1. Field of the Invention
The present invention generally relates to a method, system and program product for developing a data model in a data mining system. Specifically, the present invention allows a data model to be developed using predefined data transformations stored in a database table.
2. Related Art
In business, the manipulation of data and statistics can be an important tool in achieving optimal levels of efficiency and sales growth. Today, many companies have data modeling groups whose function is to produce mathematical models to be deployed in operational systems. For example, a company may wish to predict the propensity of a current customer to purchase another product offered by the company (i.e., a cross-sale). In such a case, the company's data modeling personnel would attempt to devise a data model that could accurately predict this propensity for existing customers. The current process for modeling data is to: (1) take fairly raw data from the operational systems and/or a data warehouse; (2) apply mathematical transformations and aggregations to the data; and (3) then develop a data model in an iterative fashion. During the process, refinements are made to establish the best transformations to provide the attributes that give the best predictive ability of the resultant model.
Once the data model has been developed, it is usually passed from the data modeler to Information Technology (IT) personnel within the company for application in an operational context. Unfortunately, for a data model to function in the operational context, each transformation the modeler has undertaken must be repeated identically with the actual operational data. Thus, unless the data model is given the same stimuli in the operational context as during the development process, the data model will likely fail in operation. Moreover, under the current methodology, the data transformations applied to the data are written by the modeler during the model development process itself. That is, when a modeler is attempting to develop a data model, he/she will also write the necessary data transformations. Not only does this lead to pervasive duplication of effort among data modelers, but it can also lead to differences between data transformations that have the same purpose. In addition, the current process requires numerous exchanges between the data modelers and the IT personnel for implementation of the data model in the operational context. Often, such exchanges consume weeks or even months. During this time, the company is potentially exposed to lost opportunities and profits. Existing systems that fail to address these problems include U.S. Pat. No. 6,014,670 to Zamanian et al., U.S. Pat. No. 6,339,775 to Zamanian et al. and U.S. Patent Application Publication No. 2002/038450 A1 to Kloppmann et al., all of which are herein incorporated by reference.
In view of the foregoing, there exists a need for a method, system and program product for developing a data model in a data mining system. Specifically, a need exists for a database table that includes predefined data transformations. A further need exists for each predefined data transformation to be associated with a unique identifier, a corresponding description and a validity period. Another need exists for data models to be developed using the predefined data transformations.
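By way of illustration only, a database table of predefined data transformations of the kind described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the table name `transformation`, its column names, the identifiers `T001` and `T002`, the example transformations, and the helper functions are all hypothetical and do not appear in this document. The sketch uses an in-memory SQLite database and a registry of Python callables keyed by the unique identifier.

```python
import math
import sqlite3
from datetime import date

# Hypothetical registry: maps each unique identifier to the code that
# performs the transformation. The identifiers and formulas are assumptions.
TRANSFORMS = {
    "T001": lambda x: math.log1p(x),  # log(1 + x)
    "T002": lambda x: x / 12.0,       # annual amount to monthly amount
}


def build_table(conn):
    """Create and populate a hypothetical table of predefined transformations."""
    conn.execute(
        "CREATE TABLE transformation ("
        " id TEXT PRIMARY KEY,"          # unique identifier
        " description TEXT NOT NULL,"    # corresponding description
        " valid_from TEXT NOT NULL,"     # start of validity period (ISO date)
        " valid_to TEXT NOT NULL)"       # end of validity period (ISO date)
    )
    conn.executemany(
        "INSERT INTO transformation VALUES (?, ?, ?, ?)",
        [
            ("T001", "log(1 + x) of account balance", "2020-01-01", "2029-12-31"),
            ("T002", "annual income converted to monthly", "2020-01-01", "2021-12-31"),
        ],
    )


def valid_ids(conn, on_date):
    """Return identifiers of transformations whose validity period covers on_date."""
    day = on_date.isoformat()  # ISO dates compare correctly as strings
    rows = conn.execute(
        "SELECT id FROM transformation"
        " WHERE valid_from <= ? AND ? <= valid_to ORDER BY id",
        (day, day),
    )
    return [row[0] for row in rows]


def apply_transformation(conn, transform_id, value, on_date):
    """Apply a predefined transformation, refusing one outside its validity period."""
    if transform_id not in valid_ids(conn, on_date):
        raise ValueError(f"{transform_id} is not valid on {on_date.isoformat()}")
    return TRANSFORMS[transform_id](value)
```

In such an arrangement, both the data modeler and the operational system would refer to a transformation by its identifier rather than rewriting it, so the same transformation would be applied during development and in operation.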