With increases in the use of computers to collect and store data and with increases in computer-based transactions, such as those conducted over the Internet, there has been a proliferation of databases containing large amounts of historical data, commonly referred to as “data warehouses.” For example, as more and more data is collected regarding consumer purchasing and/or shopping habits, this data may be stored in a data warehouse for subsequent analysis. Other uses of data warehouses include, for example, the storage of genetic or other scientific data.
While the particular data may vary for different data warehouses, in general, data warehouses are databases of historical data that may utilize a “star-schema” database structure. A data warehouse is typically presented to users through a multi-dimensional hypercube and provides an ad hoc query environment. Furthermore, the data warehouse will, typically, contain a large amount of data and have a complex structure.
Analytical models, such as predictive analytical models, are conventionally used to analyze data in a data warehouse. Scoring of records against a predictive model, for example, may be provided by a scoring engine. Such application of a predictive model to a database record may be provided, for example, through the use of a Predictive Model Markup Language (PMML) file that defines the application of a model to data. However, invocation of these PMML files is typically platform and/or system dependent, such that the operations necessary to invoke a predictive model in one platform and/or system may not function to invoke the predictive model in a different platform and/or system. Thus, models and/or PMML files may be platform and/or system specific, which may reduce the ability to provide best-practices models that may be deployed across different platforms and/or systems.
For example, scoring a record using DB2 Intelligent Miner for Scoring conventionally requires, in typical cases, a DB2 environment in order to utilize the User Defined Function (UDF)/User Defined Type (UDT) information and apply that information to specific data. Thus, if a DB2 environment is unavailable to a user, scoring of a record may be impossible for that user.
Furthermore, with conventional models and scoring, there is typically no validation mechanism to check the input parameter(s) provided to the model or scoring engine. Accordingly, erroneous results could be returned that may cause undesirable actions to occur.
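By way of illustration only, the kind of input check that is described above as typically absent may be sketched as follows. The sketch assumes that the fields a model expects can be read from the DataDictionary element of a PMML file; the PMML fragment, the field names (age, region), and the helper functions declared_fields and validate_record are hypothetical and are not taken from any particular scoring engine. The check is deliberately simplified and does not reflect the full PMML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated PMML fragment; field names are invented
# for illustration and the model element itself is omitted.
PMML_TEXT = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="region" optype="categorical" dataType="string">
      <Value value="north"/>
      <Value value="south"/>
    </DataField>
  </DataDictionary>
</PMML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def declared_fields(pmml_text):
    """Map each declared field name to (dataType, list of listed values)."""
    root = ET.fromstring(pmml_text)
    fields = {}
    for elem in root.iter():
        if local_name(elem.tag) != "DataField":
            continue
        allowed = [child.get("value") for child in elem
                   if local_name(child.tag) == "Value"]
        fields[elem.get("name")] = (elem.get("dataType"), allowed)
    return fields


def validate_record(record, fields):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name, (data_type, allowed) in fields.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if data_type in ("double", "float", "integer"):
            # Simplified numeric check (assumption): bool is rejected
            # even though Python treats it as a subclass of int.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: expected numeric, got {value!r}")
        elif allowed and value not in allowed:
            problems.append(f"{name}: {value!r} not in allowed values")
    return problems


FIELDS = declared_fields(PMML_TEXT)
print(validate_record({"age": 41.0, "region": "north"}, FIELDS))
print(validate_record({"age": "forty", "region": "east"}, FIELDS))
```

In this sketch a record is rejected before it reaches any scoring step, so that a malformed input produces an explicit list of problems rather than an erroneous score.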