A data mining model typically includes rules and patterns derived from historical data that has been collected, synthesized and formatted. In some cases, these models are used in combination with an analytical software application to generate predictive outputs. In order for applications or software application designers to apply or change data mining models, they need to understand how the models are organized and what information the models contain. As such, the data mining models typically include annotated textual descriptions of model semantics. These annotations can be related to an entire model and also to elements of the model, such as input or output data fields. Often, however, the textual descriptions are hard-coded in a single written language (e.g., English or German), thereby limiting the scope or applicability of such descriptions.
Models that are defined using the Predictive Model Markup Language, or PMML, also often contain descriptions that are hard-coded in a single language. One goal of PMML is to support the exchange of data mining models between different data mining servers, applications or visualization tools. PMML is a markup language that uses the Extensible Markup Language (XML) as its meta-language. The PMML standard does not directly address the support of descriptions in different written languages. For example, the descriptions for models and model fields using PMML are typically considered as flat textual strings that do not address the existence of the descriptions in multiple languages. In such instances, a front-end software application that requests predictive data using a model in PMML format may only receive descriptions of the models and model fields in a single language (e.g., only English or only German).
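By way of illustration only, the following sketch shows how such flat, single-language descriptions might appear to a consuming application. The PMML fragment is hand-written and simplified (the XML namespace is omitted), and the field names, the German texts, and the helper `field_descriptions` are hypothetical and chosen only for this example. The `description` attribute of `Header` and the `displayName` attribute of `DataField` each carry one string, with no indication of the language it is written in.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PMML fragment (namespace omitted for brevity).
# Every description is a single flat string, here in German only.
PMML_FRAGMENT = """
<PMML version="4.4">
  <Header description="Kaufwahrscheinlichkeit je Kunde"/>
  <DataDictionary numberOfFields="2">
    <DataField name="AGE" displayName="Alter des Kunden"
               optype="continuous" dataType="double"/>
    <DataField name="PURCHASE" displayName="Kauf abgeschlossen"
               optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def field_descriptions(pmml_text):
    """Map each data field name to its single display string."""
    root = ET.fromstring(pmml_text)
    return {
        field.get("name"): field.get("displayName")
        for field in root.iter("DataField")
    }


root = ET.fromstring(PMML_FRAGMENT)
model_description = root.find("Header").get("description")
descriptions = field_descriptions(PMML_FRAGMENT)

# A front-end application can only display what the model provides:
# German text, regardless of the language its user reads.
print(model_description)
for name, text in descriptions.items():
    print(f"{name}: {text}")
```

In this sketch the front-end application has no way to request an English or French variant of a description, because the fragment carries only one string per model and per field.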
The exchange of data mining models, and of the predictive output from those models, between different applications is not limited to users who are fluent in a single language. Rather, administrators of the data warehouses where mining models are created and administrators of the various front-end software applications communicate in various languages, such as English, German, and French. In addition, end-users of the front-end applications may also be faced with the textual descriptions contained in mining models. For example, call-center agents in different countries may have to analyze a textual representation of the rules contained in the mining model to “understand” the rationale of a system decision, and these textual representations are typically based on the descriptions contained in the mining models. Thus, an English-speaking administrator or front-end software user may be deterred from using or applying certain data mining models if the descriptions of those models and their data fields are provided only in German text.
For example, an English-speaking agent of a front-end call center application may want to request a prediction (e.g., the likelihood that a specific customer will complete a purchase) using a predictive model. This call-center agent may desire to see some metadata from the predictive model, such as a description of the model, the descriptions of the data fields, and even the textual description of the rules that led to the prediction. This information is potentially important if the predictive output returned to the front-end application was computed while a value for an input data field was missing, thus causing the predictive output to be less than optimal. In this case, the agent may have to be warned that some information was missing, that the prediction result could be of lower quality, and that the agent should try to get the missing information from the customer. All of the texts exposed to the call-center agent, or to an administrator who deploys mining models and attaches them to a front-end application, are based on the descriptions contained in the mining model. If the predictive model that generated the output for the front-end application provided descriptions of the model and the data fields in German text only, the English-speaking front-end application user may not understand those textual descriptions and may find the predictive output to be less than helpful. Similarly, an English-speaking administrator who has to deploy the model and make it applicable to the front-end application may not understand the information contained in the model.