Data mining is a technique by which hidden patterns may be found in a group of data. True data mining doesn't just change the presentation of data, but actually discovers previously unknown relationships among the data. Data mining is typically implemented as software in or in association with database systems. Data mining includes several major steps. First, data mining models are generated by based on one or more data analysis algorithms. Initially, the models are “untrained”, but are “trained” by processing training data and generating information that defines the model. The generated information is then deployed for use in data mining, for example, by providing predictions of future behavior based on specific past behavior.
The form that the information that defines each model takes depends upon the particular data analysis algorithm upon which the model is based. For example, a model based upon a classification and regression tree (CART) algorithm typically takes the form of a tree of IF-THEN rules. An important property of models is transparency. Transparency describes the existence of high-level descriptions, such as rule sets, that allow a human user to understand the basis of the predictions made by a model. For example, models based on a CART algorithm are typically highly transparent because they provide rule sets that are easily interpretable. Models based on a K-nearest neighbor algorithm provide a less transparent model. In this case, the models may still be partially interpreted by looking at the actual nearest neighbor records. Models based on neural net algorithms are typically not transparent, as they provide little interpretable information about the bases for their predictions.
A need arises for a technique by which the transparency of data mining models may be improved so as to be more easily interpretable by human users.