This invention relates generally to data mining software.
Data mining software extracts knowledge that may be suggested by a set of data for various uses. For example, data mining software can be used to maximize a return on investment made in collecting marketing data as well as other applications such as credit risk management, process control, medical diagnosis and so forth. Typically, data mining software uses one or a plurality of different types of modeling algorithms in concert with a set of test data to determine what types of characteristics are most useful in achieving a desired response rate or behavioral response from a targeted group of individuals represented by the data. Generally, data mining software executes complex data modeling algorithms such as linear regression, logistic regression, back propagation neural network, Classification and Regression (CART) and Chi squared Automatic Interaction Detection (CHAID) decision trees, as well as, other types of algorithms on a set of data. The results obtained by executing these algorithms are typically conveyed to a decision maker in order to decide what type of model might be best for a particular use.
One technique which is used to convey this information to a decision maker is the use of a visual representation of model performance such as a lift chart or a receiver operating characteristic curve. A lift chart measures the ability of a model to rank order scores so that higher scores exhibit more of the model's attribute or behavior. Whereas, a receiver operating characteristic curve compares a percentage of hits to a percentage of false alarms produced by a model of behavior thereby providing a measure of the accuracy of a model.
In response modeling a lift chart can be used to visually describe which prospects are more likely to respond to a particular stimuli. For example, a lift chart can be used in a marketing promotion campaign to identify likely responders versus non-responders. Therefore in such an application, the X axis of a lift chart would represent file depth, or the fraction of all prospects that can be contacted, whereas the Y axis of would show a fraction of all responders that would be successfully targeted at a specified file depth.
A lift chart is typically referenced from a base line which is a line of the form y=x which indicates the average or expected performance of not using a model. That is, for example, when 30% of all prospects are contacted, 30% of all responders are expected to be reached. When 40% are contacted, 40% of responders are expected to be reached and so forth. An algorithm that sorts prospects by their propensity to perform in an expected behavioral manner will produce a result that can be plotted as a lift curve. A useful model will produce a lift curve that is above (i.e., exhibits lift) the diagonal reference line.
The lift over the diagonal reference line is the expected or predicted improvement generated by the model by intelligently targeting specific prospects based on model inputs. The model inputs will vary based upon the application to which the data mining software is being applied, as well as the nature of the algorithm type used in the model.
While the conventional lift chart is adequate to provide a visual depiction of predicted modeling behavior, the conventional lift chart may become inadequate when the data mining software executes a large plurality or type of algorithms or a large number of algorithms of the same type.