1. Field of the Invention
The present invention generally relates to a new method for aggregating individual items of evidence for class probability estimation of a response variable in a classification problem. A good prediction of the class response probability has many uses in data mining applications, such as computing expected values of any function associated with the response, and in marketing applications where lift curves are generated to select and prioritize target customers.
2. Background Description
The Naïve Bayes (NB) model for classification problems is attractive for its simplicity and its good model understandability. There have been several studies of how well the model performs as a classifier. P. Domingos and M. Pazzani in “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss”, Machine Learning, 29, pp. 103–130, 1997, explore theoretical conditions under which NB may be optimal even though its assumption of independence of the feature values given the class may not hold, and also supply empirical evidence. D. J. Hand and K. Yu in “Idiot's Bayes-Not so Stupid After All”, International Statistical Review, 69, pp. 385–398, 2001, give arguments on why the independence assumption is not so absurd. A. Garg and D. Roth in “Understanding Probabilistic Classifiers”, Proceedings of ECML-2001, 2001, consider all joint distributions and show that the number of these distributions goes down exponentially with their distance from the product distribution of NB, thereby explaining the power of NB beyond the independence assumption. These studies focus on classification error.
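For orientation, the conventional NB class probability estimate discussed in these studies can be sketched as follows. This is a minimal illustration of the standard textbook model for categorical features, not the method of the present invention. The function names, the data layout, and the additive smoothing constant `alpha` are assumptions introduced only for this sketch.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Collect the counts needed by a categorical Naive Bayes model.

    rows   : list of tuples of categorical feature values (assumed layout)
    labels : list of class labels, one per row
    alpha  : additive smoothing constant (an assumption of this sketch)
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen, alpha


def class_probabilities(model, row):
    """Return P(class | row) under the conditional independence assumption."""
    class_counts, value_counts, values_seen, alpha = model
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / n  # class prior
        for j, v in enumerate(row):
            k = len(values_seen[j])
            # Smoothed estimate of P(feature j = v | class c).
            p *= (value_counts[(j, c)][v] + alpha) / (n_c + alpha * k)
        scores[c] = p
    total = sum(scores.values())
    # Normalize the per-class products so they sum to one.
    return {c: s / total for c, s in scores.items()}
```

The normalized products are the class probability estimates. When the features are conditionally correlated, multiplying their individual contributions can push these estimates toward 0 or 1 even when the predicted class is correct, which is why results on classification error do not directly carry over to probability estimation.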
The basic NB model has been modified and extended in several ways to remove some of its limitations. For example, P. Langley and S. Sage in “Induction of Selective Bayesian Classifiers”, Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, Seattle, Wash., pp. 399–406, 1994, use a feature subset selection approach to eliminate potential conditionally-correlated features. Other approaches, such as the Tree Augmented Naïve-Bayes (TAN) model of N. Friedman and M. Goldszmidt in “Building Classifiers Using Bayesian Networks”, Proceedings of the Thirteenth National Conference on Artificial Intelligence, Menlo Park, pp. 1277–1284, 1996, generalize NB by relaxing the restrictive conditional independence assumption.
In many data mining applications, the desired model output is the class probability. Examples include marketing applications in which a mailing is sent out to consumers whose estimated probability of response to the mailing exceeds a given level. This level is chosen to maximize expected profit, based on a “lift curve” (e.g., G. Piatetsky-Shapiro and S. Steingold, “Measuring Lift Quality in Database Marketing”, SIGKDD Explorations, 2, pp. 76–80, 2000).
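The mailing example above can be illustrated with a short sketch. It assumes a simple profit model in which each mailing has a fixed cost and each response yields a fixed revenue. The function names and parameters (`lift_curve`, `best_cutoff`, `revenue_per_response`, `cost_per_mailing`) are hypothetical and are not taken from the cited work.

```python
def lift_curve(scores, responded):
    """Return (fraction_mailed, lift) points with customers ranked by score.

    scores    : estimated response probabilities, one per customer
    responded : 1 if the customer actually responded, else 0
    Assumes at least one customer responded.
    """
    ranked = sorted(zip(scores, responded), key=lambda t: t[0], reverse=True)
    total = len(ranked)
    total_responders = sum(r for _, r in ranked)
    points = []
    hits = 0
    for i, (_, r) in enumerate(ranked, start=1):
        hits += r
        fraction_mailed = i / total
        fraction_captured = hits / total_responders
        # Lift: responders captured relative to mailing the same share at random.
        points.append((fraction_mailed, fraction_captured / fraction_mailed))
    return points


def best_cutoff(scores, revenue_per_response, cost_per_mailing):
    """Choose how many top-ranked customers to mail to maximize expected profit."""
    ranked = sorted(scores, reverse=True)
    best_k, best_profit, profit = 0, 0.0, 0.0
    for k, p in enumerate(ranked, start=1):
        # Expected profit contribution of the k-th ranked customer.
        profit += p * revenue_per_response - cost_per_mailing
        if profit > best_profit:
            best_k, best_profit = k, profit
    return best_k, best_profit
```

Under this assumed profit model, the ranking alone determines the lift curve, but the cutoff depends on the probability values themselves. A model that ranks customers well yet produces poorly calibrated probabilities can therefore select a suboptimal mailing size.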
While the usual NB approach is already known to be quite effective in predicting class membership, there are many applications where estimation of class probability is of prime importance (such as when these probabilities are used to generate lift curves).