Cost-sensitive classification is a critical component of many real-world applications, in business decision support and manufacturing among others, in which different types of misclassification can carry significantly different costs. Cost-sensitive learning addresses classification tasks in the presence of varying costs associated with different types of misclassification, such as false positives and false negatives. A large number of practical application domains motivate cost-sensitive learning, as documented in the literature; examples include targeted marketing, medical diagnosis, fraud detection, credit rating, network intrusion detection, and anomaly detection in manufacturing processes, to name a few. There has been considerable theoretical as well as empirical research on this topic in both the machine learning and data mining communities (see, for example, B. Zadrozny and C. Elkan, “Learning and making decisions when costs and probabilities are both unknown”, Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, pp. 204-213, ACM Press, 2001; P. Chan and S. Stolfo, “Toward scalable learning with non-uniform class and cost distributions”, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 164-168, 1998; and B. Zadrozny, J. Langford, and N. Abe, “Cost-sensitive learning by cost-proportionate example weighting”, Proceedings of the Third IEEE International Conference on Data Mining, pp. 435-442, 2003).
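To make the role of unequal misclassification costs concrete, the following is a minimal sketch of the standard minimum-expected-cost decision rule. It is an illustration only and is not the method of any of the works cited above. The function name `min_expected_cost_class`, the cost matrix `COST`, and the probabilities are hypothetical values chosen for the example.

```python
def min_expected_cost_class(probs, cost):
    """Return the label with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    # Expected cost of each candidate prediction i: sum_j probs[j] * cost[i][j]
    expected = [
        sum(p * c for p, c in zip(probs, row))
        for row in cost
    ]
    # Pick the prediction whose expected cost is smallest.
    return min(range(len(expected)), key=expected.__getitem__)


# Hypothetical fraud-screening costs: class 0 = legitimate, class 1 = fraud.
# A missed fraud (predict 0, truth 1) costs 50; a false alarm costs 1.
COST = [[0.0, 50.0],
        [1.0, 0.0]]

# Fraud is estimated to be only 10% likely, yet flagging it is cheaper:
# predicting 0 costs 0.1 * 50 = 5.0, predicting 1 costs 0.9 * 1 = 0.9.
decision = min_expected_cost_class([0.9, 0.1], COST)
```

With these assumed numbers the rule selects class 1 even though class 0 is far more probable, which is the kind of behavior a classifier that only minimizes error rate cannot produce.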
For pure classification, extensive past research has established that the family of boosting methods, including AdaBoost (Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997) and its many variations, enjoys superior empirical performance and strong theoretical guarantees. For cost-sensitive learning, however, there has not been a comprehensive study of the relative merits of different boosting algorithms. Some attempts have been made to extend the AdaBoost algorithm into cost-sensitive versions, e.g., AdaCost (W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan. AdaCost: Misclassification cost-sensitive boosting. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 97-105, 1999) and CSB2 (K. M. Ting. A comparative study of cost-sensitive boosting algorithms. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 983-990, 2000), but the aggressive weight-updating scheme based on the exponential loss made it difficult to balance the contribution of the cost information against boosting's focus on misclassification error. More recently, an effort was made to bridge this gap with the proposal of a cost-sensitive boosting method called GBSE (N. Abe, B. Zadrozny, and J. Langford. An iterative method for multi-class cost-sensitive learning. In KDD'04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 3-11, New York, N.Y., USA, 2004. ACM), inspired by the framework of gradient boosting. However, only a partial theoretical justification was provided: the proof of convergence was given only for a variant of the proposed method.