1. Field of the Invention
The present invention generally relates to a system and method for sequential decision making for customer relationship management and, more particularly, a system and method for sequential decision making for customer relationship management which may utilize reinforcement learning.
2. Description of the Related Art
In applications of data mining to targeted marketing, common practice has been to apply a classification or regression method to a customer database, identify a subset of the customers who are likely to generate positive profits for a given marketing action, and then target the action to those customers. In actual practice of targeted marketing, however, marketing actions are never taken in isolation; rather, a series of marketing actions is taken over time. It is therefore desirable for retailers to optimize their marketing actions to maximize not only the single-event profits, but also the total, cumulative profits accrued over such a series of actions.
More specifically, in the last several years, there has been increasing interest in the machine learning community in the issue of cost-sensitive learning and decision making, specifically as it may apply to data mining. Various authors have noted the limitations of classic supervised learning methods when the acquired rules are used for cost-sensitive decision making (see, e.g., P. Turney, “Cost-sensitive Learning Bibliography”, Institute for Information Technology, National Research Council, Ottawa, Canada, 2000 (http://extractor.iit.nrc.ca/bibliographies/cost-sensitive.html); P. Domingos, “MetaCost: A General Method for Making Classifiers Cost Sensitive”, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pages 155-164, ACM Press, 1999; C. Elkan, “The Foundations of Cost-sensitive Learning”, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, August 2001; B. Zadrozny and C. Elkan, “Learning and Making Decisions When Costs and Probabilities are Both Unknown”, Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, 2001; and D. D. Margineantu and T. G. Dietterich, “Bootstrap Methods for the Cost-sensitive Evaluation of Classifiers”, Proc. 17th International Conf. on Machine Learning, pages 583-590, Morgan Kaufmann, San Francisco, Calif., 2000).
A number of cost-sensitive learning methods have been developed (e.g., see P. Domingos, “MetaCost: A General Method for Making Classifiers Cost Sensitive”, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pages 155-164, ACM Press, 1999; B. Zadrozny and C. Elkan, “Learning and Making Decisions When Costs and Probabilities are Both Unknown”, Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, 2001; and W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, “AdaCost: Misclassification Cost-sensitive Boosting”, Proc. 16th International Conf. on Machine Learning, pages 97-105, Morgan Kaufmann, San Francisco, Calif., 1999) that have been shown to be superior to traditional classification-based methods.
However, these cost-sensitive methods only try to maximize the benefit (equivalently, minimize the cost) of a single decision, whereas in many applications sequences of decisions need to be made over time. In this more general setting, one must take into account not only the costs and benefits associated with each decision, but also the interactions among decision outcomes when sequences of decisions are made over time.
For example, in targeted marketing, customers are often selected for promotional mailings based on the profits or revenues they are expected to generate on each mailing when viewed in isolation. Profits or revenues are estimated using predictive models that are constructed based on historical customer-response data. To maximize expected profits for a given promotion, only those customers should be mailed whose predicted expected profit is positive when taking mailing costs into consideration (e.g., see B. Zadrozny and C. Elkan, “Learning and Making Decisions When Costs and Probabilities are Both Unknown”, Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, 2001).
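The per-promotion selection rule described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from the cited reference: the function name, the customer identifiers, and all monetary figures are hypothetical, and the predicted revenues are assumed to come from some previously fitted response model.

```python
def select_for_mailing(expected_revenue, mailing_cost):
    """Return the ids of customers whose predicted net profit is positive.

    expected_revenue: dict mapping customer id -> predicted revenue for
                      this one promotion (hypothetical model output).
    mailing_cost:     cost of sending one mail piece (hypothetical).
    """
    return [
        customer_id
        for customer_id, revenue in expected_revenue.items()
        if revenue - mailing_cost > 0  # mail only when net profit is positive
    ]


# Hypothetical predictions for three customers.
predicted = {"a": 2.50, "b": 0.40, "c": 1.00}
targets = select_for_mailing(predicted, mailing_cost=1.00)
```

Under these assumed figures, only customer "a" is selected; customer "c" breaks even and customer "b" would be mailed at a loss, so both are skipped.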
However, the above decision policy of selecting customers to maximize expected profits on each promotion in isolation is not guaranteed to maximize total profits generated over time. It may be, for example, that the expected profit obtained by mailing the current promotion to a certain customer does not exceed the current cost of mailing. However, the mailing might also increase the profits generated by that customer in future mailings. More generally, marketing actions that are desirable from the perspective of maximizing customer loyalty over time may sacrifice immediate rewards in anticipation of larger future revenues.
The opposite can also be true. Saturating profitable customers with frequent promotional mail might decrease the profitability of those customers, either because of the annoyance factor or because of the simple fact that everyone has budgetary limits on the amounts they are willing or able to spend per unit of time. The latter implies that a point of diminishing returns will necessarily be reached for each customer as the frequency of the mail they receive increases.
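The difference between judging each promotion in isolation and judging a series of promotions by its cumulative profit can be illustrated with a toy two-period calculation. The sketch below is purely illustrative: the function names and every profit figure are assumptions chosen to make the point, not data from any study.

```python
def cumulative_profit(per_period_profits):
    """Total profit accrued over a series of marketing actions."""
    return sum(per_period_profits)


def myopic_would_mail(immediate_net_profit):
    """Single-event rule: mail only if this promotion alone is profitable."""
    return immediate_net_profit > 0


# Hypothetical net profits per period for one customer under two plans.
mail_now = [-0.50, 3.00]  # mail at a small loss now, larger profit later
skip_now = [0.00, 1.00]   # skip the first promotion entirely

myopic_decision = myopic_would_mail(mail_now[0])
better_plan = max(
    {"mail_now": mail_now, "skip_now": skip_now}.items(),
    key=lambda item: cumulative_profit(item[1]),
)[0]
```

With these assumed numbers, the single-event rule declines to mail in the first period, yet the plan that mails at an immediate loss accrues the larger total. The opposite case, in which frequent mailing erodes later profits, could be represented the same way by lowering the later-period figures of the plan that mails more often.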