1. Field of the Invention
The present invention relates to a determination of an optimal action, and more particularly to a determination of an optimal action in consideration of risk by repetitively calculating a risk measure.
2. Description of Related Art
Markov Decision Processes (MDPs) have been studied extensively (G. Tirenni, A. Labbi, A. Elisseeff, and C. Berrospi, “Efficient allocation of marketing resources using dynamic programming,” in Proceedings of the SIAM International Conference on Data Mining, 2005, J. A. Filar, L. C. M. Kallenberg, and H. Lee, “Variance-penalized Markov decision processes,” Mathematics of Operations Research, vol. 14, pp. 147-161, 1989, D. J. White, “Mean, variance, and probabilistic criteria in finite Markov decision processes: A review,” Journal of Optimization Theory and Applications, vol. 56, no. 1, pp. 1-29, 1988, R. Munos and A. W. Moore, “Variable resolution discretization for high-accuracy solutions of optimal control problems,” in Proceedings of the International Joint Conference on Artificial Intelligence, 1999, pp. 1348-1355, R. Neuneier, “Enhancing Q-learning for optimal asset allocation,” in Advances in Neural Information Processing Systems, 1998, vol. 10, pp. 936-942, H. Kawai, “A variance minimization problem for a Markov decision process,” European Journal of Operational Research, vol. 31, pp. 140-145, 1987, M. L. Puterman, Markov Decision Processes, John Wiley and Sons, 1994). A Markov Decision Process is used to maximize the cumulative reward that will be gained from an object. The object can transition between states when actions are performed at time steps within a time horizon of interest, where the time horizon of interest is characterized by certain rules. A Markov Decision Process possesses the Markov property: a future state transition is determined by the current state and the action taken, and does not depend on past state transitions. By using a Markov Decision Process, an optimal action for each time step of the time horizon can be determined.
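For illustration only, the determination of an optimal action for each time step of a finite time horizon can be sketched with standard backward induction, as described in the MDP literature cited above. The sketch below is risk-neutral: it maximizes the expected cumulative reward and does not calculate any risk measure. The function name `backward_induction`, the layout of the arrays `P` and `R`, and the two-state example are assumptions introduced here and do not appear in the cited references.

```python
from typing import List, Tuple


def backward_induction(
    P: List[List[List[float]]],
    R: List[List[float]],
    horizon: int,
) -> Tuple[List[float], List[List[int]]]:
    """Risk-neutral finite-horizon dynamic programming (hypothetical helper).

    P[a][s][s2] is the probability of moving from state s to state s2
    under action a; R[a][s] is the expected immediate reward of taking
    action a in state s. Returns the expected cumulative reward from each
    state at the first time step, and policy[t][s], the optimal action at
    time step t in state s.
    """
    n_actions = len(P)
    n_states = len(P[0])
    value = [0.0] * n_states  # no reward is gained after the horizon ends
    policy: List[List[int]] = []

    # Work backwards from the last time step to the first.
    for _ in range(horizon):
        new_value = []
        best_actions = []
        for s in range(n_states):
            # Expected cumulative reward of each action taken in state s.
            q = [
                R[a][s] + sum(P[a][s][s2] * value[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            best_actions.append(best)
            new_value.append(q[best])
        value = new_value
        policy.insert(0, best_actions)  # earlier time steps go in front
    return value, policy


# Assumed toy problem: state 1 pays 1.0 per step when the object stays;
# leaving state 0 costs 0.5 once. Action 0 = stay, action 1 = switch.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [1.0, 0.0]],  # switch
]
R = [
    [0.0, 1.0],   # stay
    [-0.5, 0.0],  # switch
]
value, policy = backward_induction(P, R, horizon=3)
print(value)   # expected cumulative reward per starting state
print(policy)  # optimal action per time step and state
```

In this toy problem the optimal action depends on the time step: switching out of state 0 is worthwhile while enough time steps remain to recover the one-time cost, but not at the last time step.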
Research on portfolio theory has also been conducted (H. Markowitz, “Portfolio Selection,” Journal of Finance, vol. 7, pp. 77-91, March 1952). In addition, approaches to risk management, such as application to the management of asset portfolios, are being studied by financial companies (Japanese Patent Laid-Open No. 2008-0040522, Japanese Patent Laid-Open No. 2001-0125953, Japanese Patent Laid-Open No. 2002-0041778, Japanese Patent Laid-Open No. 2002-0157425, Japanese Patent Laid-Open No. 2002-0183429, Japanese Patent Laid-Open No. 2003-0006431, Japanese Patent Laid-Open No. 2003-0345981, Japanese Patent Laid-Open No. 2005-0107994). There is also research on dynamic risk measures for risk management (Japanese Patent Laid-Open No. 2006-0500692, M. R. Hardy and J. L. Wirch, “The iterated CTE: A dynamic risk measure,” The North American Actuarial Journal, 62-75, 2004, P. Boyle, M. Hardy, and T. Vorst, “Life after VaR,” Journal of Derivatives, 13(1):48-55, 2005, B. Acciaio and I. Penner, “Dynamic risk measures,” Feb. 17, 2010, M. Kupper and W. Schachermayer, “Representation results for law invariant time consistent functions,” Aug. 24, 2009, F. Riedel, “Dynamic coherent risk measures,” Stochastic Processes and their Applications, 112:185-200, 2004, P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, and H. Ku, “Coherent multiperiod risk adjusted values and Bellman's principle,” Annals of Operations Research, 152(1):5-22, 2007, T. Wang, “A class of dynamic risk measures,” September, 1999).