1. Field of the Invention
The present invention relates to a technique of deciding an optimal action in consideration of risk. More specifically, the present invention relates to a technique of deciding an action using a Markov decision process (MDP).
2. Description of Related Art
A simulation system and simulation method of integrally evaluating interest risk and credit risk of a portfolio are described in Japanese Unexamined Patent Publication No. 2002-230280. The technique operates as follows: (1) a large number of scenarios from a present time to a risk horizon are generated based on a default-free interest process model and a default process model; (2) a price of the portfolio and a price of each individual asset at the risk horizon are computed for each of the generated scenarios; and (3) a future price distribution of the portfolio and/or a future price distribution of the individual asset are determined based on the computed prices. As a result, the technique integrally evaluates interest risk and credit risk of the portfolio.
Research has also been conducted on a risk computation technique that uses a Markov process. A Markov process has the Markov property: its future state transitions depend only on its present state and not on its past states. Research has further been conducted on an action decision technique that uses a Markov decision process, which is an extension of the Markov process. For example, for a target capable of undergoing state transitions, a Markov decision process problem is a problem of finding a rule for deciding the action to be executed in each state so as to maximize an expected cumulative reward obtained from the target.
Japanese Patent No. 4400837 discloses a credit portfolio control method for selecting an optimal policy in credit control, in which situations of external factors such as an economic environment and a set credit line are reflected in a future credit rating transition probability. The technique creates a graph representing transitions, from the first to the T-th year, of combinations of each state of an existing credit and each state of an external factor. The graph provides: (1) for the first year, a node including the existing credit's initial state and the external factor's initial state; and (2) for the second to T-th years, nodes indicating patterns of combinations of each state of the existing credit and each state of the external factor. The aforementioned technique corresponds to solving a Markov decision process problem of T years by dynamic programming (DP): an optimal policy that maximizes an expected total gain for the T years is found while tracking back from a terminal T-th year node.
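The backward tracking described above can be illustrated with a minimal Python sketch of finite-horizon dynamic programming. This is a generic textbook form under stated assumptions, not the method of Japanese Patent No. 4400837: the function name `backward_induction`, the table layout `P[a][s][s2]` (transition probability) and `R[a][s]` (immediate reward), and the toy numbers are all hypothetical.

```python
def backward_induction(P, R, horizon):
    """Maximize the expected total reward over `horizon` periods.

    P[a][s][s2]: probability of moving from state s to s2 under action a.
    R[a][s]:     immediate reward for taking action a in state s.
    Returns (value, policy): value[s] is the optimal expected total reward
    from state s at the first period; policy[t][s] is the action chosen in
    period t (0-indexed) when the state is s.
    """
    n_actions, n_states = len(P), len(P[0])
    value = [0.0] * n_states  # nothing is earned after the terminal period
    policy = []
    for _ in range(horizon):  # track back from the terminal period
        new_value, decision = [], []
        for s in range(n_states):
            # Expected total reward of each action from state s.
            q = [
                R[a][s] + sum(P[a][s][s2] * value[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=q.__getitem__)
            decision.append(best)
            new_value.append(q[best])
        value = new_value
        policy.insert(0, decision)  # earlier periods go to the front
    return value, policy


# Hypothetical two-state, two-action example: action 0 keeps the state,
# action 1 switches it.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0], [0.0, 0.5]]
value, policy = backward_induction(P, R, horizon=2)
print(value, policy)
```

Each period requires one pass over every state, action, and successor state, so the work grows with the product of the horizon, the number of actions, and the square of the number of states.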
In addition, an iterated risk measure has recently been receiving attention as a risk measure based on which a financial institution determines its capital. A conditional value at risk, also called a CTE (conditional tail expectation), has no time consistency, whereas the iterated risk measure does have time consistency. This is described in M. R. Hardy and J. L. Wirch, “The iterated CTE: A dynamic risk measure”, The North American Actuarial Journal, 62-75, 2004.
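As a rough illustration of the difference between a one-shot CTE and an iterated CTE, the following Python sketch evaluates both on a hypothetical two-period loss tree. It assumes a discrete loss distribution and defines the CTE at level `alpha` as the mean loss over the worst `1 - alpha` of the probability mass; the function names `cte` and `iterated_cte` and the toy numbers are illustrative assumptions and are not taken from the cited paper.

```python
def cte(outcomes, alpha):
    """Mean loss over the worst (1 - alpha) probability mass.

    outcomes: list of (loss, probability) pairs whose probabilities sum to 1.
    alpha:    confidence level, 0 <= alpha < 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    tail = 1.0 - alpha
    remaining = tail
    total = 0.0
    # Consume probability mass starting from the largest loss.
    for loss, prob in sorted(outcomes, key=lambda lp: lp[0], reverse=True):
        take = min(prob, remaining)
        total += loss * take
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


def iterated_cte(tree, alpha):
    """Two-period iterated CTE.

    tree: list of (branch_probability, conditional_outcomes), where
    conditional_outcomes is the terminal loss distribution given the branch.
    The CTE is applied to each branch first, then to the branch-level values.
    """
    stage_one = [(cte(conditional, alpha), p) for p, conditional in tree]
    return cte(stage_one, alpha)


# Hypothetical tree: two equally likely first-period branches.
tree = [
    (0.5, [(0.0, 0.5), (10.0, 0.5)]),
    (0.5, [(0.0, 0.5), (20.0, 0.5)]),
]
# Unconditional distribution of the terminal loss implied by the tree.
terminal = [(0.0, 0.5), (10.0, 0.25), (20.0, 0.25)]
print(cte(terminal, 0.5))        # one-shot CTE of the terminal loss
print(iterated_cte(tree, 0.5))   # CTE applied backward, period by period
```

For this toy tree the one-shot CTE of the terminal loss is 15, whereas the iterated value is 20, since the iterated form applies the tail measure again at every period.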
However, a backward-computed iterated CTE (ICTE) is considered to be difficult to implement, because it requires a large computation load. Furthermore, a typical Monte Carlo method cannot handle ICTE.
The iterated risk measure can represent risk preferences that are rational but cannot be represented by expected utility, discounted expected utility, or the like. Accordingly, Japanese Patent Application No. 2010-211588 discloses a technique of optimizing a Markov decision process so as to minimize the iterated risk measure using dynamic programming.
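The general idea of minimizing an iterated risk measure by dynamic programming can be sketched in Python by replacing the expectation in an ordinary backward recursion with a CTE. This is a generic illustration under stated assumptions and is not the procedure of Japanese Patent Application No. 2010-211588: the function names `cte` and `min_iterated_cte`, the cost table `C[a][s][s2]`, the transition table `P[a][s][s2]`, and the toy numbers are all hypothetical.

```python
def cte(outcomes, alpha):
    """Mean cost over the worst (1 - alpha) probability mass (0 <= alpha < 1)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    tail = 1.0 - alpha
    remaining, total = tail, 0.0
    for cost, prob in sorted(outcomes, key=lambda cp: cp[0], reverse=True):
        take = min(prob, remaining)
        total += cost * take
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


def min_iterated_cte(P, C, horizon, alpha):
    """Backward recursion that minimizes an iterated CTE of the total cost.

    P[a][s][s2]: transition probability under action a.
    C[a][s][s2]: cost incurred on that transition.
    Returns (value, policy) with policy[t][s] the action for period t, state s.
    """
    n_actions, n_states = len(P), len(P[0])
    value = [0.0] * n_states
    policy = []
    for _ in range(horizon):
        new_value, decision = [], []
        for s in range(n_states):
            # Risk of each action: CTE of (immediate cost + future risk).
            risk = [
                cte(
                    [(C[a][s][s2] + value[s2], P[a][s][s2])
                     for s2 in range(n_states) if P[a][s][s2] > 0.0],
                    alpha,
                )
                for a in range(n_actions)
            ]
            best = min(range(n_actions), key=risk.__getitem__)
            decision.append(best)
            new_value.append(risk[best])
        value = new_value
        policy.insert(0, decision)
    return value, policy


# Hypothetical example: action 0 is risky (cost 10 with probability 0.1,
# otherwise 0); action 1 is safe (cost 2 for certain).
P = [[[0.9, 0.1], [0.9, 0.1]], [[1.0, 0.0], [1.0, 0.0]]]
C = [[[0.0, 10.0], [0.0, 10.0]], [[2.0, 0.0], [2.0, 0.0]]]
print(min_iterated_cte(P, C, horizon=2, alpha=0.9)[1])  # risk-averse choice
print(min_iterated_cte(P, C, horizon=2, alpha=0.0)[1])  # plain expectation
```

In this toy example the risky action has the lower expected cost, so it is chosen when `alpha` is 0 (where the CTE reduces to the expectation), while the safe action is chosen at `alpha = 0.9`. Every period still enumerates all states, actions, and successor states.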
However, the technique described in the specification of Japanese Patent Application No. 2010-211588 requires an extremely long computation time when the number of possible states or actions increases. As a result, the technique can in practice solve only problems of limited size.