The present invention relates to a generation method, a selection method and a program.
Sequential decision making in an environment that includes unobservable states has been formulated as a partially observable Markov decision process (POMDP) (Patent Literatures 1 to 3). In some decision-making problems, the observability and invariability of the states are known in advance: for example, one part of the state is completely observable while the remaining part is unobservable, and in some cases the unobservable part is also invariable. Conventionally, an optimum policy in such a case is calculated by a general-purpose POMDP solver.
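By way of illustration only, the setting described above, in which one part of the state is completely observable and the remaining part is unobservable and invariable, can be sketched as a belief update over the hidden part. The sketch below is not taken from the cited literature. The names `TRANS` and `update_belief`, the hidden values "likes" and "dislikes", the visible values "idle" and "bought", the action "ad", and all probabilities are hypothetical assumptions chosen for the example.

```python
from typing import Dict, Tuple

# Hypothetical toy model (an assumption for illustration only).
# The hidden part h is unobservable and invariable: it never changes.
# The visible part x is completely observable.
# TRANS[h][(x, action)] maps each next visible state to its probability.
TRANS: Dict[str, Dict[Tuple[str, str], Dict[str, float]]] = {
    "likes":    {("idle", "ad"): {"bought": 0.8, "idle": 0.2}},
    "dislikes": {("idle", "ad"): {"bought": 0.2, "idle": 0.8}},
}


def update_belief(
    belief: Dict[str, float],
    trans: Dict[str, Dict[Tuple[str, str], Dict[str, float]]],
    x: str,
    action: str,
    x_next: str,
) -> Dict[str, float]:
    """Bayes update of the belief over the invariable hidden part.

    Because the hidden part does not change, no prediction step over h is
    needed: each hypothesis h is simply reweighted by how well it explains
    the observed visible transition x -> x_next under the chosen action.
    """
    unnormalized = {
        h: p * trans[h][(x, action)].get(x_next, 0.0)
        for h, p in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed transition has zero probability under the belief")
    return {h: w / total for h, w in unnormalized.items()}


if __name__ == "__main__":
    prior = {"likes": 0.5, "dislikes": 0.5}
    posterior = update_belief(prior, TRANS, "idle", "ad", "bought")
    print(posterior)
```

In this sketch the belief is maintained only over the hidden part, since the visible part is known exactly at each step. A general-purpose POMDP solver would instead treat the whole state as partially observable and would not exploit this structure.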
Patent Literature 1: JP2011-53735A
Patent Literature 2: JP2012-123529A
Patent Literature 3: JP2012-190062A