The present invention generally relates to a search technique for a decision-making model, and particularly, to learning under a Markov decision process system environment. More specifically, the present invention relates to a method, controller, and control program product for updating a parameter that defines a policy under a Markov decision process (MDP) system environment.
A Markov decision process is a conventionally known stochastic decision-making process. Problems formulated as Markov decision processes arise in a wide range of applications, such as a business decision-making model and autonomous control of a robot, a plant, or a mobile vehicle (e.g., a train, a car, a ship, or an airplane). The business decision-making model is, for example, decision-making support for marketing, a Web service, or the like. Learning a decision-making model is an important data analysis technique in a wide variety of fields, such as optimizing a decision-making support system for marketing, optimizing a Web service, or learning an agent behavior model for traffic simulation.