1. Field of the Invention
The present invention relates to an information processing apparatus, an information processing method, and a computer program, and, more particularly, to an information processing apparatus, an information processing method, and a computer program that can self-organize an internal state to create an environment model.
2. Description of the Related Art
In recent years, research and development concerning reinforcement learning has been actively pursued. Reinforcement learning is a machine learning method for autonomously acquiring an optimum behavior on the basis of actual experiences and returns. In a broad sense, reinforcement learning refers to machine learning that learns, by trial and error and relying only on returns from an environment, a control method for attaining the returns (see, for example, “Reinforcement Learning,” Richard S. Sutton and Andrew G. Barto, translated by Sadayoshi Mikami and Masaaki Minakawa, Morikita Publishing). Reinforcement learning has been applied successfully to various Markov decision problems having finite numbers of states and finite numbers of behaviors, such as the acquisition of strategies in games.
However, a large number of problems remain in applying reinforcement learning to difficult problems in the real world.
One significant problem among these is that returns, behaviors, and environment models are learned all together. In reinforcement learning, an environment is acquired in the form of a prediction of the return that can be received, i.e., a value. In other words, only how much value a present state has is learned; a change in a state is not modeled. In architectures such as SARSA and actor-critic, a value function that depends on the present actor (behavior determination) is learned. Therefore, it is necessary to learn an actor and a value function from the beginning every time a purpose (a return) changes. However, an environment model, which indicates how an environment changes depending on how a user behaves toward the environment, should by nature be usable in common even if the purpose changes. Therefore, in solving various problems, it is more efficient to plan a behavior on the basis of an environment prediction model.
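The point that an environment model can be shared across purposes may be illustrated with a minimal sketch. It is not taken from the related art or from the invention; the five-state chain environment, the function names `step`, `learn_model`, and `plan`, and all parameter values are hypothetical and chosen only for illustration. A state transition model is estimated once from random experience, without any return, and is then reused by value iteration to plan for two different returns.

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical chain of states 0..4 (assumption)
ACTIONS = (-1, +1)  # behaviors: move left or move right


def step(state, action):
    """Hypothetical deterministic environment: move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def learn_model(n_samples=2000, seed=0):
    """Estimate P(next_state | state, action) from random experience; no return is used."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(n_samples):
        s = rng.randrange(N_STATES)
        a = rng.choice(ACTIONS)
        counts[(s, a)][step(s, a)] += 1
    model = {}
    for sa, nxt in counts.items():
        total = sum(nxt.values())
        model[sa] = {s2: c / total for s2, c in nxt.items()}
    return model


def plan(model, reward, gamma=0.9, sweeps=100):
    """Value iteration on the learned model; reward(s2) is the return for arriving in s2."""
    values = [0.0] * N_STATES

    def q(s, a):
        # Expected one-step return plus discounted value under the learned model.
        return sum(p * (reward(s2) + gamma * values[s2])
                   for s2, p in model[(s, a)].items())

    for _ in range(sweeps):
        values = [max(q(s, a) for a in ACTIONS) for s in range(N_STATES)]
    # Greedy behavior for each state with respect to the final values.
    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)]


# The model is learned once and reused when the purpose (the return) changes.
model = learn_model()
policy_right = plan(model, lambda s: 1.0 if s == N_STATES - 1 else 0.0)
policy_left = plan(model, lambda s: 1.0 if s == 0 else 0.0)
```

In this sketch only `plan` is rerun when the return changes, while the output of `learn_model` is left untouched. By contrast, a value function learned with SARSA or an actor-critic architecture is tied to one return and one actor and would have to be relearned.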