Reinforcement learning is an intelligent operation in which an active subject, such as an autonomously moving robot, observes its environment, acts, and derives its next suitable plan from the results. In particular, the “environment identification technique” is a learning means that does not use a teaching signal, and is therefore considered suited to determining actions in an unknown environment. Known typical reinforcement learning methods include environment identification techniques such as “Q-learning”, which find a value function over state-action pairs, and the “experience reinforcement technique”, which utilizes episodes stored in a memory.
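As an illustration only, the way Q-learning finds a value function over state-action pairs without a teaching signal can be sketched as follows. This is a minimal tabular sketch, not the method of any cited document. The one-dimensional corridor environment, the reward of 1.0 at the rightmost state, the function name q_learning, and all parameter values (alpha, gamma, epsilon, the step cap) are assumptions introduced for the example.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    Assumed environment: states 0..n_states-1, action 0 moves left,
    action 1 moves right, and reaching the rightmost state gives a
    reward of 1.0 and ends the episode.
    """
    rng = random.Random(seed)
    n_actions = 2
    goal = n_states - 1
    # Q-table: one value per (state, action) pair, initialised to zero.
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length (assumption)
            # Epsilon-greedy selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])

            # Environment transition and reward.
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: move Q(s, a) toward the bootstrapped
            # target r + gamma * max_a' Q(s', a'); no bootstrap at the goal.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if s == goal:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

The only feedback the learner receives is the scalar reward; no correct action is ever supplied, which corresponds to learning without a teaching signal. After training, the larger value in each non-terminal row indicates the action the learned value function prefers in that state.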
For the general theory of reinforcement learning methods, see [1] S. Russell and P. Norvig: Artificial Intelligence: A Modern Approach, Prentice Hall, 1995 (translated into Japanese as “Agent Approach-Artificial Intelligence”, KYORITSU SHUPPAN, 1997) or [2] R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction, The MIT Press, 1998 (translated into Japanese as “Reinforcement Learning”, Morikita Publishing, 2000), which treat the subject in detail.
Many enhancements to and applications of the reinforcement learning method have been proposed. With regard to the basic algorithm, for example, learning is being extended to handle continuous state spaces, and research and development aimed at improving learning speed is being carried out. One such example is the [3] agent learning apparatus (Japan Science and Technology Agency, Patent Document 1).
[Patent Document 1] Japanese Patent Publication (A) No. 2000-35956