In recent years, a technique called reinforcement learning has been studied actively in the field of unsupervised learning. Reinforcement learning is known as a learning control framework that, through trial-and-error interaction with an environment such as a control object, generates operation signals to the environment so that the measurement signals obtained from the environment become desirable.
Reinforcement learning has a learning function that uses, as a clue, a scalar evaluation value (called the reward in reinforcement learning) calculated from the measurement signal obtained from the environment, and generates operation signals to the environment so that the expected value of the evaluation values obtained from the present state into the future is maximized. Algorithms for implementing such a learning function include, for example, Actor-Critic, Q-learning, and real-time Dynamic Programming.
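As an illustration of such a learning function, the following is a minimal sketch of tabular Q-learning. The environment (a one-dimensional chain of five states with a reward of 1 at the right end), the function name `q_learning`, and all parameter values are hypothetical and chosen only for illustration; they are not taken from any cited document.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment.

    Action 0 moves left, action 1 moves right. Reaching the rightmost
    state yields a reward of 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy choice of the operation signal (ties go right).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # Environment response: next state and scalar reward.
            s_next = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s_next == goal else 0.0
            # Move Q(s, a) toward the reward plus the discounted best
            # value of the next state (the expected future evaluation).
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, values)
```

In this sketch, the learned table ends up preferring the rightward action in every non-terminal state, since that action maximizes the discounted sum of future rewards.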
A framework called the Dyna architecture extends the reinforcement learning techniques described above. In this method, the operation signal that should be generated is learned beforehand against a model that simulates the control object, and the operation signal to be applied to the control object is determined using the result of this learning. The Dyna architecture also has a model adjustment function that reduces the error between the control object and the model.
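The interplay of model-based learning and model adjustment can be sketched with Dyna-Q, a simple table-based variant of the Dyna architecture. This is only a sketch under stated assumptions: the control object `step` is a hypothetical five-state chain, the model is a plain lookup table, and "model adjustment" is reduced to overwriting a table entry with the latest observation. A practical system would use a parametric model and a more elaborate adjustment rule.

```python
import random

def step(s, a, n_states=5):
    """Hypothetical control object: a 1-D chain, reward 1 at the right end."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

def dyna_q(episodes=50, planning_steps=20, alpha=0.1, gamma=0.9,
           epsilon=0.1, n_states=5, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    model = {}  # (state, action) -> (next_state, reward)
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # Apply the operation signal to the (hypothetical) control object.
            s_next, r = step(s, a, n_states)
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            # Model adjustment: make the model agree with the observation.
            model[(s, a)] = (s_next, r)
            # Learning against the model: replay simulated transitions.
            for _ in range(planning_steps):
                (ps, pa), (pn, pr) = rng.choice(list(model.items()))
                q[ps][pa] += alpha * (pr + gamma * max(q[pn]) - q[ps][pa])
            s = s_next
    return q, model

if __name__ == "__main__":
    q_table, learned_model = dyna_q()
    print(q_table)
    print(learned_model)
```

Because most value updates are made against the model rather than the control object, far fewer interactions with the real control object are needed than in the plain Q-learning case, at the cost of depending on the model's accuracy.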
Patent Document 1 discloses a technology to which reinforcement learning is applied. The technology provides two or more reinforcement learning modules, each of which is a system having a model and a learning function. For each reinforcement learning module, a responsibility signal is calculated that takes a larger value as the prediction error between the module's model and the control object becomes smaller, and the operation signal generated by each module is weighted in proportion to its responsibility signal. The operation signal to be applied to the control object is determined from these weighted signals.
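The weighting described above can be sketched as follows. The exact form of the responsibility signal is not given here, so this sketch assumes a normalized Gaussian of the prediction error (a softmax over negative squared errors); the function names `responsibilities` and `blended_operation_signal` and the width parameter `sigma` are hypothetical.

```python
import math

def responsibilities(pred_errors, sigma=1.0):
    """Return one weight per module; smaller error gives a larger weight.

    Assumed form: exp(-e^2 / (2 * sigma^2)), normalized to sum to 1.
    """
    scores = [-(e * e) / (2.0 * sigma ** 2) for e in pred_errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def blended_operation_signal(module_signals, pred_errors, sigma=1.0):
    """Weight each module's operation signal by its responsibility."""
    lam = responsibilities(pred_errors, sigma)
    return sum(l * u for l, u in zip(lam, module_signals))

if __name__ == "__main__":
    errors = [0.1, 1.0, 2.0]    # prediction error of each module's model
    signals = [0.5, -0.2, 0.9]  # operation signal proposed by each module
    print(responsibilities(errors))
    print(blended_operation_signal(signals, errors))
```

With this assumed form, a smaller `sigma` makes the module with the best-predicting model dominate the output, while a larger `sigma` blends the modules more evenly.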
Patent Document 1: JP-2000-35956A