1. Field of the Invention
The present invention relates to a reinforcement learning system for making an agent learn an action policy for executing a task.
2. Description of the Related Art
To reduce the learning time required to obtain a behavior suitable for a task, a technique has been disclosed (refer to Japanese Patent Laid-open No. 2005-078516) which determines action policies for an agent from the learning results obtained by a plurality of learning devices based on the surrounding state, and selects one action policy from among the plural action policies based on the learning performance of each learning device.
According to the conventional art, however, only the learning result from a single learning device is utilized. It is therefore difficult for the agent to execute a task involving complicated actions that may be achieved by making full use of the learning results from the other learning devices.
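For illustration only, the selection scheme of the conventional art described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the cited publication: the names LearningDevice, policy, performance and select_action are hypothetical, and the learning performance is assumed to be a single scalar score per learning device.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical representation of the surrounding state (assumption).
State = Tuple[float, ...]


@dataclass
class LearningDevice:
    """Hypothetical stand-in for one learning device of the conventional art."""
    name: str
    policy: Callable[[State], str]  # action policy derived from its learning result
    performance: float              # assumed scalar learning performance


def select_action(devices: Sequence[LearningDevice], state: State) -> Tuple[str, str]:
    """Select the policy of the best-performing device and apply it.

    Only one device's learning result contributes to the returned action;
    the policies of all other devices are discarded.
    """
    if not devices:
        raise ValueError("at least one learning device is required")
    best = max(devices, key=lambda d: d.performance)
    return best.name, best.policy(state)
```

In this sketch the action returned for a given state depends on exactly one learning device, which corresponds to the limitation noted above: the learning results of the remaining learning devices play no part in the selected action.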