1. Field of the Invention
The present invention relates to a reinforcement learning apparatus and the like that perform robot motor learning.
2. Description of Related Art
Reinforcement learning has been widely used as a robot motor learning technique because it can be implemented even if the dynamics of the control object or the environment are unknown, and because learning proceeds autonomously once a reward function has been set according to the task (see, for example, JP 2007-66242A).
Conventional techniques, however, have a problem in that the reward function for a complex motion trajectory is often expressed as the sum of various terms, and a trade-off between those terms impedes learning (this is called the “trade-off problem”). For example, the reward function in a two-point reaching movement task is generally composed of a positive reward given at the target point and a negative reward for the energy used. If the ratio of these two elements is not set appropriately, the learned movement becomes extremely fast or extremely slow, resulting in undesirable motion trajectories. The trade-off problem becomes more challenging if further requirements, such as obstacle avoidance, are imposed in addition to the reaching movement. Too small a negative reward for contact with an obstacle results in the robot arm colliding with the obstacle, whereas too large a negative reward leads to a learning result in which the robot arm does not move from the starting point. When the reward function becomes too complex, the designer has to adjust the balance between the elements empirically, which compromises the advantage of reinforcement learning.
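The trade-off problem described above can be illustrated with a minimal sketch. It is not taken from the cited reference or from any particular apparatus. The three candidate behaviors, their probabilities and energies, and the weight names `w_goal`, `w_energy`, and `w_obstacle` are all hypothetical values chosen only to show how the preferred behavior under a summed reward shifts as one weight changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """A hypothetical candidate movement, summarized by three quantities."""
    name: str
    reach_prob: float      # probability of arriving at the target point
    energy: float          # total energy used by the movement
    collision_prob: float  # probability of contact with the obstacle


# Illustrative candidates only (assumed values, not measured data).
BEHAVIORS = (
    Behavior("stay", reach_prob=0.0, energy=0.0, collision_prob=0.0),
    Behavior("straight", reach_prob=1.0, energy=1.0, collision_prob=1.0),
    Behavior("detour", reach_prob=1.0, energy=2.0, collision_prob=0.1),
)


def expected_reward(b: Behavior, w_goal: float, w_energy: float,
                    w_obstacle: float) -> float:
    """Sum-of-terms reward: goal bonus minus energy and obstacle penalties."""
    return (w_goal * b.reach_prob
            - w_energy * b.energy
            - w_obstacle * b.collision_prob)


def best_behavior(w_goal: float, w_energy: float,
                  w_obstacle: float) -> Behavior:
    """Return the candidate with the highest expected summed reward."""
    return max(BEHAVIORS,
               key=lambda b: expected_reward(b, w_goal, w_energy, w_obstacle))


if __name__ == "__main__":
    # Sweep only the obstacle weight; the other weights stay fixed.
    for w_obstacle in (0.5, 5.0, 100.0):
        chosen = best_behavior(w_goal=10.0, w_energy=1.0,
                               w_obstacle=w_obstacle)
        print(f"w_obstacle={w_obstacle}: {chosen.name}")
```

Under these assumed numbers, a small obstacle weight favors the colliding straight path, a moderate weight favors the detour, and a very large weight favors not moving at all, which mirrors the two failure cases described for obstacle avoidance.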