1. Field of the Invention
The present invention relates to a learning control apparatus, a learning control method, and a computer program. More specifically, the present invention relates to a learning control apparatus, a learning control method, and a computer program for causing an autonomous agent to select a target task on its own, to perform a learning process by making a plan to achieve a goal state and by executing the plan, and then to successively expand the capability of the autonomous agent.
2. Description of the Related Art
In known behavioral learning, the input and output variables of a learner are manually selected in consideration of the tasks to be solved and the expected behaviors, as disclosed by Richard S. Sutton and Andrew G. Barto in the book entitled “Reinforcement Learning”, MIT Press (1998), and by J. Tani in the paper entitled “Model-based learning for mobile robot navigation from the dynamical systems perspective”, IEEE Trans. on Systems, Man, and Cybernetics part B: Cybernetics, Vol. 26, No. 3, pp. 421-436, 1996.
In the case of autonomous robots having multiple degrees of freedom, however, determining the task and the input and output variables at the design stage limits the learning capability of the robot from the outset. If a reward function serving as a goal is given by a human, the agent can solve only the corresponding problem. The known method is thus subject to a serious problem in the design of an open-ended autonomous robot.
In one contemplated scenario, an autonomous agent is forced to continue to improve its learning toward an unachievable goal by selecting, as the goal, a reward function that is difficult to achieve. With the technique of reinforcement learning, however, learning toward the goal cannot continue if no reward is ever given.
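By way of illustration only, the difficulty described above may be sketched with tabular Q-learning on a one-dimensional chain of states. The chain environment, the function name `q_learning_chain`, and all parameter values are assumptions introduced for this sketch and do not appear in the cited references. When the reward is never given, every update target is zero and the value table does not change from its initial state.

```python
import random
from typing import List, Optional


def q_learning_chain(goal: Optional[int], n_states: int = 10, episodes: int = 200,
                     steps: int = 20, alpha: float = 0.5, gamma: float = 0.9,
                     epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning on a hypothetical 1-D chain.

    A reward of 1.0 is given only when the agent enters state `goal`.
    Passing goal=None models a goal that is never achieved.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            # Explore with probability epsilon, or when both actions are tied.
            explore = rng.random() < epsilon or q[s][0] == q[s][1]
            a = rng.randrange(2) if explore else (0 if q[s][0] > q[s][1] else 1)
            s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            done = s_next == goal
            # Standard Q-learning target; the only reward is at the goal.
            target = 1.0 if done else gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


never_rewarded = q_learning_chain(goal=None)  # every entry stays at 0.0
rewarded = q_learning_chain(goal=3)           # values near the goal become positive
```

In the first call the agent gathers the same kind of experience as in the second call, but the value table carries no information toward the goal because no reward signal ever reaches it.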
To overcome this problem, another technique is disclosed by J. Morimoto and K. Doya in the paper entitled “Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning”, Robotics and Autonomous Systems, 36, 37-51 (2001). In the disclosed technique, a sub-goal is set up in addition to a reward function serving as a final goal, and a hierarchical structure is constructed. A controller performs reinforcement learning such that a hierarchically lower module learns to achieve the sub-goal, and a hierarchically higher module learns a goal-switching control rule for achieving the final goal.
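The two-level arrangement described above may be sketched, in greatly simplified form, as follows. This is not the method of the cited paper. It is a minimal tabular illustration that assumes a one-dimensional chain of states, hand-picked sub-goals, and Q-learning at both levels; the function name `train_hierarchy` and all parameter values are hypothetical.

```python
import random


def train_hierarchy(n_states=12, subgoals=(4, 8, 11), episodes=300,
                    decisions=6, option_steps=12, alpha=0.5, gamma=0.9,
                    epsilon=0.2, seed=0):
    """Two-level Q-learning sketch on a hypothetical 1-D chain.

    lower[g][s][a]: value of action a (0 = left, 1 = right) in state s
                    while pursuing sub-goal g; rewarded on entering g.
    upper[s][k]:    value of selecting sub-goal index k in state s;
                    rewarded only when the final goal is reached.
    """
    rng = random.Random(seed)
    final_goal = subgoals[-1]
    lower = {g: [[0.0, 0.0] for _ in range(n_states)] for g in subgoals}
    upper = [[0.0] * len(subgoals) for _ in range(n_states)]

    def pick(values):
        # Epsilon-greedy choice with random tie-breaking.
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        best = max(values)
        return rng.choice([i for i, v in enumerate(values) if v == best])

    for _ in range(episodes):
        s = 0
        for _ in range(decisions):
            # Higher module: choose which sub-goal to pursue next.
            k = pick(upper[s])
            g, s_start = subgoals[k], s
            # Lower module: learn to reach the chosen sub-goal.
            for _ in range(option_steps):
                a = pick(lower[g][s])
                s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
                reached = s_next == g
                target = 1.0 if reached else gamma * max(lower[g][s_next])
                lower[g][s][a] += alpha * (target - lower[g][s][a])
                s = s_next
                if reached:
                    break
            # Higher module update, using the final-goal reward only.
            done = s == final_goal
            target = 1.0 if done else gamma * max(upper[s])
            upper[s_start][k] += alpha * (target - upper[s_start][k])
            if done:
                break
    return lower, upper


lower, upper = train_hierarchy()
```

In this sketch the lower module receives a reward each time a nearby sub-goal is reached, so it can learn even while the final goal remains out of reach, and the higher module only has to learn the order in which to switch among the sub-goals.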