In recent years, systems for industrial use have become increasingly complicated, and it is becoming difficult to describe the relationship between an input and an output in advance by a program or the like. For this reason, methods for processing an input signal to obtain a correct output have become necessary. An apparatus for determining an output from an input is herein referred to as an “action determination apparatus”. Moreover, an apparatus for predicting a future state change from an input signal and then obtaining an output is specifically referred to as a “predictive action determination apparatus”.
Known techniques for action determination are divided into three groups. A first group includes techniques for performing action determination based only on a current state, a second group includes techniques for performing action determination based on a change from a past state, and a third group includes techniques for predicting a future state and performing action determination based on the prediction.
Techniques for performing action determination based only on a current state include a technique using IF-THEN rules, a technique using a neural network, a technique using the memory table reference method, and the like. In these techniques, an action for each current state is described in advance, the current state is judged from an input, and an action is then determined with reference to the predetermined description.
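The table-reference style of action determination described above can be sketched as follows. This is only an illustrative sketch: the state names, the action names, the distance input and the 0.5 m threshold are all hypothetical and are not taken from any of the cited references.

```python
from typing import Dict

# Hypothetical pre-described table: one action per current state.
ACTION_TABLE: Dict[str, str] = {
    "obstacle_ahead": "stop",
    "path_clear": "move_forward",
}


def judge_state(distance_m: float, threshold_m: float = 0.5) -> str:
    """Judge the current state from a single input reading (assumed threshold)."""
    return "obstacle_ahead" if distance_m < threshold_m else "path_clear"


def determine_action(distance_m: float) -> str:
    """Look up the pre-described action for the judged current state."""
    return ACTION_TABLE[judge_state(distance_m)]
```

Because the action depends only on the state judged at the current moment, such a sketch cannot take account of how the state was reached or how it will change.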
However, a correct action cannot always be determined based only on the current state. For example, in an interactive robot system, when the question “Is it OK?” is asked, what the question exactly means cannot be understood from the question itself. Its meaning can be understood only by referring to the state changes that took place before the question was asked. That is, there are cases where a past state is needed for action determination.
Moreover, there are cases where not only the current or past state but also a future state should be considered. For example, assume that a mobile robot is to avoid an obstacle. At a stage where the robot has not yet bumped into the obstacle, no problem has occurred. In such a case, the robot can take an avoiding action before it bumps into the obstacle only if a future change is considered, i.e., only if it is taken into consideration that the robot will bump into the obstacle in the future if it keeps moving in the same direction at the same speed.
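The obstacle example can be made concrete with a small sketch that extrapolates the robot's motion at constant speed in a straight line. The function names, the step-based units and the look-ahead horizon are assumptions introduced here for illustration only.

```python
import math
from typing import Optional


def steps_until_collision(distance_m: float, speed_m_per_step: float) -> Optional[int]:
    """Steps until a robot moving straight at constant speed reaches the obstacle.

    Returns None if the robot is not approaching the obstacle.
    """
    if speed_m_per_step <= 0:
        return None
    return math.ceil(distance_m / speed_m_per_step)


def determine_action(distance_m: float, speed_m_per_step: float, horizon_steps: int) -> str:
    """Avoid only if a collision is predicted within the look-ahead horizon."""
    steps = steps_until_collision(distance_m, speed_m_per_step)
    if steps is not None and steps <= horizon_steps:
        return "avoid"
    return "keep_moving"
```

With an obstacle 5.0 m ahead and a speed of 0.5 m per step, the collision is predicted 10 steps ahead, so a horizon of 20 steps yields "avoid" while a horizon of 5 steps yields "keep_moving", even though nothing has gone wrong in the current state.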
Techniques for action determination that take a future state into consideration are disclosed in Patent Reference 1 and Patent Reference 2. In Patent Reference 1, current image data or joint angle data is received as a state from an input signal obtained from an environment by a visual sensor and a joint angle sensor. A system uses a recurrent neural network to store a change in an image as an action with respect to a target object and a result of the action, and if a similar state is received, the stored action is reproduced. This technique is applied to, for example, autonomous action determination of a robot. In Patent Reference 2, an action determination technique in reinforcement learning is shown. An error is predicted from a value in a state and a value in a state one step earlier than that state, and the obtained information is used for action determination.
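The error described for Patent Reference 2 resembles the temporal-difference (TD) error commonly used in reinforcement learning. The following is a minimal sketch of that standard quantity, not the method of Patent Reference 2 itself; the reward term, the discount factor `gamma` and all names are assumptions added for illustration.

```python
def td_error(reward: float, value_current: float, value_previous: float,
             gamma: float = 0.9) -> float:
    """Standard one-step TD error (assumed form, not taken from the reference).

    value_previous is the value estimated for the state one step earlier,
    value_current is the value estimated for the state reached from it,
    and reward is the reward observed on that transition.
    """
    return reward + gamma * value_current - value_previous
```

Note that this quantity compares only two adjacent states, so it carries no direct information about events many steps ahead.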
Moreover, Patent Reference 3 discloses a technique for achieving safe drive control of a vehicle. In this technique, a drive route is estimated, and if a dangerous point exists on the estimated route in the case where the vehicle continues to be driven at its current speed, the driving speed of the vehicle is controlled, before the vehicle reaches the dangerous point, to a speed that ensures safety according to the driver's skill.
(Patent Reference 1) Japanese Laid-Open Publication No. 2002-59384
(Patent Reference 2) Japanese Laid-Open Publication No. 2002-189502
(Patent Reference 3) Japanese Laid-Open Publication No. 7-306998
Problems that the Invention is to Solve
As has been described, in Patent Reference 1, a recurrent neural network is used to predict the state into which a current state will be changed by a self-action. Then, according to a result of the prediction, an action stored as a pair with a state is determined.
However, in Patent Reference 1, a past state change due to a self-action is merely learned by the recurrent neural network, and no prediction is made of, and no consideration is given to, a change in the environment that is unrelated to the self-action. Moreover, at a certain point of time, action determination is made based on a current state and a prediction for a state one step later than the current state. However, the state one step later is not necessarily important for the action determination and, therefore, this future state prediction cannot be considered appropriate for the action determination.
Moreover, in Patent Reference 2, an action determined only from a current state and a predicted value for a state one step later than the current state is not necessarily a desired action. For example, if a robot is to avoid a vehicle running toward it and the robot moves much more slowly than the vehicle, the robot will bump into the vehicle unless it starts an avoidance action many steps earlier. In this manner, when action determination should be made by looking ahead to the future, not only the state one step later but also states further in the future have to be considered when an action is determined. Moreover, as in the above-described case where the robot is to avoid a running vehicle, if no change is recognized when a current state and a value for a state one step later are examined, but a crucial situation will occur many steps later, an action determined based on the current state and the value for the state one step later might be useless.
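The robot-and-vehicle example can be illustrated with a small sketch that compares a one-step look-ahead with a longer one. Everything here is a simplifying assumption made for illustration: motion is along a straight line at constant speeds, the robot escapes by covering a fixed distance, and all names and numbers are hypothetical.

```python
import math


def steps_until_arrival(gap_m: float, vehicle_speed_m_per_step: float) -> int:
    """Steps until the vehicle reaches the robot's position."""
    return math.ceil(gap_m / vehicle_speed_m_per_step)


def steps_to_escape(escape_distance_m: float, robot_speed_m_per_step: float) -> int:
    """Steps the slow robot needs to get out of the vehicle's path."""
    return math.ceil(escape_distance_m / robot_speed_m_per_step)


def detects_danger(gap_m: float, vehicle_speed_m_per_step: float,
                   horizon_steps: int) -> bool:
    """True if the arrival of the vehicle falls within the look-ahead horizon."""
    return steps_until_arrival(gap_m, vehicle_speed_m_per_step) <= horizon_steps


def can_still_escape(gap_m: float, vehicle_speed_m_per_step: float,
                     escape_distance_m: float, robot_speed_m_per_step: float) -> bool:
    """True if the robot can clear the path before the vehicle arrives."""
    return (steps_to_escape(escape_distance_m, robot_speed_m_per_step)
            < steps_until_arrival(gap_m, vehicle_speed_m_per_step))
```

Suppose the vehicle moves 2.0 m per step and the robot needs 4 steps to cover a 1.0 m escape distance at 0.25 m per step. With a gap of 20.0 m, a one-step horizon detects nothing although escape is still possible; by the time the gap has shrunk to 2.0 m and the one-step horizon finally detects danger, the robot can no longer escape. A horizon of 10 steps detects the danger at the 20.0 m gap, while avoidance is still feasible.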
Moreover, in Patent Reference 3, a drive route along which a vehicle will travel is first estimated using map information and vehicle position information. Then, if a dangerous point exists along the estimated route in the case where the vehicle travels along that route at its current speed, a vehicle speed plan in which a target vehicle speed is set for each point is made. That is, on the assumption that the vehicle travels on a road, information on the road is used to judge whether or not there exists a dangerous point that should be dealt with as a target. However, if information on a future state, such as road information, has not been given beforehand, it is by no means easy even to set an appropriate target for action determination. Furthermore, with this technique, it is very difficult to determine an action for various situations that have never been experienced.