1. Field of the Invention
The present invention relates to the solution of human-robot interaction problems and, more especially, to the training of robots, notably autonomous robots such as the animal-like robots that have recently come into use.
2. Description of Related Art Including Information Disclosed under 37 CFR 1.97 and 1.98
In recent years an increasing number of autonomous animal-like robots have been developed and put on the market, such as Sony Corporation's four-legged AIBO(trademark) robot, which resembles a dog (see "Development of an autonomous quadruped robot for robot entertainment" by M. Fujita and H. Kitano, in Autonomous Robots, 5, 1998). See also "Robots for kids: Exploring new technologies for learning", by A. Druin and J. Hendler, Morgan Kaufmann Publishers, 2000, and "The art of creating subjective reality: an analysis of Japanese digital pets" by M. Kusahara, in the Proceedings of the Artificial Life VII Workshop, 2000, ed. C. Maley and E. Boudreau, pages 141-144.
These autonomous robots are designed not as slaves programmed to follow commands without question, but as artificial creatures fulfilling their own drives. Part of the interest in owning or interacting with such an autonomous robot lies in the user's impression that a relationship is being developed with a quasi-pet. However, autonomous robots can be likened to "wild" animals. The satisfaction that the user finds in interacting with the autonomous robot is enhanced if the user can "tame" the robot, to the extent that the user can induce the robot to perform certain desired behaviours on command and/or to direct its attention at, and learn the name of, a desired object.
To the user, it appears that he is "training" the robot, by analogy with human-animal interactions. However, given that the robot is a machine, what takes place is more accurately described as a kind of dynamic programming in the field. In the present document, references to "training" should be understood in this sense.
However, it is difficult to train an autonomous robot to perform specific tasks on command, especially tasks involving an unusual pattern of behaviour or a sequence of actions, or to learn the names of specific objects. Several groups are involved in research in this field; see, for example, "Experiments on human-robot communication with Robota, an interactive learning and communicating doll robot." by A. Billard, K. Dautenhahn and G. Hayes, from "Socially situated intelligence workshop" (SAB 98), eds. B. Edmonds and K. Dautenhahn, 1998, pages 4-16; "Experimental results of emotionally grounded symbol acquisition by four-legged robot" by M. Fujita, G. Costa, T. Takagi, R. Hasegawa, J. Yokono and H. Shimura, in the Proceedings of Autonomous Agents 2001, 2001; "Learning to behave: Interacting agents" by F. Kaplan, from the CELE-TWENTE Workshop on Language Technology, October 2000, pages 57-63; and "Learning from sights and sounds: a computational model", PhD thesis by D. Roy, MIT Media Laboratory, 1999.
The present inventors, considering that the problems involved in teaching a complex behaviour (and associated command) to an autonomous robot, and/or in reaching shared attention with an autonomous robot such that the name of a desired object could be taught, are similar to the problems faced by animal trainers, determined that robots could be trained by application of techniques used for pet training.
Over the last fifty years, there have been some fruitful exchanges between ethologists and robotics engineers. For example, in some cases robotics engineers have defined control architectures for robots based on observations of animal behaviour. Different surveys of behaviour-based robotics are given in "Behaviour-based robotics" by R. Arkin, MIT Press, Cambridge, Mass., USA, 1998; in "Understanding intelligence" by R. Pfeifer and C. Scheier, MIT Press, Cambridge, Mass., USA, 1999; and in "The 'artificial life' route to 'artificial intelligence'. Building situated embodied agents," by L. Steels and R. Brooks, Lawrence Erlbaum Associates, New Haven, USA, 1994. Robot-based research has also led to the development of models that may be useful for understanding animal behaviour (see "What does robotics offer animal behaviour?" by Barbara Webb, Animal Behaviour, 60:545-558, 2000). However, when tackling robotics problems, robotics researchers have so far made little investigation of animal-training techniques.
The method most often used by dog owners attempting to train their pets, for example to sit down on command, involves chanting the command (here "SIT") several times whilst simultaneously forcing the animal to demonstrate the desired behaviour (here by pushing the dog's rear down to the ground). This method gives poor results for several reasons. Firstly, the animal is forced to choose between paying attention to the trainer's repeated word and paying attention to the behaviour to be learnt. Secondly, as the command is repeated several times, the animal does not know which part of its behaviour to associate with the command. Finally, the command is very often said before the behaviour is exhibited; for instance, "SIT" is said while the animal is still standing. Thus, the animal cannot associate the command with the desired sitting position.
For these reasons, animal trainers usually first teach the desired behaviour, using one of the techniques listed below, and only then add the associated command. The main techniques are:
the modelling method,
the luring method,
the capturing method,
the imitation method, and
shaping methods.
The present inventors considered it advisable to follow the same sort of approach when training a robot, given that the problems of sharing attention and of discriminating stimuli are even more difficult with a robot than with an animal.
The modelling method is another technique often tried by dog owners but rarely adopted by professional trainers. This involves physically manipulating the animal into the desired position and then giving positive feedback when the position is achieved. Learning performance is poor, because the animal remains passive throughout the process. Modelling has been used in an industrial context to teach positions to non-autonomous robots. However, for autonomous robots which are constantly active, modelling is problematic. Only partial modelling could be envisaged. For instance, the robot would be able to sense that the trainer is pushing on its back and then decide to sit, if programmed to do so. However, it is hard to generalise this method to the training of complex movements involving more than just reaching a static position.
The luring method is similar to modelling except that it does not involve physical contact with the animal. A toy or treat is put in front of the dog's nose and the trainer uses it to guide the animal into the desired position. This method gives satisfactory results with real dogs but can only be used for teaching positions or very simple movements. Luring has not been used much in robotics. The AIBO(trademark) robots that have been released commercially are programmed to be automatically interested in red objects. Some owners of these robots use this tendency to guide their artificial pet to desired places. However, this usage remains fairly limited.
In contrast to the modelling and luring methods, capturing methods exploit behaviours that the animal produces spontaneously. For instance, every time a dog owner acknowledges that his pet is in the desired position or is performing the right behaviour, this acknowledgement acts as positive reinforcement.
The present inventors investigated the suitability of a capturing technique for training autonomous robots, using a simple prototype. The robot was programmed to perform a succession of random behaviours autonomously, some of which corresponded to desired behaviours with which a respective signal (for example, a word) was to be associated. Each time the robot spontaneously performed one of the desired behaviours, the corresponding signal was presented to the robot immediately afterwards. For example, to teach the robot the word "SIT", the trainer had to wait until the robot spontaneously sat down, and then say the word "SIT". However, this technique did not work well when the number of behaviours that could be named was large: the time spent waiting for the robot to exhibit the corresponding behaviour spontaneously was too long.
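The waiting-time problem can be made concrete with a minimal sketch, assuming (for illustration only) that the prototype picks each of its N behaviours uniformly and independently at random and that every behaviour lasts the same fixed time. Under those assumptions, the number of behaviours performed before a given one appears is geometrically distributed with mean N, so the expected wait grows linearly with the size of the repertoire. The function names and the 4-second duration are hypothetical and are not taken from the prototype.

```python
import random


def expected_wait_seconds(n_behaviours: int, seconds_per_behaviour: float) -> float:
    """Mean time until one particular behaviour occurs, assuming each
    behaviour is picked uniformly and independently at random."""
    # Draws until the first success are geometric with p = 1/n,
    # so the mean number of draws is 1/p = n.
    return n_behaviours * seconds_per_behaviour


def simulated_wait_seconds(n_behaviours: int, seconds_per_behaviour: float,
                           target: int, rng: random.Random) -> float:
    """One simulated capturing session: draw behaviours until the target appears."""
    draws = 0
    while True:
        draws += 1
        if rng.randrange(n_behaviours) == target:
            return draws * seconds_per_behaviour


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (5, 50, 200):
        waits = [simulated_wait_seconds(n, 4.0, 0, rng) for _ in range(1000)]
        print(n, expected_wait_seconds(n, 4.0), sum(waits) / len(waits))
```

A real robot would not choose behaviours uniformly, so the linear growth is only indicative; behaviours the robot rarely produces would make the wait even longer.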
Imitation methods involve the trainer exhibiting the desired behaviour so as to encourage the animal (or robot) to imitate the trainer. This technique is seldom used by professional animal trainers in view of the differences between human and animal anatomy. Success has been reported only with "higher animals" such as primates, cetaceans and humans. However, this approach has been used in the field of robotics; see, for example, "An overview of robot imitation." by P. Bakker and Y. Kuniyoshi, in the Proceedings of AISB Workshop on Learning in Robots and Animals, 1996; the paper by A. Billard et al. cited supra; "Getting to know each other: artificial social intelligence for autonomous robots" by K. Dautenhahn, in Robotics and Autonomous Systems, 16:333-356, 1995; and "Learning by watching: Extracting reusable task knowledge from visual observation of human performance" by Y. Kuniyoshi, M. Inaba and H. Inoue, in IEEE Transactions on Robotics and Automation, 10(6):799-822, 1994.
In principle, methods based on imitation can handle very rare behaviours and sequences of actions. In practice, however, they require a great deal of computational power in the robot, so it is difficult to envisage their use with currently available autonomous robots.
The shaping method involves breaking a behaviour down into small achievable responses that will eventually be joined into a sequence to produce the overall desired behaviour. The main idea is to guide the animal progressively towards the right behaviour. Each component step can be trained using any of the other known training techniques. Various shaping methods are known, including one designated the "clicker training" method.
Clicker training is based on B. F. Skinner's theory of operant conditioning (see "The Behaviour of Organisms" by B. F. Skinner, Appleton-Century-Crofts, New York, N.Y., USA, 1938). This method has proven to be one of the most efficient for training a large variety of animals, including dogs, dolphins and chickens. During the 1980s, Gary Wilkes, a behaviourist, collaborated with Karen Pryor, a dolphin trainer, to popularise this method for dog training. Whereas dolphins were given stimuli in the form of whistles, for dog training the whistles were replaced by a small metal device (the "clicker") that emits a brief, sharp clicking sound.
In clicker training, the animal comes to associate the clicker sound (which, in itself, means nothing to the animal) with a primary reinforcer that the animal instinctively finds rewarding, typically a treat such as food or a toy. After having been associated a number of times with the primary reinforcer, the clicker becomes a secondary reinforcer (also called a conditioned reinforcer) and acts as a cue signalling that a reward will come soon. Because the clicker is not the reward in itself, it can be used to guide the animal in the right direction. It is also a more precise way to signal which particular behaviour needs to be reinforced. The trainer gives the primary reinforcer only when the animal performs the desired behaviour; this signals the end of the guiding process.
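How a neutral click can acquire value through repeated pairing with a treat may be illustrated with the Rescorla-Wagner model of classical conditioning. This is a standard learning model substituted here purely for illustration; the text above does not specify any particular model. The sketch treats the click as the conditioned cue and the treat as the primary reinforcer, and the learning rate `alpha_beta` and asymptote `lam` are assumed values. Each pairing moves the associative strength V a fixed fraction of the way towards its asymptote, which gives the closed form V_n = lam * (1 - (1 - alpha_beta)^n) when V starts at zero.

```python
import math
from typing import List


def rescorla_wagner(n_pairings: int, alpha_beta: float = 0.1,
                    lam: float = 1.0, v0: float = 0.0) -> List[float]:
    """Associative strength of a single cue (the click) after each
    click-treat pairing: V <- V + alpha_beta * (lam - V)."""
    v = v0
    history = []
    for _ in range(n_pairings):
        v += alpha_beta * (lam - v)
        history.append(v)
    return history


def closed_form(n: int, alpha_beta: float = 0.1, lam: float = 1.0) -> float:
    """Same quantity in closed form, starting from V = 0."""
    return lam * (1.0 - (1.0 - alpha_beta) ** n)


def pairings_to_reach(fraction: float, alpha_beta: float = 0.1) -> int:
    """Smallest n with V_n >= fraction * lam, starting from V = 0."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - alpha_beta))
```

With the assumed rate of 0.1, `pairings_to_reach(0.95)` gives 29 pairings. This number depends entirely on the chosen rate and is not a derivation of the repetition count used by animal trainers.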
Thus, the clicker training process involves at least four stages:
"Charging up" the clicker: During this first stage the animal has to learn to associate the click with the reward (the treat). This is achieved by clicking and then giving the animal the treat, consistently, some 20-50 times, until it gets visibly excited by the sound of the clicker.
Getting the behaviour: then the animal is guided to perform the desired action. For instance, if the trainer wants the dog to spin in a circle in a clockwise direction, he or she will start by clicking each time the dog makes the slightest head movement to the right. When the dog performs the head movement consistently, the trainer clicks only when it starts to turn its body to the right. The criteria for obtaining a click are raised slowly until a full spin of the body is achieved. At this stage the treat is given.
Adding the command word: The command word is said only when the animal has learned the desired behaviour. The trainer needs to say the command just after or just before the animal performs the behaviour.
Testing the behaviour: Then the learned behaviour needs to be tested and refined. The trainer uses the command word, clicks and rewards with a treat only when the exact desired behaviour is performed.
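The "getting the behaviour" stage is a form of shaping by successive approximation. The toy simulation below sketches that idea only, under loudly stated assumptions: the hypothetical `ToyDog` turns right by a random angle centred on a "tendency", a click pulls that tendency towards the angle just performed, and the trainer raises the click criterion by a fixed step after every few clicks. It does not model charging up, the command word or testing, and all of its parameters are arbitrary.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyDog:
    """Hypothetical learner: each trial it turns right by a random angle
    (degrees) centred on its current tendency."""
    tendency: float = 5.0
    spread: float = 15.0
    learning_rate: float = 0.5

    def act(self, rng: random.Random) -> float:
        return max(0.0, rng.gauss(self.tendency, self.spread))

    def reinforce(self, angle: float) -> None:
        # A click pulls the tendency towards the angle just performed.
        self.tendency += self.learning_rate * (angle - self.tendency)


def shape_spin(dog: ToyDog, rng: random.Random, target: float = 360.0,
               start_criterion: float = 0.0, step: float = 20.0,
               clicks_to_raise: int = 3,
               max_trials: int = 10_000) -> Optional[Tuple[int, int]]:
    """Click any turn that meets the current criterion, raise the criterion
    after every `clicks_to_raise` clicks, and stop (treat) at a full spin.
    Returns (trials, clicks), or None if no full spin was obtained."""
    criterion = start_criterion
    streak = clicks = 0
    for trial in range(1, max_trials + 1):
        angle = dog.act(rng)
        if angle < criterion:
            continue  # no click for this trial
        dog.reinforce(angle)
        clicks += 1
        if angle >= target:
            return trial, clicks  # full spin: primary reinforcer (treat)
        streak += 1
        if streak >= clicks_to_raise:
            criterion = min(target, criterion + step)
            streak = 0
    return None
```

Starting with the criterion already at a full spin (`start_criterion=360.0`) corresponds to waiting for the complete behaviour to appear spontaneously, which this toy learner essentially never does; raising the criterion gradually lets it get there.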
It is important to note that, as clicker training is used for guiding the animal towards performing a behaviour via a sequence of steps, it can be used not only for the animal to learn an unusual behaviour that the animal hardly ever performs spontaneously, but also for the animal to learn to perform a sequence of behaviours.
Table 1 summarises the suitability of the various above-mentioned techniques for training animals and considers whether they might be applied to training robots.
According to the preferred embodiments of the present invention, the clicker training technique is applied to training robots, notably autonomous robots, to perform desired behaviours and/or to direct their attention to a desired object (so that its name can be learned). Although attempts have been made to use clicker training to train a virtual character displayed on a screen (see "Interactive training for synthetic characters" by S-Y. Yoon, R. Burke and G. Schneider, in AAAI 2000, 2000), it is believed that this is the first time that a robot-training technique has been based on this kind of method.
More particularly, the present invention provides a robot-training method in which a behaviour is broken down into smaller achievable responses that will eventually lead to the desired final behaviour. The robot is guided progressively to the correct behaviour through the use, normally the repeated use, of a secondary reinforcer. When the correct behaviour has been achieved, a primary reinforcer is applied so that the desired behaviour can be "captured".
The robot-training method of the present invention enables complex and/or rare behaviours, and sequences of behaviours, to be taught to robots. It is especially well adapted to the training of autonomous animal-like robots. It has the advantage that it is simple to implement and requires relatively low computational power.
The desired behaviour can correspond to the overall sequence of smaller achievable responses, or merely to the last of the sequence.
The desired behaviour can be the directing of the robot's attention to a particular subject. Thus, the present invention provides a simple way to overcome the problem of ensuring "shared attention" between a robot and another party (typically a person attempting to teach the robot the names of objects).
The robot is adapted (typically by pre-programming) to respond to the secondary reinforcer(s) by exploring behaviours "close to" the behaviour that prompted the issuing of the secondary reinforcer. The robot is further adapted to respond to the primary reinforcer by registering the behaviour (or sequence of behaviours) that prompted the issuing of the primary reinforcer and, preferably, by registering a command indication that the trainer issued after the primary reinforcer.
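The robot-side responses described above can be sketched as a small event-driven class. This is a minimal sketch under assumptions, not the embodiment's implementation: behaviours are represented as hypothetical tuples of motor parameters, "close to" is taken to mean a uniform perturbation of at most `radius` in each parameter, and a command word heard after the primary reinforcer is bound to the captured behaviour. The hypothetical `keep_whole_sequence` flag reflects the two options mentioned earlier: registering the whole sequence of reinforced responses or only the last one.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a behaviour is a tuple of motor parameters,
# e.g. (head_pan, body_turn) in degrees.
Behaviour = Tuple[float, ...]


class ClickerTrainableRobot:
    def __init__(self, n_params: int = 2, radius: float = 10.0,
                 keep_whole_sequence: bool = True, seed: Optional[int] = None):
        self.rng = random.Random(seed)
        self.radius = radius
        self.keep_whole_sequence = keep_whole_sequence
        self.current: Behaviour = (0.0,) * n_params
        self.anchor: Optional[Behaviour] = None         # last clicked behaviour
        self.sequence: List[Behaviour] = []             # responses clicked so far
        self.pending: Optional[List[Behaviour]] = None  # captured, awaiting a command
        self.repertoire: Dict[str, List[Behaviour]] = {}

    def next_behaviour(self) -> Behaviour:
        if self.anchor is None:
            # No guidance yet: explore freely over an assumed range.
            b = tuple(self.rng.uniform(-90.0, 90.0) for _ in self.current)
        else:
            # Explore behaviours "close to" the one that earned the click.
            b = tuple(x + self.rng.uniform(-self.radius, self.radius)
                      for x in self.anchor)
        self.current = b
        return b

    def on_secondary_reinforcer(self) -> None:
        # Click: remember this response and search around it next.
        self.anchor = self.current
        self.sequence.append(self.current)

    def on_primary_reinforcer(self) -> None:
        # Treat: capture the behaviour (or sequence) that earned it.
        if not self.sequence or self.sequence[-1] != self.current:
            self.sequence.append(self.current)
        captured = self.sequence if self.keep_whole_sequence else self.sequence[-1:]
        self.pending = list(captured)
        self.sequence, self.anchor = [], None

    def on_command(self, word: str) -> bool:
        # A command heard after the primary reinforcer names the capture.
        if self.pending is None:
            return False
        self.repertoire[word] = self.pending
        self.pending = None
        return True

    def perform(self, word: str) -> Optional[List[Behaviour]]:
        return self.repertoire.get(word)
```

A trainer session would alternate `next_behaviour()` with clicks, end with the primary reinforcer, and then say the command word; `perform("SPIN")` would then return the stored behaviours.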
In general, the primary reinforcer(s) will be programmed into the robot whereas the secondary reinforcers are learned (either via a predetermined registration procedure or via a conditioning process teaching the robot by associating the secondary reinforcer with a primary reinforcer).
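The two ways of acquiring a secondary reinforcer can be sketched as follows, again with hypothetical names. `register_secondary` stands for the predetermined registration procedure, while `observe` implements a simple conditioning rule in which a neutral signal becomes a secondary reinforcer once it has preceded a built-in primary reinforcer `pairing_threshold` times within `window_s` seconds. The primary signal name, the threshold of 20 (borrowed from the lower end of the figure quoted for animals) and the time window are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class ReinforcerTable:
    # Hypothetical built-in (pre-programmed) primary reinforcer.
    primary: Set[str] = field(default_factory=lambda: {"pat_on_head"})
    secondary: Set[str] = field(default_factory=set)
    pairing_threshold: int = 20   # assumed, cf. the 20-50 pairings quoted for animals
    window_s: float = 2.0         # assumed maximum signal-to-reward delay
    pairings: Dict[str, int] = field(default_factory=dict)
    last_signal: Optional[Tuple[str, float]] = None

    def register_secondary(self, signal: str) -> None:
        """Predetermined registration: declare a signal a secondary reinforcer."""
        self.secondary.add(signal)

    def observe(self, signal: str, t: float) -> str:
        """Classify a signal heard at time t and update conditioning counts."""
        if signal in self.primary:
            prev = self.last_signal
            if (prev is not None and prev[0] not in self.primary
                    and prev[0] not in self.secondary
                    and 0.0 <= t - prev[1] <= self.window_s):
                # A neutral signal shortly preceded the reward: count a pairing.
                n = self.pairings.get(prev[0], 0) + 1
                self.pairings[prev[0]] = n
                if n >= self.pairing_threshold:
                    self.secondary.add(prev[0])
            kind = "primary"
        elif signal in self.secondary:
            kind = "secondary"
        else:
            kind = "neutral"
        self.last_signal = (signal, t)
        return kind
```

Keeping the primary set fixed while letting the secondary set grow mirrors the split described above: the reward is built in, whereas the guiding signal is learned.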
These and further features and advantages of the present invention will become clear from the following description of a preferred embodiment thereof, given by way of example, and illustrated with reference to the accompanying drawings, in which: