1. Field of the Invention
The present invention relates to a pointing position detection device and to an autonomous robot, which detect the position at which a human being is pointing by recognizing the attitude of the human being based upon an image.
2. Description of the Related Art
In the conventional art, there has been a known type of autonomous robot which, by performing speech recognition, initiates certain behavior upon recognizing an indication which a human being provides to it in the form of speech. This kind of autonomous robot has the distinctive feature that the person who is providing the indication does not need to utilize any special device.
However, such a system is subject to the problem that an indication cannot be transferred accurately, since, with an indication system which employs speech, the speech recognition rate deteriorates in areas in which the noise level is high. Furthermore, in order to enhance the speech recognition rate, it is necessary to register in advance the speech pattern of the human being who is generating the speech, so that an indication given in the speech of an arbitrary person cannot be recognized.
In order to solve this type of problem, a method has been tried of recognizing the attitude of the body of a human being based upon image information, and of recognizing the indication which is meant by this attitude. For example, there is a known pointing gesture direction inferring method which has been described in The Transactions of Electronics, Information, and Systems (IEE of Japan), Vol. 121-C (2001.9), p. 1388-p. 1394, “Detection of Omni-Directional Pointing Gestures” (hereinafter referred to as “Related Art 1”). With this method, the human being is first photographed with a plurality of cameras, a region corresponding to his face is extracted from the images which have been obtained, his full face is detected based upon the results of inferring the direction of his face, and the position of his eye is specified. Next, a hand region is extracted from the images which have been obtained, and the end portion of this region is specified as being the position of his finger tip. The spatial positions of his eye and his finger tip are then obtained, and the pointing direction is inferred as being along the extended straight line which joins these two positions.
Furthermore, there is a known interactive hand pointer method which has been described in The Transactions of Electronics, Information, and Systems (IEE of Japan), Vol. 121-C (2001.9), p. 1464-p. 1470, “An Interactive Hand Pointer that Projects a Mark in the Real Work Space” (hereinafter referred to as “Related Art 2”). With this method, the hand of a human being who is making an indication is photographed against a simple background by a camera, the position of the finger tip is obtained by block matching between the image which has been obtained and template images of a finger tip which have been prepared in advance, and the straight line which runs from the central position of the base of the finger which is within a specific region to this finger tip position is taken as being the pointing direction.
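By way of illustration only, the block matching step mentioned for Related Art 2 can be sketched as an exhaustive search which slides a finger tip template over the image and keeps the position of lowest cost. The cost measure (sum of absolute differences), the function name best_match, and the representation of images as nested lists of gray levels are assumptions made for this sketch; the cited paper may use a different measure or search strategy.

```python
def best_match(image, template):
    """Return ((row, col), cost) of the best template position.

    image and template are lists of rows of gray levels (hypothetical
    representation). The cost is the sum of absolute differences (SAD),
    an assumed matching measure.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, float("inf")
    # Try every top-left corner at which the template fits inside the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            cost = sum(
                abs(image[r + dr][c + dc] - template[dr][dc])
                for dr in range(th)
                for dc in range(tw)
            )
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


# A 2x2 "finger tip" pattern embedded in a plain (all-zero) background.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
position, cost = best_match(image, template)
```

A search of this kind is only reliable when the background contributes little to the cost, which is consistent with the requirement of a simple background noted for this method.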
According to these methods, an object which lies in the direction in which the finger tip of a human being who is giving an indication is pointing is recognized, and this can be employed as a human-robot interface in order to cause the robot to start a subsequent operation or the like.
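As a minimal sketch of how a pointed position might be derived from the straight line used in Related Art 1, the line through the eye and the finger tip can be extended until it meets a surface. The function name pointed_floor_position, the choice of a horizontal floor plane at height floor_z, and the coordinate convention (z upward, units in meters) are assumptions for this illustration and are not taken from the cited papers.

```python
def pointed_floor_position(eye, fingertip, floor_z=0.0):
    """Extend the eye-to-fingertip line until it meets the plane z = floor_z.

    eye and fingertip are (x, y, z) positions. Returns the (x, y, z)
    intersection point, or None when the line does not reach the floor
    in front of the person.
    """
    dx, dy, dz = (f - e for f, e in zip(fingertip, eye))
    if abs(dz) < 1e-9:
        return None  # line is parallel to the floor
    t = (floor_z - eye[2]) / dz
    if t <= 0:
        return None  # floor lies behind the eye along this line
    return (eye[0] + t * dx, eye[1] + t * dy, floor_z)


# Eye at 1.6 m height, finger tip 0.3 m forward of the eye and 0.3 m lower.
target = pointed_floor_position((0.0, 0.0, 1.6), (0.3, 0.0, 1.3))
# Pointing upward: the line never reaches the floor.
no_target = pointed_floor_position((0.0, 0.0, 1.6), (0.3, 0.0, 1.9))
```

With these example values the line descends at 45 degrees, so it meets the floor roughly 1.6 m in front of the person.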
Moreover, Japanese Unexamined Patent Application, First Publication No. 2001-56861 and the corresponding European Patent Application, First Publication No. EP 1 059 608 A2 (hereinafter referred to as “Related Art 3”) disclose recognition of the shape and attitude of a hand.
However, with a method, as in Related Art 1, of giving a pointed position by taking the extension of a hypothetical straight line joining the head and the hand tip, since the deviation of the detected position becomes greater as the distance to the object which is being pointed at increases, there is the problem that it is necessary to implement a special pointing method in order to make this deviation small. Furthermore, since with this method the point furthest from the center of gravity position of the hand region which has been extracted from the image is taken as the position of the tip of the finger, there is the problem that, if the arm of the person is bent, a position which is completely different from the intended one may be recognized as being the one which is being pointed at.
Furthermore, with a method, as in Related Art 2, of detecting a finger tip and the direction in which it is pointing against a simple background, it is necessary for the background to be already known, and there is also the problem that there are limitations upon the position in which the camera can be located. Yet further, there is the problem with this method that the range for detection is narrow, so that the deviation becomes great for any pointing position other than one which is located at a relatively close distance.
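The growth of the deviation with distance can be illustrated with a simplified similar-triangles model, which is an assumption made here and is not taken from Related Art 1: if the eye position is exact and the finger tip position is in error by a small amount perpendicular to the pointing line, the resulting error at the target is scaled by the ratio of the target distance to the eye-to-finger-tip distance. The function name and the example values are hypothetical.

```python
def target_deviation(fingertip_error, eye_to_fingertip, eye_to_target):
    """Lateral error at the target under a similar-triangles model.

    fingertip_error:  finger tip position error perpendicular to the line (m)
    eye_to_fingertip: distance from the eye to the finger tip (m)
    eye_to_target:    distance from the eye to the pointed object (m)
    """
    if eye_to_fingertip <= 0:
        raise ValueError("eye_to_fingertip must be positive")
    return fingertip_error * eye_to_target / eye_to_fingertip


# A 1 cm finger tip error with a 0.5 m eye-to-finger-tip baseline.
near = target_deviation(0.01, 0.5, 1.0)  # target 1 m away
far = target_deviation(0.01, 0.5, 5.0)   # target 5 m away
```

Under this model the deviation is proportional to the target distance, so with these example values a target five times as far away gives five times the deviation.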
Related Art 3 requires three or more cameras. In addition, it is difficult to apply Related Art 3 to a situation in which the relative location between a robot (cameras) and a human being changes over time. Additionally, Related Art 3 merely detects the direction of the hand tip, so that it cannot determine with high accuracy a position at which a human being is pointing.