1. Field of the Invention
The present invention relates to a hand pointing apparatus, and more specifically to a hand pointing apparatus for picking up an image of a person to be recognized and for determining a position or a direction pointed to by the person to be recognized.
2. Description of the Related Art
There has been heretofore known a hand pointing input apparatus which comprises a display for displaying predetermined information, an illuminating device for illuminating an information inputting person who comes to the display, and a plurality of image pickup devices for picking up the image of the approaching information inputting person from different directions, wherein the plurality of image pickup devices pick up images of situations where the approaching information inputting person points with a finger or the like to an arbitrary position on the display, the information inputting person is recognized in accordance with a plurality of images obtained by the image pickup, the position on the display pointed to by the information inputting person is determined, a cursor or the like is displayed on the position pointed to on the display, and the position on the display pointed to is recognized as being clicked at the time of detecting the fact that the information inputting person has performed a clicking action by raising a thumb, whereby a predetermined processing is performed (see, for example, Japanese Patent Application Laid-open (JP-A) Nos. 4-271423, 5-19957, 5-324181 or the like).
According to the above-described hand pointing input apparatus, since the information inputting person can give various instructions to an information processing apparatus and input various information to the information processing apparatus without touching an input device such as a keyboard or a mouse, it is possible to simplify the operation for using the information processing apparatus.
However, in an environment where the hand pointing input apparatus is actually operated, an object which is not a subject to be recognized, for example, the luggage of the information inputting person or trash, may exist around the information inputting person who is the subject to be recognized. The surroundings of the information inputting person are also illuminated by an illuminating light emitted from the illuminating device. Thus, if the above-described object which is not the subject to be recognized exists around the information inputting person, this object which is not the subject to be recognized is present as a high-luminance object in the images picked up by the image pickup device. Thus, there is a high possibility that an object which is not the subject to be recognized, is recognized as the information inputting person by mistake.
In order to avoid this wrong recognition of the information inputting person, it is necessary to improve the accuracy of the recognition of the information inputting person. For example, it is necessary to perform a complicated image processing such as the total recognition of the information inputting person by the use of a plurality of image features in addition to the luminance (for example, pattern matching or the like based on the subject's outline, which is one of the image features). Therefore, since a heavy load is applied to the image processor for performing the image processing such as the recognition based on the picked-up images, this causes a long time to be taken until the instruction from the information inputting person can be determined. In order to reduce the time required for the determination of the instruction from the information inputting person, it is necessary to use an image processor with a higher processing speed. This causes the problem of the cost of the apparatus increasing.
Furthermore, a three-dimensional coordinate of a feature point has been heretofore determined by a calculation from the position of the feature point of the information inputting person on the picked-up image (for example, a tip of his/her forefinger or the like) so as to thereby determine the position on the display pointed to by the information inputting person. However, the calculation processing for determining the three-dimensional coordinate of the feature point is complicated. Due to this fact, a long time is required for the determination of the instruction from the information inputting person in the same manner as the above-described case.
Moreover, a motion of raising the thumb has been heretofore predefined as representing a clicking action, and the motion of raising the thumb alone has thus been detected as the clicking. However, the degree of freedom of movement is low, which disadvantageously causes less ease-of-use. On the other hand, if motions other than the motion of raising the thumb are detected as the clicking, the processing to detect the clicking becomes complicated, causing a disadvantageously long time to be taken before the clicking is detected.
The present invention was completed in consideration of the above facts. It is a first object of the present invention to provide a hand pointing apparatus having a simple construction and being capable of reducing the time required for the determination of an instruction from a person to be recognized.
It is a second object of the present invention to provide a hand pointing apparatus capable of improving the degree of freedom of the movement which the person to be recognized makes in order to give the instruction, without spending a long time in the determination of the instruction from the person to be recognized.
In order to achieve the above described objects, a hand pointing apparatus according to a first aspect of the present invention comprises: illuminating means for illuminating a person to be recognized; a plurality of image pickup means, located in different positions wherein the image pickup range is adjusted for each image so that the person to be recognized who is illuminated by the above-described illuminating means, may be within the image pickup range, and an illuminated range on a floor surface, which is illuminated by the above-described illuminating means, may be out of the image pickup range; and determining means for extracting an image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, the situations being indicative of the person to be recognized pointing to either a specific position or a specific direction, and for determining either the position or the direction pointed to by the person to be recognized.
In the first aspect of the present invention, the person to be recognized may point to a specific position on, for example, the surface of a display screen or the like of a display, or may point to a specific direction (for example, the direction in which a specific object exists as seen from the person to be recognized). The determining means extracts the image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, where the situations are indicative of the person to be recognized pointing to either the specific position or the specific direction, and the determining means determines either the position or the direction pointed to by the person to be recognized. By calculating a three-dimensional coordinate of a feature point of the person to be recognized (a point whose position is changed in response to the motion by the person to be recognized to point to a specific position or a specific direction, for example, a tip of a predetermined part, (for example, the hand, the finger, or the like), of the body of the person to be recognized making the pointing motion, the tip of a pointer held by the person to be recognized or the like), the determination of the specific position or direction pointed to can be accomplished based on the position of the person to be recognized and the three-dimensional coordinates of the feature point.
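By way of illustration only, the determination of a pointed-to position from the position of the person and the three-dimensional coordinates of the feature point might be sketched as a line-plane intersection. The function name, the use of a single reference point for the person, and the assumption that the display surface lies in the plane z = plane_z are all hypothetical and are not taken from the specification.

```python
from typing import Optional, Tuple

Point3 = Tuple[float, float, float]


def pointed_position_on_plane(
    reference: Point3, feature: Point3, plane_z: float = 0.0
) -> Optional[Tuple[float, float]]:
    """Intersect the ray from `reference` through `feature` with the plane z = plane_z.

    `reference` stands for the position of the person to be recognized and
    `feature` for the feature point (for example, a fingertip). Returns the
    (x, y) position on the plane, or None when no intersection lies ahead.
    """
    rx, ry, rz = reference
    fx, fy, fz = feature
    dz = fz - rz
    if dz == 0.0:
        return None  # the pointing direction is parallel to the display plane
    t = (plane_z - rz) / dz
    if t <= 0.0:
        return None  # the display plane is behind the pointing direction
    return (rx + t * (fx - rx), ry + t * (fy - ry))
```

For example, with the person at (0, 0, 2) and the fingertip at (0.5, 0.25, 1), the sketch yields the point (1.0, 0.5) on the plane z = 0.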
In the first aspect of the present invention, the image pickup range of the plurality of image pickup means is adjusted so that the person to be recognized, who is illuminated by the illuminating means, may be within the image pickup range, and the illuminated range on the floor surface which is illuminated by the illuminating means, may be out of the image pickup range. Thus, even if an object which is not a subject to be recognized, such as luggage or trash, exists on the floor surface around the person to be recognized while the person to be recognized is illuminated, the possibility that this object which is not the subject to be recognized comes within the image pickup range of the image pickup means is reduced. Furthermore, even if the object which is not the subject to be recognized comes within the image pickup range, the object is not illuminated by the illuminating means and its luminance is thus reduced. Thus, there is little possibility of the image part corresponding to the object which is not the subject to be recognized existing in the image picked up by the image pickup means. Even if the image part corresponding to the object which is not the subject to be recognized exists, the luminance of the image part is reduced.
Thus, in an extraction of the image part corresponding to the person to be recognized by the determining means, it is possible to extract the image part corresponding to the person to be recognized in a short time by a simple processing without a complicated image processing. Therefore, it is possible to reduce the time required for the determination of the instruction from the person to be recognized without the use of an image processor or the like having a high processing speed and a complicated construction.
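As a minimal sketch of what such a simple processing could look like, the following assumes that an image is a list of rows of luminance values and that the image part corresponding to the person is simply the set of pixels at or above a threshold. The function name and the threshold value of 128 are assumptions made for illustration; the specification does not prescribe them.

```python
def extract_bright_region(image, threshold=128):
    """Return the bounding box (top, left, bottom, right) of all pixels whose
    luminance is at least `threshold`, or None if there is no such pixel.

    `image` is a list of rows, each row a list of integer luminance values.
    """
    rows = [r for r, row in enumerate(image) if any(v >= threshold for v in row)]
    if not rows:
        return None
    cols = [c for row in image for c, v in enumerate(row) if v >= threshold]
    return (min(rows), min(cols), max(rows), max(cols))
```

Because objects on the floor are either outside the image pickup range or dimly lit, a single threshold of this kind may suffice where pattern matching would otherwise be needed.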
As described above, according to the first aspect of the present invention, the image pickup range of a plurality of image pickup means is adjusted so that the person to be recognized, who is illuminated by the illuminating means, may be within the image pickup range, and the illuminated range on the floor surface which is illuminated by the illuminating means, may be out of the image pickup range. Thus, an effect is obtained in which it is possible to provide a hand pointing apparatus of a simple construction whereby the time required for the determination of the instruction from the person to be recognized is reduced.
A hand pointing apparatus according to a second aspect of the present invention comprises: a plurality of illuminating means for illuminating a person to be recognized from different directions; a plurality of image pickup means, located in different positions corresponding to each of the plurality of illuminating means, wherein an image pickup range is adjusted so that the person to be recognized, who is illuminated by the corresponding illuminating means, may be within the image pickup range, and the illuminated range on a floor surface, which is illuminated by the corresponding illuminating means, may be out of the image pickup range; controlling means for switching on/off the plurality of illuminating means one by one in sequence, and for controlling so as to pick up an image of the person to be recognized pointing to either a specific position or a specific direction by the image pickup means corresponding to the switched-on illuminating means; and determining means for extracting an image part corresponding to the person to be recognized from a plurality of images based on a plurality of images picked up by the plurality of image pickup means, and for determining either the position or the direction pointed to by the person to be recognized.
The second aspect of the present invention is provided with a plurality of illuminating means for illuminating the person to be recognized from different directions. The plurality of image pickup means are located in different positions corresponding to a plurality of illuminating means. The image pickup range of the plurality of image pickup means is adjusted so that the person to be recognized, who is illuminated by the corresponding illuminating means, may be within the image pickup range, and the illuminated range on the floor surface, which is illuminated by the corresponding illuminating means, may be out of the image pickup range. Thus, as described in the first aspect of the present invention, even if an object which is not the subject to be recognized, such as luggage or trash, exists on the floor surface around the person to be recognized, the possibility that this object which is not the subject to be recognized comes within the image pickup range of the image pickup means is reduced. Even if this object comes within the image pickup range of the image pickup means, the luminance of the picked-up image is reduced.
The controlling means switches on/off the plurality of illuminating means one by one in sequence, and controls so as to pick up the images of the person to be recognized pointing to either a specific position or a specific direction by the image pickup means corresponding to the switched-on illuminating means, whereby the picked-up images are output from each of the image pickup means. Thus, even if an object which is not the subject to be recognized comes within the image pickup range, the image pickup is performed by the image pickup means at low luminance.
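The sequencing performed by the controlling means might be sketched as follows. The classes StubIlluminator and StubCamera and their methods are hypothetical stand-ins invented for this illustration; a real apparatus would drive actual lamps and image pickup devices.

```python
class StubIlluminator:
    """Hypothetical stand-in for one illuminating means; records state changes."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def switch_on(self):
        self.log.append(f"{self.name} on")

    def switch_off(self):
        self.log.append(f"{self.name} off")


class StubCamera:
    """Hypothetical stand-in for the image pickup means paired with a lamp."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def capture(self):
        self.log.append(f"{self.name} capture")
        return f"image from {self.name}"


def capture_in_sequence(pairs):
    """Switch each illuminator on in turn, capture with its paired camera,
    and switch it off again before moving to the next pair."""
    images = []
    for illuminator, camera in pairs:
        illuminator.switch_on()
        try:
            images.append(camera.capture())
        finally:
            illuminator.switch_off()
    return images
```

Only one illuminating means is lit at any moment in this sketch, so each image is picked up under the illumination of its own corresponding illuminating means alone.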
The determining means extracts the image part corresponding to the person to be recognized from a plurality of images based on a plurality of images output by a plurality of image pickup means, and then it determines either the position or the direction indicated by the person to be recognized. Thus, in the same manner as the first aspect of the present invention, there is little possibility that the image part corresponding to the object which is not the subject to be recognized exists. Even if this image part exists, the image part corresponding to the person to be recognized is extracted in accordance with a plurality of images whose luminance is low. Thus, it is possible to extract the image part corresponding to the person to be recognized in a short time by a simple processing without performing complicated image processing.
Therefore, the effect is obtained in which it is possible to provide the hand pointing apparatus wherein the time required for the determination of the instruction from the person to be recognized is reduced, without using an image processor or the like having a high processing speed and a complicated construction.
A hand pointing apparatus according to a third aspect of the present invention comprises: a plurality of illuminating means for illuminating a person to be recognized from different directions; at least one image pickup means for picking up the image of the person to be recognized, who is illuminated by the illuminating means; discriminating means for switching on/off the plurality of illuminating means one by one in sequence, for comparing a plurality of images of the person to be recognized pointing to either a specific position or a specific direction picked up by the same image pickup means during the switching on of the plurality of illuminating means, and for discriminating between an image part corresponding to the person to be recognized and an image part other than the image part corresponding to the person to be recognized in the plurality of images for at least one image pickup means; and determining means for extracting the image part corresponding to the person to be recognized from the plurality of images picked up by the image pickup means based on a result of a discrimination by the discriminating means, and for determining either the position or the direction pointed to by the person to be recognized.
The discriminating means of the third aspect of the present invention switches on/off a plurality of illuminating means one by one in sequence, compares a plurality of images of the person to be recognized pointing to either a specific position or a specific direction picked up by the same image pickup means during the switching on of a plurality of illuminating means, and discriminates between the image part corresponding to the person to be recognized and the image part other than the image part corresponding to the person to be recognized in a plurality of images for at least one image pickup means.
Since a plurality of illuminating means illuminate the person to be recognized from different directions, the luminance is always high in the image part corresponding to the person to be recognized in a plurality of images picked up by the same image pickup means during the switching on of a plurality of illuminating means. The luminance is thus considerably varied in the image part corresponding to the objects which are not the subject to be recognized such as luggage and trash on the floor surface around the person to be recognized, depending on the direction of the illumination during the image pickup. Therefore, by a very simple processing to compare the luminance of the image parts in the images to each other over a plurality of images (for example, to compare average values or minimum values of the luminance in each image part), it is possible to discriminate between the image part corresponding to the person to be recognized and the image part other than the image part corresponding to the person to be recognized in a plurality of images.
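One possible form of this comparison is sketched below, under the simplifying assumption that the comparison is made pixel by pixel using the minimum luminance over the images, rather than over whole image parts. The function name and the threshold of 128 are hypothetical. A pixel is attributed to the person only when it is bright under every direction of illumination.

```python
def person_mask(images, threshold=128):
    """Return a boolean mask that is True where the minimum luminance over
    all images is at least `threshold`.

    `images` is a list of equally sized images picked up by the same image
    pickup means, one per switched-on illuminating means; each image is a
    list of rows of integer luminance values.
    """
    height = len(images[0])
    width = len(images[0][0])
    return [
        [min(img[r][c] for img in images) >= threshold for c in range(width)]
        for r in range(height)
    ]
```

In this sketch, a pixel that is bright under one illuminating means but dark under another, as would be expected for luggage lit from only one side, is excluded from the mask.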
The determining means extracts the image part corresponding to the person to be recognized from the plurality of images picked up by the image pickup means based on the result of the discrimination by the discriminating means, and determines either the position or the direction pointed to by the person to be recognized. Therefore, it is possible to extract the image part corresponding to the person to be recognized in a short time by a simple processing without performing complicated image processing. It is also possible to reduce the time required for determining an instruction from the person to be recognized without the use of an image processor or the like having a high processing speed and a complicated construction.
A hand pointing apparatus according to a fourth aspect of the present invention comprises: illuminating means for illuminating a person to be recognized; a plurality of image pickup means for picking up the image of the person to be recognized, who is illuminated by the illuminating means from different directions; determining means for extracting an image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, the situations being indicative of the person to be recognized pointing to either a specific position or a specific direction, and for determining either the position or the direction pointed to by the person to be recognized; and preventing means for preventing an object which is not the subject to be recognized from remaining on the floor surface around the person to be recognized.
The fourth aspect of the present invention is provided with the preventing means for preventing an object which is not the subject to be recognized from remaining on the floor surface around the person to be recognized. Since this prevents the object which is not the subject to be recognized from remaining around the person to be recognized, it is possible to prevent the image part corresponding to the object which is not the subject to be recognized from existing in the images picked up by the image pickup means. The determining means extracts the image part corresponding to the person to be recognized based on a plurality of images obtained by the image pickup means, and determines either the position or the direction pointed to by the person to be recognized. Thus, it is possible to extract the image part corresponding to the person to be recognized in a short time by a simple processing without performing complicated image processing. It is therefore possible to reduce the time required for determining an instruction from the person to be recognized without the use of an image processor or the like having a high processing speed and a complicated construction.
For example, an inclined surface (slope) formed on the floor surface around the person to be recognized can be used as the preventing means. Thus, even if a relatively large object which is not the subject to be recognized (for example, the luggage of the person to be recognized) is placed around the person to be recognized, the object which is not the subject to be recognized slides down on the inclined surface. Thus, it is possible to prevent an object which is not the subject to be recognized, such as the luggage of the person to be recognized, from being placed around the person to be recognized.
Air flow generating means such as a fan for generating an air flow around the person to be recognized may be also applied as the preventing means. Thus, since a relatively small object which is not the subject to be recognized (for example, small trash, dust or the like) is blown away by the generated air flow, it is possible to prevent the object which is not the subject to be recognized such as small trash from remaining around the person to be recognized. A storage tank for storing water or the like around the person to be recognized may be also arranged as the preventing means. Furthermore, this storage tank may be circular in shape so that the water or the like may circulate through the storage tank, whereby it may be used as the preventing means.
According to the fourth aspect of the present invention, since there is provided a preventing means for preventing an object which is not the subject to be recognized from remaining on the floor surface around the person to be recognized, the effect is obtained in which it is possible to provide a hand pointing apparatus of simple construction wherein the time required for the determination of an instruction from the person to be recognized is reduced.
A hand pointing apparatus according to a fifth aspect of the present invention comprises: illuminating means for illuminating a person to be recognized who arrives at a predetermined place; a plurality of image pickup means for picking up the image of the person to be recognized, who is illuminated by the illuminating means from different directions; storing means for storing information for corresponding the three-dimensional coordinates of a plurality of virtual points positioned near the predetermined place, to the positions of the plurality of virtual points on the plurality of images picked up by the plurality of image pickup means; and determining means: for extracting an image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, the situations being indicative of the person to be recognized pointing to either a specific position or a specific direction; for determining the position of a feature point of the person to be recognized in each of the images; for determining the three-dimensional coordinate of the feature point based on the determined position of the feature point and the information stored in the storing means; and for determining either the position or the direction pointed to by the person to be recognized based on the determined three-dimensional coordinates of the feature point.
In the fifth aspect of the present invention, the storing means stores therein the information for corresponding the three-dimensional coordinates of a plurality of virtual points positioned near the predetermined place to the positions of the plurality of virtual points on the plurality of images picked up by the plurality of image pickup means. The determining means extracts the image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, where the situations are indicative of the person to be recognized pointing to either a specific position or a specific direction, and the determining means determines the position of the feature point of the person to be recognized in each image. Then, the determining means determines the three-dimensional coordinates of the feature point based on the determined position of the feature point and the information stored in the storing means, and determines either the position or the direction pointed to by the person to be recognized based on the determined three-dimensional coordinates of the feature point.
Thus, in the fifth aspect of the present invention, a correspondence between the three-dimensional coordinates of a plurality of virtual points positioned near the predetermined place, and the positions of the plurality of virtual points on the plurality of images picked up by the plurality of image pickup means is known in advance from the information stored in the storing means. The three-dimensional coordinates of the feature point of the person to be recognized are determined based on the information stored in the storing means. Thus, the three-dimensional coordinates of the feature point of the person to be recognized can be determined by a very simple processing. Therefore, it is possible to reduce the time required for the determination of an instruction from the person to be recognized without the use of an image processor or the like having a high processing speed and a complicated construction.
On the other hand, in the fifth aspect of the present invention, it is desirable that many virtual points are stored by corresponding the three-dimensional coordinates thereof to the positions thereof on the images in order to determine the three-dimensional coordinates of the feature point of the person to be recognized with a high level of accuracy. More preferably, the storing means stores the information for corresponding the three-dimensional coordinates of many virtual points constantly spaced in a lattice arrangement near the predetermined place, to the positions of these many virtual points on the plurality of images picked up by the plurality of image pickup means.
In such a manner, many virtual points are constantly spaced in the lattice arrangement, whereby, even if the feature point is located in any position near the predetermined place, a virtual point is positioned in proximity to the feature point. The three-dimensional coordinates of the feature point are determined based on the three-dimensional coordinates of the virtual point which is likely to exist in proximity to the feature point on the three-dimensional coordinates, whereby the three-dimensional coordinates of the feature point can be determined with a high level of accuracy regardless of the position of the feature point on the three-dimensional coordinates.
When many virtual points are constantly spaced in the lattice arrangement in the above-described manner, the three-dimensional coordinate of the feature point can be determined in the following manner, for example.
Namely, the determining means of the fifth aspect of the present invention can determine the position of the feature point of the person to be recognized in the images, extract from the images the virtual points positioned in a region within a predetermined range including the feature point on the images, and determine the three-dimensional coordinates of the feature point in accordance with the three-dimensional coordinates of the common virtual points extracted from the images.
Thus, the virtual points positioned in the region within a predetermined range including the feature point on the images are extracted from the images, whereby all the virtual points which are likely to exist in the region adjacent to the feature point on the three-dimensional coordinates are extracted. The size of this region can be set according to the spacing between the virtual points.
Then, the determining means determines the three-dimensional coordinates of the feature point based on the three-dimensional coordinates of the common virtual points extracted from the images. The images picked up by the image pickup means show the situation within the image pickup range, namely, the subject projected on a plane. Therefore, even if a plurality of points, which are positioned as if they were superimposed when seen from the image pickup means, have different three-dimensional coordinates, the points are located in the same position when picked up on a two-dimensional image. On the other hand, since the common virtual points extracted from the images are present in the position adjacent to the feature point on the three-dimensional coordinates, the three-dimensional coordinates of the feature point are determined from the three-dimensional coordinates of the common extracted virtual points, whereby the three-dimensional coordinates of the feature point can be determined with a higher level of accuracy.
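The lookup described above might be sketched as follows. Here each image pickup means has a table mapping the three-dimensional coordinates of every virtual point to its position on that image. The function names, the square search region, and the choice to average the common virtual points are assumptions made for illustration; the specification only states that the coordinates are determined in accordance with the common virtual points.

```python
from typing import Dict, List, Optional, Set, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def nearby_points(table: Dict[Point3, Point2], feature: Point2, radius: float) -> Set[Point3]:
    """Virtual points whose image position lies within a square region of
    half-width `radius` centred on the feature point in one image."""
    fx, fy = feature
    return {
        p for p, (u, v) in table.items()
        if abs(u - fx) <= radius and abs(v - fy) <= radius
    }


def estimate_feature_3d(
    tables: List[Dict[Point3, Point2]], features: List[Point2], radius: float
) -> Optional[Point3]:
    """Average the 3D coordinates of the virtual points extracted in common
    from every image; return None when no virtual point is common."""
    common: Optional[Set[Point3]] = None
    for table, feature in zip(tables, features):
        near = nearby_points(table, feature, radius)
        common = near if common is None else common & near
    if not common:
        return None
    n = len(common)
    return tuple(sum(p[i] for p in common) / n for i in range(3))
```

Taking the intersection over the images discards virtual points that merely appear superimposed on the feature point in one image but lie elsewhere in depth, which is the effect the preceding paragraph describes.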
When a positional relationship is exactly constant between a predetermined place at which the person to be recognized arrives and the image pickup means, the information to be stored in the storing means can be set permanently based on the result of an experimental measurement or the like of the three-dimensional coordinates of plural virtual points positioned near a predetermined place, and the positions of plural virtual points on the images picked up by the image pickup means. On the other hand, when there is a variation in the position between a predetermined place at which the person to be recognized arrives and the image pickup means, or when this positional relationship is considerably different in design depending on the individual hand pointing apparatuses, it is necessary to reset the information to be stored in the storing means.
From this point of view, the fifth aspect of the present invention can further comprise generating means: for allowing the plurality of image pickup means to pick up images of the situations where markers are positioned in the positions of the virtual points; for generating the information for corresponding the three-dimensional coordinates of the virtual points to the positions of the virtual points on the images, based on the three-dimensional coordinates of the virtual points and the marker positions on the images picked up by the plurality of image pickup means; and for allowing the storing means to store the generated information.
Any marker will do as long as the marker is easy to identify on the images obtained by the image pickup. For example, a mark of a particular color or a light-emission source such as an LED can be used as the marker. The marker may be manually positioned in a predetermined position by a person. Alternatively, the marker may be automatically positioned by moving means for moving the marker to an arbitrary position. When the marker is moved by the moving means, the three-dimensional coordinates of a predetermined position can be determined from the amount of movement of the marker caused by the moving means.
The generating means is provided in the above-mentioned manner, whereby the information for corresponding the three-dimensional coordinates of the virtual points to the positions of the virtual points on the images is automatically generated. Thus, even if there is variation in the position between a predetermined place at which the person to be recognized arrives and the image pickup means, or when this positional relationship is considerably different in design depending on the individual hand pointing apparatuses, it is possible to obtain automatically the information for corresponding the three-dimensional coordinates of the virtual points to the positions of the virtual points on the images with a high level of accuracy.
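A minimal sketch of how such information might be generated is given below. It assumes that each image pickup means can be represented by a callable that, given the three-dimensional position at which the marker has been placed, returns the position of the marker found on its image. The function names and that callable interface are hypothetical.

```python
from itertools import product


def lattice_points(origin, spacing, counts):
    """3D coordinates of virtual points spaced uniformly in a lattice."""
    ox, oy, oz = origin
    nx, ny, nz = counts
    return [
        (ox + i * spacing, oy + j * spacing, oz + k * spacing)
        for i, j, k in product(range(nx), range(ny), range(nz))
    ]


def build_tables(points, cameras):
    """For each camera, map every virtual point to the marker position
    observed on that camera's image.

    `cameras` is a list of callables; each call stands for placing the
    marker at the given point, picking up an image, and locating the marker.
    """
    tables = [dict() for _ in cameras]
    for point in points:
        for table, observe in zip(tables, cameras):
            table[point] = observe(point)
    return tables
```

The resulting tables play the role of the information held in the storing means, and could be regenerated whenever the positional relationship between the predetermined place and the image pickup means changes.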
According to the fifth aspect of the present invention, the information for corresponding the three-dimensional coordinates of a plurality of virtual points positioned near a predetermined place at which the person to be recognized arrives, to the positions of the plurality of virtual points on a plurality of images picked up by a plurality of image pickup means is stored. The three-dimensional coordinates of the feature point are determined based on the positions of the feature point on the plurality of images picked up by the plurality of image pickup means and the stored information. Thus, the effect is obtained in which it is possible to provide a hand pointing apparatus of simple construction wherein the time required for the determination of an instruction from the person to be recognized is reduced and the accuracy of instruction determination is excellent.
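The use of the stored information can be illustrated by the following minimal sketch. It assumes two image pickup means and a table in which each virtual point is stored together with its position on each of the two images. The function names build_table and locate, the projection functions, and the inverse-distance weighting over the k nearest virtual points are all assumptions made for illustration; they are not taken from the present specification.

```python
from math import hypot

def build_table(virtual_points, project_a, project_b):
    """Hypothetical stand-in for the generating means: record where each
    virtual point (x, y, z) appears on the image of camera A and camera B."""
    return [(p, project_a(p), project_b(p)) for p in virtual_points]

def locate(table, pos_a, pos_b, k=4):
    """Estimate the 3-D coordinates of a feature point from its positions
    pos_a and pos_b on the two images, using only the stored table.
    Assumed scheme: inverse-distance weighting of the k virtual points
    whose stored image positions are closest to the observed ones."""
    def image_distance(entry):
        _, a, b = entry
        return (hypot(a[0] - pos_a[0], a[1] - pos_a[1])
                + hypot(b[0] - pos_b[0], b[1] - pos_b[1]))

    nearest = sorted(table, key=image_distance)[:k]
    weights = []
    for entry in nearest:
        d = image_distance(entry)
        if d == 0.0:
            # The feature point coincides with a stored virtual point.
            return tuple(float(c) for c in entry[0])
        weights.append(1.0 / d)
    total = sum(weights)
    return tuple(
        sum(w * e[0][i] for w, e in zip(weights, nearest)) / total
        for i in range(3)
    )
```

In this sketch no camera model is evaluated at run time: the determination reduces to a search of the stored table followed by a weighted average, which is the kind of saving in computation that the stored correspondence is intended to provide.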
A hand pointing apparatus according to a sixth aspect of the present invention comprises: illuminating means for illuminating a person to be recognized; a plurality of image pickup means for picking up, from different directions, the image of the person to be recognized who is illuminated by the illuminating means; determining means for extracting an image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, the situations being indicative of the person to be recognized pointing to either a specific position or a specific direction, and for determining either the position or the direction pointed to by the person to be recognized; first detecting means for extracting the image part corresponding to a predetermined part of the body of the person to be recognized from the plurality of images, and for detecting a change in any one of an area of the extracted image part, an outline of the extracted image part, and a length of the outline of the extracted image part; and processing means for executing a predetermined processing when the change is detected by the first detecting means.
The sixth aspect of the present invention is provided with the first detecting means for extracting the image part corresponding to a predetermined part (for example, the hand, the arm or the like) of the body of the person to be recognized in the plurality of images and for detecting a change in the area of the extracted image part, the contour of the extracted image part, or the length of the contour line of the extracted image part. The processing means executes a predetermined processing when a change is detected by the first detecting means. The area, the contour, and the length of the contour line of the image part can be relatively easily detected. Moreover, when the person to be recognized moves a predetermined part of the body, even if his/her motion is not a predefined motion, in almost all cases, the area, the contour, and the length of the contour line of the image part corresponding to the predetermined part are changed.
Therefore, according to the sixth aspect of the present invention, since a change in the area, the contour, or the length of the contour line of the image part is used, it is possible to improve the degree of freedom of movement which the person to be recognized has in order to instruct the processing means to execute a predetermined processing. This movement can also be detected in a short time. Thus, the effect is obtained in which the instruction from the person to be recognized can be determined in a short time.
On the other hand, when a person makes a movement to point to a specific position or a specific direction, even if the position or direction to be pointed to is changed, the fingertip or the like is generally merely moved along a virtual spherical surface centered in the vicinity of the shoulder joint, thereby resulting in little change in the distance between the fingertip or the like and the body including the shoulder joint.
Thus, a hand pointing apparatus according to a seventh aspect of the present invention comprises: illuminating means for illuminating a person to be recognized; a plurality of image pickup means for picking up, from different directions, the image of the person to be recognized who is illuminated by the illuminating means; determining means for extracting an image part corresponding to the person to be recognized from a plurality of images based on a plurality of images of situations picked up by the plurality of image pickup means, the situations being indicative of the person to be recognized pointing to either a specific position or a specific direction, for determining the three-dimensional coordinates of the feature point whose position is changed when the person to be recognized bends or extends an arm and the three-dimensional coordinates of a reference point whose position is not changed even if the person to be recognized bends or extends an arm, and for determining either the position or the direction pointed to by the person to be recognized in accordance with the three-dimensional coordinates of the feature point and the three-dimensional coordinates of the reference point; and processing means for calculating the distance between the reference point and the feature point and for executing a predetermined processing based on the change in the distance between the reference point and the feature point.
The determining means according to the seventh aspect of the present invention extracts the image part corresponding to the person to be recognized from a plurality of images, determines the three-dimensional coordinates of the feature point whose position is changed when the person to be recognized bends or extends an arm and the three-dimensional coordinates of the reference point whose position is not changed even if the person to be recognized bends or extends an arm, and determines either the position or the direction pointed to by the person to be recognized based on the three-dimensional coordinates of the feature point and the three-dimensional coordinates of the reference point. The processing means calculates the distance between the reference point and the feature point, and executes a predetermined processing based on the change in the distance between the reference point and the feature point. For example, the tip of the hand, the finger or the like of the person to be recognized or the point corresponding to the tip or the like of a pointer held by the person to be recognized can be used as the feature point. For example, a point corresponding to the body (such as the chest and the shoulder joint) of the person to be recognized can be used as the reference point.
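The detection of a change in the area or in the length of the contour line can be sketched as follows. The sketch assumes that the image part corresponding to the hand has already been extracted as a binary mask (a list of rows of 0 and 1). The function names, the 4-neighbourhood definition of a contour pixel, and the relative tolerance ratio are assumptions for illustration only.

```python
def area(mask):
    """Number of foreground pixels in the extracted image part."""
    return sum(sum(row) for row in mask)

def contour_length(mask):
    """Number of foreground pixels that touch the background or the image
    border in the 4-neighbourhood (an assumed, simple contour measure)."""
    h, w = len(mask), len(mask[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                outside = yy < 0 or yy >= h or xx < 0 or xx >= w
                if outside or not mask[yy][xx]:
                    count += 1
                    break
    return count

def changed(prev_mask, curr_mask, ratio=0.2):
    """True if the area or the contour length differs from the previous
    frame by more than `ratio` (hypothetical tolerance) of the old value."""
    for measure in (area, contour_length):
        before, after = measure(prev_mask), measure(curr_mask)
        if before and abs(after - before) / before > ratio:
            return True
    return False
```

Because only pixel counts are compared, no particular hand shape needs to be recognized, which is consistent with the freedom of movement described above.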
Thus, if the person to be recognized makes a motion to adjust the direction of the feature point with respect to the reference point so that the direction from the reference point toward the feature point may match the position or direction to be pointed to, the position or direction pointed to is determined by the determining means. If the person to be recognized makes a motion to bend or extend the arm, the distance between the reference point and the feature point is changed, so that a predetermined processing is performed based on this change in the distance.
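The two quantities used by the seventh aspect, namely the direction from the reference point toward the feature point and the distance between them, can be computed as in the following minimal sketch. The function name pointing and the representation of the points as (x, y, z) tuples are assumptions for illustration.

```python
from math import dist

def pointing(reference, feature):
    """Return (unit direction from reference to feature, distance).
    `reference` might be a point on the shoulder or chest and `feature`
    the fingertip, both as (x, y, z) tuples in the same coordinate system."""
    d = dist(reference, feature)
    if d == 0.0:
        raise ValueError("reference point and feature point coincide")
    direction = tuple((f - r) / d for r, f in zip(reference, feature))
    return direction, d
```

The direction serves the determination of the position or direction pointed to, while the distance is the quantity whose change is monitored by the processing means.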
Thus, in the seventh aspect of the present invention, since the position or direction pointed to by the person to be recognized is determined from the positional relationship between the reference point and the feature point, the direction in which the image pickup means picks up the image can be set so that the reference point and the feature point can be reliably detected without taking into account motions such as the raising and lowering of the finger. Furthermore, since whether or not the execution of a predetermined processing is instructed is determined on the basis of the change in the distance (relative position) between the reference point and the feature point, it is unnecessary to detect additional image features in order to determine whether or not the execution of a predetermined processing is being instructed. In addition, the distance between the reference point and the feature point scarcely changes even if a person makes a motion to point to a specific position or a specific direction.
Therefore, according to the seventh aspect of the present invention, it is possible to reliably detect the motion of the person to be recognized to instruct the execution of a predetermined processing (the motion to bend or extend the arm) in a short time. The instruction from the person to be recognized can thus be confirmed in a short time.
The processing means can execute, as a predetermined processing, the processing associated with the position or direction pointed to by the person to be recognized, for example, when the distance between the reference point and the feature point is changed. Since the motion to bend or extend the arm is a very natural motion, if this motion is used to instruct the above-described execution of a predetermined processing, the person to be recognized can make the motion for instructing the execution of a predetermined processing without feeling any discomfort.
Furthermore, the direction of the change in the distance between the reference point and the feature point due to the motion to bend or extend the arm is of two types (a direction of increase in the distance and a direction of reduction in the distance). Thus, when the distance between the reference point and the feature point is increased, a first predetermined processing may be carried out. When the distance between the reference point and the feature point is reduced, a second predetermined processing differing from the first predetermined processing may be carried out.
Thus, when the person to be recognized makes a motion to extend an arm (in this case, the distance between the reference point and the feature point is increased), the first predetermined processing is carried out. When the person to be recognized makes a motion to bend the arm (in this case, the distance between the reference point and the feature point is reduced), the second predetermined processing is carried out. It is therefore possible for the person to be recognized to select the processing to be executed from the first predetermined processing and the second predetermined processing, in a manner similar to the left and right clicks of a mouse. The person to be recognized makes either the extending motion or the bending motion, whereby it is possible to reliably execute the processing selected by the person to be recognized from the first predetermined processing and the second predetermined processing.
For the determination of whether or not the execution of a predetermined processing is instructed on the basis of a change in the distance between the reference point and the feature point, more particularly, for example, the magnitude of the change in the distance between the reference point and the feature point is compared with a predetermined value. If the change in the distance is the predetermined value or more, it is possible to determine that the execution of a predetermined processing is instructed. However, if the distance between the reference point and the feature point is considerably changed due to other motions not intended to instruct the execution of a predetermined processing, then the instruction from the person to be recognized may be determined by mistake.
From this point of view, preferably, the processing means detects the rate of change in the distance between the reference point and the feature point, that is, the velocity of the change, and executes a predetermined processing when the detected velocity of change is at a threshold value or more.
In the seventh aspect of the present invention, the velocity of the change in the distance between the reference point and the feature point is detected, and a predetermined processing is then executed only when the detected velocity of the change is at the threshold value or more. In such a manner, the person to be recognized makes a specific motion to quickly bend or extend an arm, whereby the velocity of the change in the distance between the reference point and the feature point reaches the threshold value or more, so that a predetermined processing is executed. Thus, the rate of recognition of the motion of the person to be recognized for instructing the execution of a predetermined processing is improved. Only when the person to be recognized makes a motion for instructing the execution of a predetermined processing is this motion reliably detected, allowing a predetermined processing to be carried out.
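The selection between the first and the second predetermined processing according to the direction of the change in the distance can be sketched as follows. The function name select_processing, the labels "first" and "second", and the parameter min_change are assumptions for illustration.

```python
def select_processing(prev_distance, curr_distance, min_change):
    """Return "first" when the arm was extended (distance increased),
    "second" when it was bent (distance reduced), or None when the change
    is smaller than `min_change` (a hypothetical predetermined value)."""
    change = curr_distance - prev_distance
    if change >= min_change:
        return "first"
    if change <= -min_change:
        return "second"
    return None
```

With distances expressed in meters, for example, select_processing(0.45, 0.60, 0.10) selects the first processing and select_processing(0.60, 0.45, 0.10) selects the second.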
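The determination based on the velocity of the change can be sketched as follows. The sketch assumes that the distance between the reference point and the feature point is sampled together with a time stamp; the function name detect_instructions, the sample format, and the labels "extend" and "bend" are assumptions for illustration, and the threshold is assumed to be positive.

```python
def detect_instructions(samples, threshold):
    """`samples` is a list of (time_in_seconds, distance) pairs in time
    order. Return a list of (time, "extend" | "bend") for every interval
    in which the rate of change of the distance is `threshold` or more
    in magnitude. `threshold` must be greater than zero."""
    events = []
    for (t0, d0), (t1, d1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip samples with unusable time stamps
        velocity = (d1 - d0) / (t1 - t0)
        if abs(velocity) >= threshold:
            events.append((t1, "extend" if velocity > 0 else "bend"))
    return events
```

A slow drift of the distance, such as that caused by a change of posture, yields a small velocity and is ignored, whereas a quick bend or extension exceeds the threshold and is reported.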
Moreover, as physique, muscular strength, and the like vary depending on the person to be recognized, even if the person to be recognized makes a motion to quickly bend or extend an arm in order to allow the processing means to execute a predetermined processing, the velocity of the change in the distance between the reference point and the feature point varies depending on the individual person to be recognized. Therefore, in some cases, even if the person to be recognized makes a motion to quickly bend or extend an arm in order to instruct the processing means to execute a predetermined processing, this motion cannot be detected. Conversely, this motion is sometimes detected by mistake although the person to be recognized has not made this motion.
Thus, preferably, the seventh aspect of the present invention further comprises threshold value setting means for requesting the person to be recognized to bend or extend the arm and for previously setting the threshold value based on the rate of the change in the distance between the reference point and the feature point when the person to be recognized bends or extends the arm.
In this manner, the threshold value as to whether or not the processing means executes a predetermined processing is previously set based on the rate of the change in the distance between the reference point and the feature point when the person to be recognized bends or extends an arm (quickly bends or extends an arm) in order to allow the processing means to execute a predetermined processing, whereby the threshold value can be obtained in response to the physique, muscular strength, or the like of the individual persons to be recognized. Whether or not the execution of a predetermined processing is instructed is determined by the use of this threshold value, whereby it is possible to reliably detect the motion of the person to be recognized to instruct the execution of a predetermined processing and to execute a predetermined processing, regardless of any variation in physique, muscular strength, or the like, depending on the individual person to be recognized.
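The setting of the threshold value for an individual person can be sketched as follows. The sketch assumes that the person is requested to perform the quick bending or extending motion several times and that each trial is recorded as a list of (time, distance) samples. The rule used here, namely a fixed fraction margin of the smallest peak rate observed over the trials, is an assumption for illustration; the present specification does not prescribe how the threshold is derived from the measured rate.

```python
def peak_rate(samples):
    """Largest magnitude of the rate of change of the distance within one
    recorded trial, given as (time_in_seconds, distance) pairs."""
    return max(
        abs((d1 - d0) / (t1 - t0))
        for (t0, d0), (t1, d1) in zip(samples, samples[1:])
        if t1 > t0
    )

def set_threshold(trials, margin=0.5):
    """Hypothetical threshold value setting: take `margin` times the
    slowest peak rate among the requested trial motions, so that every
    trial motion of this person would have exceeded the threshold."""
    return margin * min(peak_rate(trial) for trial in trials)
```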
Furthermore, the seventh aspect of the present invention further comprises second detecting means for extracting the image part corresponding to the arm of the person to be recognized from the plurality of images and for detecting whether or not the arm of the person to be recognized is lowered, wherein the processing means continues in its current state when the second detecting means detects that the arm of the person to be recognized is lowered. Namely, an execution state is continued when the processing is being carried out, while a stop state is continued when the processing is stopped. Thus, since the person to be recognized does not need to keep raising the arm in order to continuously execute a certain processing, the burden on the person to be recognized can be reduced.
According to the seventh aspect of the present invention, the position or direction pointed to by the person to be recognized is determined on the basis of the three-dimensional coordinates of the feature point whose position is changed when the person to be recognized bends or extends an arm and on the basis of the three-dimensional coordinates of the reference point whose position is not changed even if the person to be recognized bends or extends an arm, and a predetermined processing is also executed based on the change in the distance between the reference point and the feature point. Thus, the following effect is obtained. Namely, it is possible to reliably detect the motion of the person to be recognized to instruct the execution of a predetermined processing in a short time, and it is also possible to determine the instruction from the person to be recognized in a short time.
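The continuation of the current state while the arm is lowered can be sketched as a small state holder. The class name ProcessingState and the instruction labels "start" and "stop" are assumptions for illustration; the detection of the lowered arm itself is represented only by a Boolean input.

```python
class ProcessingState:
    """Keeps the execution state or the stop state of a processing."""

    def __init__(self):
        self.running = False

    def update(self, arm_lowered, instruction=None):
        """Apply one observation and return the resulting state.
        While the arm is detected as lowered, the current state is
        continued and any instruction is ignored."""
        if arm_lowered:
            return self.running
        if instruction == "start":
            self.running = True
        elif instruction == "stop":
            self.running = False
        return self.running
```

For example, after update(False, "start") the processing is in the execution state, and subsequent calls with arm_lowered set to True leave it in that state until the arm is raised again and a stop instruction is given.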