1. Field of the Invention
The present invention relates to a method and an apparatus for generating information input in which input information is extracted by obtaining a reflected light image of a target object.
2. Description of the Background Art
There are various input devices for computers, and among them the mouse is, along with the keyboard, one of the most widely used. However, the mouse can only carry out manipulations such as moving a cursor and selecting from a menu, so that it serves as a two-dimensional pointing device at best. In other words, the mouse can handle only two-dimensional information, and it is difficult to use it to select something that has depth, such as an object in a three-dimensional space. Also, when producing animation by using a computer, for example, it is difficult to give a natural motion to a character by means of manipulation information entered with an input device such as a mouse.
Also, in the multi-modal field, there is a demand for a scheme that makes it possible to handle a device in a manner close to natural human communication, by inputting manipulation information such as a gesture (a hand action or a body motion) or a posture as a complement to the input information given by input means such as speech input, a keyboard, a mouse, a track ball, etc.
For this reason, in recent years, three-dimensional pointing devices that enable the recognition of natural human gestures have been developed as one technique for enabling a variety of input operations in the multi-modal field and elsewhere, by compensating for the difficulties associated with pointing in a three-dimensional space.
For example, a three-dimensional pointing device as shown in FIG. 105 has been proposed. This device has a ball-shaped operation portion in the middle of its body, and ten keys arranged at a peripheral portion. It has six degrees of freedom corresponding to six different ways of operating the ball-shaped operation portion, that is, pushing its front part, pushing its central part, pushing its rear part, pulling it upward, rotating it to the right, and rotating it to the left.
By assigning appropriate roles to these six degrees of freedom, it is possible to control a position (x, y, z) and an orientation (x-axis, y-axis, z-axis) of a cursor in the three-dimensional space, or a position (x, y, z) and an orientation (x-axis, y-axis, z-axis) of a viewpoint with respect to the three-dimensional space.
However, this three-dimensional pointing device requires a considerable level of skill, so that when the device is actually operated it is quite difficult to control a cursor or a viewpoint exactly as desired. For example, when one tries to rotate the ball to the left or right, a front part or a rear part of the ball can unintentionally be pushed at the same time, so that the cursor is moved or the viewpoint is shifted in a totally unexpected direction.
As opposed to such a three-dimensional pointing device, there are also input devices that use a hand action or a body motion, known by names such as the data glove, the data suit, and the cyber-glove. Among them, the data glove is a glove-shaped device which has optical fibers on its surface. These optical fibers extend to the finger joints so as to utilize the change in light conduction caused by the bending of a finger. By measuring the amount of light conduction, it is possible to determine how much each finger joint is bent. The position of the hand itself in the three-dimensional space is measured by a magnetic sensor provided on the back of the hand.
As a result, when a command corresponding to a specific hand action is determined in advance, for example that pointing with the index finger indicates a forward move, it is possible to use the data glove to realize an operation (called a walk-through) that simulates walking about while variously changing the viewpoint within the three-dimensional space.
However, a device such as the data glove is associated with the following problems.
First of all, it is very expensive and therefore not suitable for home use.
Secondly, recognition errors are inevitable because the angle of each finger joint has to be measured. For example, suppose that a state in which only the index finger is extended while the other fingers are turned in is defined as a forward move command. Here, even when the index finger is extended, the angle of its second joint is rather unlikely to become exactly 180°, so that unless a margin is provided, it would be impossible to recognize this state except when the finger is completely extended.
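The role of the margin can be illustrated with a minimal sketch. The function name, the margin of 20 degrees, and the sample angle below are assumptions chosen purely for illustration and are not taken from any actual data glove.

```python
def is_finger_extended(joint_angle_deg, margin_deg=0.0):
    """Return True when the joint angle lies within margin_deg of a straight 180 degrees."""
    return abs(180.0 - joint_angle_deg) <= margin_deg


# Hypothetical measurement: an "extended" index finger rarely reads exactly 180 degrees.
measured_angle = 172.0

strict = is_finger_extended(measured_angle)                      # no margin
tolerant = is_finger_extended(measured_angle, margin_deg=20.0)   # assumed margin

print(strict, tolerant)
```

Without a margin the extended finger is rejected, while with the assumed margin it is accepted. A wider margin, however, also accepts a finger that is in fact slightly bent, which is one way in which a recognition error can arise.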
Thirdly, the operator is required to wear the data glove, so that natural movement can be obstructed. It is also necessary to calibrate the light conduction state with the hand open and with the hand closed every time the data glove is worn, so that it is not very convenient to use. Moreover, because optical fibers are used, problems such as a broken fiber occur during continuous use, so that the glove is very much a consumable item.
In addition, despite the fact that it is a very expensive device that is not easy to handle, unless the size of the glove fits perfectly, it is difficult to recognize sophisticated hand actions because the light conduction state tends to deviate from the calibrated state during use.
Because of these numerous problems, the data glove has not become as popular as originally expected, despite the fact that it was the device that triggered VR (Virtual Reality) technology. Its price has not been reduced significantly, and many problems remain regarding its convenience in use.
For this reason, attempts have been made to input a hand action or a body motion without requiring the operator to wear a special device such as the data glove. For example, there is a technique for recognizing the shape of the hand by analyzing dynamic images such as video images. However, this requires a technique for extracting a target image from the background image. Namely, in a case of recognizing a hand action, it is necessary to extract the hand alone, and this has turned out to be technically rather difficult.
For example, consider a case of extracting a hand portion in an image according to color information. Since the hand is flesh-colored, it is possible to contemplate a scheme for extracting, as image information, only those pixels that have the flesh color. However, it is impossible to distinguish the pixels that correspond to the hand portion alone if beige clothes or walls are present in the background. Also, even if the beige and the flesh color can be distinguished by some adjustments, the color tone changes when the lighting changes, so that it is still difficult to extract the hand portion stably.
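The difficulty described above can be reproduced with a minimal sketch of color-based extraction. The RGB threshold ranges, the sample pixel values, and the helper names are all assumptions chosen for illustration; they do not represent any particular conventional system.

```python
def is_flesh_color(rgb, r_range=(170, 255), g_range=(120, 210), b_range=(90, 180)):
    """Classify a pixel as flesh-colored using fixed per-channel ranges (assumed values)."""
    r, g, b = rgb
    return (r_range[0] <= r <= r_range[1]
            and g_range[0] <= g <= g_range[1]
            and b_range[0] <= b <= b_range[1])


def dim(rgb, factor):
    """Simulate a change of lighting by scaling every channel by the same factor."""
    return tuple(int(c * factor) for c in rgb)


# Hypothetical sample pixels.
hand_pixel = (224, 172, 135)
beige_wall = (222, 184, 150)
blue_mat = (30, 60, 200)

print(is_flesh_color(hand_pixel))            # hand is detected
print(is_flesh_color(beige_wall))            # beige background is detected as well
print(is_flesh_color(blue_mat))              # a blue background is rejected
print(is_flesh_color(dim(hand_pixel, 0.6)))  # the same hand under dimmer lighting
```

With these assumed values the beige wall passes the same test as the hand, and the hand itself fails the test once the lighting is dimmed, which corresponds to the two problems noted above.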
In order to resolve these problems, there is a measure of imposing a limitation on the background image, such as placing a blue mat in the background so as to make the extraction easy. There is also a measure of painting the finger tips with a color that can be easily extracted from the background, or of wearing a ring in such a color. However, these limitations are not realistic, so that they are utilized for experiments but not for practical use.
On the other hand, as another available technique for recognizing the hand action, it is possible to utilize a device for inputting range images, called a range finder. Typically, the range finder is based on the principle that a spot light or a slit light is irradiated onto a target object and the distance is then determined by triangulation according to the position at which the reflected light is received. This spot light or slit light is mechanically scanned in order to obtain two-dimensional distance information.
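The triangulation principle can be sketched as follows for one simplified geometry, which is an assumption made here for illustration: the spot light beam is parallel to the optical axis of the receiving lens and offset from it by a baseline b, the lens has a focal length f, and the reflected spot is imaged at an offset x from the center of the light receiving surface. By similar triangles the distance is then z = f * b / x. The function name and the numerical values are hypothetical.

```python
def triangulated_distance(focal_length_mm, baseline_mm, spot_offset_mm):
    """Distance to the target from similar triangles: z = f * b / x (assumed geometry)."""
    if spot_offset_mm <= 0:
        raise ValueError("spot offset must be positive for a target at finite distance")
    return focal_length_mm * baseline_mm / spot_offset_mm


# Hypothetical values: 16 mm lens, 100 mm baseline, spot imaged 2 mm off center.
distance_mm = triangulated_distance(16.0, 100.0, 2.0)
print(distance_mm)
```

Each evaluation yields the distance for a single spot position only, which is consistent with the need to scan the spot light or slit light mechanically in order to cover a two-dimensional area.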
This range finder is capable of generating the range image with very high precision, but the device requires a very large-scale configuration and is costly. Also, the input is very time-consuming, and it is difficult to carry out real-time processing.
There are also devices, some of which are already in practical use, for capturing a shape or a motion of the hand or the body by attaching color markers or light emitting elements to the hand or a part of the body, and detecting these color markers or light emitting elements by using the image.
However, the requirement to mount some element every time the device is operated is a great demerit from the viewpoint of user convenience, and can limit its range of application significantly. Moreover, as can be seen in the example of the data glove, a device that requires some element to be mounted on a movable part such as the hand tends to have a durability problem.
Now, setting aside the input devices as described above, the conventional art of the camera technique will be described.
In the conventional camera technique, in order to realize the chromakey, that is, the composition of a character with a background, it has been necessary to take an image of the character against a blue back in advance so as to make it easier to extract the character. For this reason, the location for taking images has been limited to a place such as a studio where it is possible to take images against the blue back. Otherwise, in order to extract the character from a video image taken without using the blue back, it has been necessary to manually edit the character extraction range scene by scene, which is very time-consuming.
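A minimal sketch of chromakey composition against a blue back is given below. The blue dominance test, its threshold of 60, and the tiny sample images are assumptions made for illustration and do not describe any particular studio equipment.

```python
def is_blue_back(rgb, dominance=60):
    """Treat a pixel as blue back when blue exceeds both red and green by the assumed threshold."""
    r, g, b = rgb
    return b - max(r, g) >= dominance


def composite(foreground, background, dominance=60):
    """Replace blue back pixels of the foreground with the corresponding background pixels."""
    return [
        [bg if is_blue_back(fg, dominance) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# Hypothetical 2 x 2 images: a character pixel surrounded by blue back.
BLUE = (20, 40, 220)
CHARACTER = (200, 150, 120)
SCENERY = (90, 160, 80)

foreground = [[BLUE, CHARACTER],
              [BLUE, BLUE]]
background = [[SCENERY, SCENERY],
              [SCENERY, SCENERY]]

result = composite(foreground, background)
print(result)
```

The character pixel survives only because everything around it is known to be blue. If the same test is applied to an image taken without the blue back, there is no such guarantee, which is why the extraction range has had to be edited manually in that case.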
Similarly, in a case of generating the character in a three-dimensional space, the conventional camera technique uses a scheme in which a three-dimensional model is produced in advance and texture mapping is then carried out to attach a picture of the character to it. However, the three-dimensional model production and the texture mapping require considerable time and effort, so that this scheme has been almost impractical except in special cases where great expense is permitted, such as movie production.
As described above, conventionally, there has been no input device of a direct command type by which a gesture or a motion can be inputted easily. In particular, there has been no device by which pointing or a viewpoint change in the three-dimensional space can be carried out easily. Also, it has been impossible to give a natural motion to an animation character by directly using the gesture or the motion of a user. In addition, in the conventional camera technique, it has not been possible to extract a specific character alone or to input depth information on the character easily.