This application is based on Japanese Patent Application No. 10-31659, filed Feb. 23, 1998, the contents of which are incorporated herein by reference.
The present invention relates to an information input apparatus which attains pointing in a three-dimensional space using an image.
As an input device for a computer, and in particular as a pointing device, the mouse is prevalent, since most computers are equipped with one. However, the mouse serves merely as a two-dimensional pointing device, e.g., for moving a cursor or selecting a menu item.
Since the information a mouse can process is two-dimensional, the mouse can hardly select, e.g., an object having depth in a three-dimensional space. Likewise, when the mouse is used to animate a character in creating an animation, it is difficult to give the character natural motion.
In order to compensate for such difficulties in pointing in a three-dimensional space, various three-dimensional pointing devices have been developed.
As a typical three-dimensional pointing device, for example, a device shown in FIG. 1 is known.
This three-dimensional pointing device allows six kinds of operation, i.e., "pushing a central round control knob 150 forward", "pressing the center of the knob 150", "pressing the rear end of the knob 150", "lifting the entire knob 150 upward", "turning the entire knob 150 clockwise", and "turning the entire knob 150 counterclockwise", and thus has six degrees of freedom.
By assigning these six degrees of freedom to various operation instructions, the position (x, y, z) and directions (x-, y-, and z-axes) of a cursor in a three-dimensional space can be controlled, or the view point position (x, y, z) and directions (x-, y-, and z-axes) with respect to the three-dimensional space can be controlled.
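The assignment of six degrees of freedom to a position and directions can be sketched as follows. This is only an illustrative sketch: the names Pose and apply_input, the ordering of the six input values, and the gain parameter are assumptions made for this example and are not taken from the device of FIG. 1.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Cursor or view point state: position plus a direction angle per axis (degrees)."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    rot_x: float = 0.0
    rot_y: float = 0.0
    rot_z: float = 0.0


def apply_input(pose: Pose, axes, gain: float = 1.0) -> Pose:
    """Return a new pose after applying six input values.

    axes is assumed (hypothetically) to be ordered as
    (dx, dy, dz, d_rot_x, d_rot_y, d_rot_z).
    """
    if len(axes) != 6:
        raise ValueError("exactly six input values are required")
    dx, dy, dz, drx, dry, drz = (gain * a for a in axes)
    return Pose(
        pose.x + dx,
        pose.y + dy,
        pose.z + dz,
        (pose.rot_x + drx) % 360.0,  # keep angles within [0, 360)
        (pose.rot_y + dry) % 360.0,
        (pose.rot_z + drz) % 360.0,
    )


# Move one unit along x while turning 30 degrees about the z-axis.
moved = apply_input(Pose(), (1, 0, 0, 0, 0, 30))
```

Because every input value feeds one pose component directly, an unintended value on any one of the six inputs immediately moves the cursor or view point along the corresponding component.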
However, when this device is actually operated, the cursor or view point cannot be controlled as desired.
For example, when the operator wants to turn the knob clockwise or counterclockwise, he or she may inadvertently press its front or rear end, and the cursor or view point may move in an unexpected direction.
In place of such a three-dimensional pointing device, devices that can input instructions using hand or body actions have been developed.
Such devices include, e.g., the data glove, data suit, and cyber glove. For example, the data glove is a glove-like device with optical fibers running along its surface. Each optical fiber runs to a joint of a finger, and when the finger is bent, the transmission state of light changes. By measuring the transmission state of light, the degree of bending of the joint of each finger can be detected. The position of the hand itself in the three-dimensional space is measured by a magnetic sensor attached to the back of the hand. If an action is assigned to a given instruction (e.g., if the index finger is pointed up, a forward movement instruction is issued), the operator can walk through the three-dimensional space while variously changing the view point using the data glove (walkthrough).
However, such a device suffers from several problems.
First, such a device is expensive, and is hardly suitable for home use.
Second, an operation may often be erroneously recognized. Since the angle of the finger joint is measured, even when, for example, a state wherein the operator stretches only his or her index finger and bends the other fingers is defined as a forward movement instruction, such a state may be erroneously recognized as another instruction. More specifically, stretching a finger includes various states. That is, since the second joint of the index finger rarely reaches exactly 180°, it is difficult to recognize the stretched state in any state other than this 180° state of the index finger, unless a given margin is allowed.
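The role of the margin in recognizing a stretched finger can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the finger names, the example joint angles, and the margin values are all hypothetical and are not taken from any actual data glove.

```python
from typing import Dict

STRAIGHT_DEG = 180.0  # joint angle of a fully stretched finger


def is_stretched(joint_angle_deg: float, margin_deg: float) -> bool:
    """A joint counts as stretched when it is within margin_deg of 180 degrees."""
    return joint_angle_deg >= STRAIGHT_DEG - margin_deg


def is_forward_instruction(joint_angles_deg: Dict[str, float], margin_deg: float) -> bool:
    """Forward movement: only the index finger is stretched, all others are bent."""
    index_stretched = is_stretched(joint_angles_deg["index"], margin_deg)
    others_bent = all(
        not is_stretched(angle, margin_deg)
        for name, angle in joint_angles_deg.items()
        if name != "index"
    )
    return index_stretched and others_bent


# Hypothetical measured angles: the index finger is nearly, but not exactly, straight.
hand = {"thumb": 120.0, "index": 172.0, "middle": 95.0, "ring": 90.0, "little": 100.0}

no_margin = is_forward_instruction(hand, margin_deg=0.0)     # 172 < 180, not recognized
with_margin = is_forward_instruction(hand, margin_deg=20.0)  # recognized
too_wide = is_forward_instruction(hand, margin_deg=90.0)     # middle finger now counts as stretched
```

With no margin the instruction is missed, and with too wide a margin the bent fingers are also judged to be stretched, so that the same hand state is recognized as a different instruction.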
Third, since the operator must wear the data glove, his or her natural movement is disturbed.
Fourth, every time the operator wears the data glove, he or she must calibrate the transmission state of light in correspondence with the stretched and bent finger states, resulting in troublesome operations.
Fifth, the problem of failures remains unsolved. That is, after continuous use of the data glove, failures such as disconnection of fibers may take place, and the data glove is no more durable than a consumable item.
Sixth, although the data glove is such an expensive and troublesome device, if the glove does not fit the operator's hand properly, the input value may deviate from the calibrated value during use due to slippage of the glove, and delicate hand actions can hardly be recognized.
Owing to the various problems described above, the data glove has not become as widespread as initially expected, although it served as a trigger for VR (virtual reality) technology. For this reason, the data glove is still expensive, and has many problems in terms of its use.
By contrast, some studies have been made to input hand and body actions without wearing any special devices such as a data glove.
As a typical study for inputting hand or body actions, for example, a method of recognizing hand shape by analyzing a moving image such as a video image is known.
However, in this method, a target image (in the case of hand action recognition, an image of the hand alone) must be extracted from the background image, and it is very hard to extract the target image portion.
For example, assume that a "hand" as a target image is extracted using colors. Since the hand has skin color, only a skin color portion may be extracted. However, if a beige clothing article or wall is present in the background, it is hard to distinguish skin color from it, and such a method is far from practical. Even when beige is distinguished from skin color by adjustment, if the illumination changes, the color tone also changes. Hence, it is difficult to extract a skin color portion reliably.
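The two difficulties of color-based extraction can be sketched as follows. This is only an illustrative sketch: the RGB range used for skin color, the example pixel values, and the illumination gain are hypothetical values chosen for this example, not measured data.

```python
def in_skin_range(rgb, lo=(150, 100, 80), hi=(255, 200, 170)) -> bool:
    """Return True when every color component lies inside the assumed skin range."""
    return all(l <= c <= h for c, l, h in zip(rgb, lo, hi))


def scale_illumination(rgb, gain: float):
    """Simulate a brightness change by scaling each component (clipped to 255)."""
    return tuple(min(255, round(c * gain)) for c in rgb)


skin_pixel = (210, 160, 130)   # hypothetical skin color
beige_pixel = (220, 190, 150)  # hypothetical beige wall or clothing

skin_found = in_skin_range(skin_pixel)
beige_mistaken = in_skin_range(beige_pixel)  # beige also falls inside the range
dim_skin_found = in_skin_range(scale_illumination(skin_pixel, 0.6))  # darker room
```

In this sketch the beige pixel passes the same test as the skin pixel, and the skin pixel itself fails the test once the illumination is reduced, which corresponds to the two difficulties described above.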
In order to avoid such problems, a method that facilitates extraction by imposing a constraint on the background image, e.g., by placing a blue mat on the background may be used. Alternatively, a method that colors finger tips to easily extract them from the background or makes the operator wear color rings may be used. However, such constraints are not practical; they are used for experimental purposes but are not put into practical applications.
The above-mentioned video image recognition, including extraction and the like, requires a very large amount of computation. For this reason, existing personal computers cannot process all video images (as many as 30 images per second) in real time. Hence, it is hard to attain motion capture by video image processing in real time.
As another method of inputting hand or body actions by analyzing a moving image such as a video image, a method using a device called a range finder for inputting a distance image is known.
The typical principle of the range finder is to irradiate an object with spot light or slit light and to obtain a distance image, by the principle of triangulation, from the position at which the light reflected by the object is received. The range finder mechanically scans the spot light or slit light to obtain two-dimensional distance information. This device can generate a distance image with very high precision, but requires a large-scale arrangement, resulting in high cost. Also, a long input time is required, and it is difficult for this device to process information in real time.
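The triangulation step can be sketched as follows. The geometry is an assumption, since no specific arrangement is given here: a light source and a light-receiving position are separated by a known baseline, and both angles are measured from that baseline. Under this assumption the distance follows from the textbook relation z = b sin(a) sin(c) / sin(a + c). The function names are hypothetical.

```python
import math


def triangulated_distance(baseline: float, source_angle_deg: float,
                          receiver_angle_deg: float) -> float:
    """Perpendicular distance from the baseline to the illuminated point.

    source_angle_deg: angle of the emitted light, measured from the baseline.
    receiver_angle_deg: angle of the received light, measured from the baseline.
    """
    a = math.radians(source_angle_deg)
    c = math.radians(receiver_angle_deg)
    if not (a > 0.0 and c > 0.0 and a + c < math.pi):
        raise ValueError("the two rays must intersect in front of the baseline")
    return baseline * math.sin(a) * math.sin(c) / math.sin(a + c)


def scan_line(baseline: float, source_angles_deg, receiver_angles_deg):
    """One scan: a distance value for each pair of emission and reception angles."""
    return [
        triangulated_distance(baseline, s, r)
        for s, r in zip(source_angles_deg, receiver_angles_deg)
    ]


# With a baseline of 1.0 and both rays at 45 degrees, the point lies 0.5 away.
distance = triangulated_distance(1.0, 45.0, 45.0)
```

Each scanned spot yields only one distance value in this sketch, which suggests why mechanically scanning a whole two-dimensional field takes a long input time.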
As still another method of inputting hand or body actions by analyzing a moving image such as a video image, a device for detecting a color marker or light-emitting unit attached to a hand or body portion from an image, and capturing the shape, motion, and the like of the hand or body portion may be used. This device has already been put to use in some applications. However, the device has the serious drawback of inconveniencing the user, since the user must wear the marker or light-emitting unit for every operation, and its range of applications is very limited. As in the example of the data glove, when the user wears the device on a movable portion such as a hand, a durability problem often arises.
As described above, various three-dimensional pointing device systems are available. However, the most promising system for the future is presumably one that analyzes and uses a moving image such as a video image without forcing the operator to wear any device or to operate any device directly.
With a conventional camera technique, in order to composite a character with a background (chroma key), the character image must be photographed in front of a blue backdrop to facilitate character extraction. For this reason, the photographing place is limited to, e.g., a studio in which an image can be photographed in front of a blue backdrop. Alternatively, in order to extract a character from an image photographed without a blue backdrop, the character extraction range must be manually edited frame by frame, resulting in very cumbersome operations.
Similarly, in order to generate a character in a three-dimensional space, a three-dimensional model is created in advance, and a photograph of the character is pasted onto the model (texture mapping). However, creation of a three-dimensional model and texture mapping are tedious operations and are rarely used outside applications such as movie production that can justify the great cost involved.
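The conventional chroma key compositing can be sketched as follows. This is a minimal sketch under stated assumptions: the rule that a pixel belongs to the blue backdrop when its blue component exceeds both other components by a fixed amount, the threshold value, and the function names are hypothetical simplifications for this example.

```python
def is_backdrop(rgb, dominance: int = 60) -> bool:
    """Treat a pixel as blue backdrop when blue clearly dominates red and green."""
    r, g, b = rgb
    return b - max(r, g) >= dominance


def chroma_key(foreground, background, dominance: int = 60):
    """Replace backdrop pixels of the foreground with the background pixels.

    Both images are lists of rows of (r, g, b) tuples with identical size.
    """
    return [
        [bg if is_backdrop(fg, dominance) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# One-row example: a backdrop pixel followed by a pixel of the character.
studio_shot = [[(10, 20, 200), (200, 150, 120)]]
new_scene = [[(0, 255, 0), (0, 255, 0)]]
composited = chroma_key(studio_shot, new_scene)
```

The extraction in this sketch depends entirely on the backdrop color, so an image photographed without a blue backdrop gives the rule nothing to separate the character from.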
In order to solve these problems, for example, a technique disclosed in U.S. Ser. No. 08/953,667 is known. This technique acquires a distance image by extracting a reflected light image. However, this technique cannot use commercially available sensor arrays.
As described above, demand for three-dimensional input has been increasing in recent years, but no direct-pointing input apparatus that can easily input a gesture or motion without making the user wear a special device is available.
Hence, development of a practical, simple three-dimensional input apparatus which can easily attain pointing or a change in view point in a three-dimensional space has been demanded.