This application is based on Japanese Patent Application No. 10-66382, filed Mar. 17, 1998, the contents of which are incorporated herein by reference.
The present invention relates to an information input apparatus and method for inputting information in a three-dimensional space, and to a recording medium.
As an input device to a computer, the mouse is in widespread use. However, the mouse merely serves as a two-dimensional pointing device for operations such as moving the cursor and selecting a menu item. Since the information the mouse can process is two-dimensional information, the mouse can hardly select, e.g., an object with depth in a three-dimensional space. Also, when the mouse is used to animate a character in creating an animation, it is difficult to give the character natural motion. In order to compensate for such difficulties in pointing in a three-dimensional space, three-dimensional pointing devices have been developed. For example, a three-dimensional pointing device 150 shown in FIG. 1 allows six kinds of operations, i.e., pushing a central round portion forward, pressing the center of that portion, pressing the rear end of that portion, lifting the entire portion upward, turning the entire portion clockwise, and turning the entire portion counterclockwise, and thus has six degrees of freedom. By assigning these six degrees of freedom to various instructions, the position (x, y, z) and directions (x-, y-, and z-axes) of a cursor in the three-dimensional space can be controlled, or the view point position (x, y, z) and directions (x-, y-, and z-axes) with respect to the three-dimensional space can be controlled.
However, when this device is actually operated, the cursor or view point cannot be controlled as desired. For example, when the operator wants to turn the round portion clockwise or counterclockwise, he or she may inadvertently press its front or rear end, and the cursor or view point may move in an unexpected direction.
In place of such a three-dimensional pointing device, devices that can input instructions using hand or body actions have been developed. Such devices are called, e.g., a data glove, data suit, cyber glove, and the like. For example, the data glove is a glove-like device with optical fibers running along its surface. Each optical fiber runs to a joint of a finger, and when the finger is bent, the transmission state of light changes. By measuring the transmission state of light, the degree of bending of each finger joint can be detected. The position of the hand itself in the three-dimensional space is measured by a magnetic sensor attached to the back of the hand. If an action is assigned to a given instruction (e.g., pointing the index finger up issues a forward movement instruction), the operator can walk through the three-dimensional space while variously changing the view point using the data glove (walkthrough).
However, some problems remain to be solved. Such a device is expensive and is hardly suitable for home use. Since the angle of each finger joint is measured, even when, for example, stretching only the index finger and bending the other fingers is defined as a forward movement instruction, the stretched state of a finger covers various angles. That is, since the second joint of the index finger rarely reaches 180°, it is difficult to recognize a stretched state other than the 180° state of the index finger unless a given margin is allowed. Since the operator must wear the data glove, his or her natural movement is hindered. Every time the operator puts on the data glove, he or she must calibrate the transmission state of light against the stretched and bent finger states, which is troublesome. Since optical fibers are used, failures such as fiber breakage may occur after continuous use of the data glove, so its durability is as low as that of a consumable item. Moreover, although the data glove is such an expensive and troublesome device, if the glove size does not fit the operator's hand exactly, the input value may deviate from the calibrated value during use due to slippage of the glove, and delicate hand actions can hardly be recognized. Owing to the various problems described above, the data glove has not become as widespread as initially expected, although it served as a trigger for VR (virtual reality) technology. For this reason, the data glove is still expensive and has many problems in terms of its use.
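The need for a margin when recognizing a stretched finger can be illustrated with a minimal sketch. The function names, the joint angle values, and the 20° margin below are hypothetical and chosen only for illustration; they are not taken from any actual data glove.

```python
STRAIGHT_DEG = 180.0  # angle of a fully straightened joint


def is_stretched(angle_deg, margin_deg):
    """A joint counts as stretched if it is within margin_deg of 180 degrees."""
    return angle_deg >= STRAIGHT_DEG - margin_deg


def is_forward_instruction(joint_angles, margin_deg):
    """True when only the index finger is stretched and all other fingers are bent."""
    return all(
        is_stretched(angle, margin_deg) == (finger == "index")
        for finger, angle in joint_angles.items()
    )


# Hypothetical second-joint angles in degrees (assumed values).
hand = {"thumb": 95.0, "index": 168.0, "middle": 80.0, "ring": 75.0, "little": 85.0}

print(is_forward_instruction(hand, margin_deg=0.0))   # False: 168 degrees is not 180
print(is_forward_instruction(hand, margin_deg=20.0))  # True: 168 is within the margin
```

Without a margin, the nearly straight index finger is not recognized as stretched. A margin fixes this, but a margin that is too wide would in turn start to accept partly bent fingers.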
By contrast, some studies have been made to input hand and body actions without wearing any special devices such as a data glove. For example, a method of recognizing hand shape by analyzing a moving image such as a video image has been studied.
However, with such a method, it is very hard to extract an objective image portion (e.g., in the case of hand action recognition, a hand image alone) from the background image. For example, assume that an objective image is extracted using colors. Since the hand has skin color, only a skin color portion may be extracted. However, if a beige clothing article or wall is present in the background, it is hard to distinguish it from skin color. Even when beige is distinguished from skin color by adjustment, if the illumination changes, the color tone also changes. Hence, it is difficult to extract a skin color portion reliably.
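The difficulty of color-based extraction described above can be sketched as follows. The HSV thresholds and the pixel values are assumptions chosen for illustration only; they do not come from any particular recognition system.

```python
import colorsys

# Hypothetical skin-color window in HSV (assumed thresholds).
HUE_RANGE_DEG = (0.0, 50.0)
SAT_RANGE = (0.15, 0.70)
MIN_VALUE = 0.35


def looks_like_skin(rgb):
    """Classify one 8-bit RGB pixel as skin using fixed HSV thresholds."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    return (
        HUE_RANGE_DEG[0] <= hue_deg <= HUE_RANGE_DEG[1]
        and SAT_RANGE[0] <= s <= SAT_RANGE[1]
        and v >= MIN_VALUE
    )


# Illustrative pixel values (assumed, not measured).
skin = (224, 172, 105)               # hand under neutral lighting
beige_cloth = (210, 180, 140)        # beige clothing in the background
skin_bluish_light = (160, 170, 175)  # the same hand under a bluish light

print(looks_like_skin(skin))               # True
print(looks_like_skin(beige_cloth))        # True: background mistaken for skin
print(looks_like_skin(skin_bluish_light))  # False: hand missed after lighting change
```

With one fixed threshold window, the beige pixel is accepted as skin, while the hand itself is rejected once the illumination shifts its color tone.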
In order to avoid such problems, a method that facilitates extraction by imposing a constraint on the background image, e.g., by placing a blue mat in the background, may be used. Alternatively, a method that colors the finger tips so that they can be easily extracted from the background, or that makes the operator wear colored rings, may be used. However, such constraints are not practical; they are used for experimental purposes but have not been put into practical applications.
Video image recognition such as the above-mentioned extraction requires a very large amount of computation. For this reason, existing personal computers cannot process all video images (as many as 30 images per sec) in real time. Hence, it is hard to achieve real-time motion capture by video image processing.
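The scale of the real-time requirement can be estimated with simple arithmetic. The rate of 30 images per sec is taken from the text above; the 640 x 480 image size is an assumption made only for this estimate.

```python
def frame_budget_ms(frames_per_second):
    """Time available to process one image, in milliseconds."""
    return 1000.0 / frames_per_second


def pixels_per_second(width, height, frames_per_second):
    """Number of pixels that must be examined every second."""
    return width * height * frames_per_second


FPS = 30                  # from the text: as many as 30 images per sec
WIDTH, HEIGHT = 640, 480  # assumed image size, not stated in the text

print(round(frame_budget_ms(FPS), 1))         # 33.3
print(pixels_per_second(WIDTH, HEIGHT, FPS))  # 9216000
```

Under these assumptions, every image must be fully analyzed in about 33 ms, and even a single operation per pixel amounts to more than nine million operations per second before any recognition step is performed.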
A device called a range finder for inputting a distance image is known. The typical principle of the range finder is to irradiate an object with spot light or slit light and to obtain a distance image by the principle of triangulation, based on the position where the light reflected by the object is received. The range finder mechanically scans the spot light or slit light to obtain two-dimensional distance information. This device can generate a distance image with very high precision, but requires a large-scale arrangement, resulting in high cost. Also, a long input time is required, and it is difficult for this device to process information in real time.
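The triangulation principle mentioned above can be sketched as follows, assuming a simplified planar geometry in which the light source and the light-receiving position lie on a common baseline and both ray angles are measured from that baseline. The function name and the sample values are hypothetical.

```python
import math


def triangulate_depth(baseline, emit_angle_rad, receive_angle_rad):
    """Perpendicular distance from the baseline to the illuminated point.

    The light source and the receiver are separated by `baseline`. Each angle
    is measured between the baseline and the corresponding ray, so that
    baseline = depth * (cot(emit_angle) + cot(receive_angle)).
    """
    cot_emit = 1.0 / math.tan(emit_angle_rad)
    cot_receive = 1.0 / math.tan(receive_angle_rad)
    return baseline / (cot_emit + cot_receive)


# One measurement: both rays at 45 degrees, baseline of 1.0 (arbitrary units).
depth = triangulate_depth(1.0, math.radians(45), math.radians(45))
print(round(depth, 6))  # 0.5

# A mechanical scan repeats the measurement for each emission angle
# (assumed angle pairs), yielding one row of distance values.
scan = [(60, 60), (45, 45), (30, 60)]
row = [triangulate_depth(1.0, math.radians(a), math.radians(b)) for a, b in scan]
```

Because each distance value requires its own illumination direction, a full two-dimensional scan needs many sequential measurements, which is consistent with the long input time noted above.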
A device that detects, from an image, a color marker or light-emitting unit attached to a hand or body portion and captures the shape, motion, and the like of that portion may be used, and has already been put to use in some applications. However, the device has the serious drawback of inconveniencing the user, since the user must put it on for every operation, and its application range is very limited. As in the example of the data glove, when the user wears the device on a movable portion such as a hand, durability often becomes a problem.
In addition to the aforementioned input devices, problems in conventional camera techniques will be explained below. With a conventional camera technique, in order to synthesize (chromakey) a character with a background, the character image must be photographed in front of a blue background to facilitate character extraction. For this reason, the photographing place is limited to, e.g., a studio where an image can be photographed in front of a blue background. Alternatively, in order to extract a character from an image photographed without a blue background, the character extraction range must be manually edited frame by frame, resulting in very cumbersome operations.
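The chromakey synthesis described above can be sketched in a few lines. The blue-dominance test, its threshold, and the tiny sample images are assumptions for illustration; practical keyers use more elaborate criteria.

```python
def is_blue_back(pixel, dominance=1.5):
    """Treat a pixel as blue background when blue clearly dominates red and green."""
    r, g, b = pixel
    return b > dominance * max(r, g, 1)


def chromakey(foreground, background, dominance=1.5):
    """Replace blue-background pixels of `foreground` with pixels of `background`."""
    return [
        [bg if is_blue_back(fg, dominance) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# Illustrative 3x2 images as rows of RGB tuples (assumed values).
BLUE, SKIN, SHIRT = (10, 20, 240), (224, 172, 105), (250, 250, 250)
SKY, GRASS = (135, 206, 235), (60, 160, 70)

studio_shot = [[BLUE, SKIN, BLUE], [BLUE, SHIRT, BLUE]]
new_background = [[SKY, SKY, SKY], [GRASS, GRASS, GRASS]]

result = chromakey(studio_shot, new_background)
print(result[0])  # [(135, 206, 235), (224, 172, 105), (135, 206, 235)]
```

The extraction works only because the area behind the character is uniformly blue, which is why the photographing place is restricted in this approach.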
Similarly, in order to generate a character in a three-dimensional space, a three-dimensional model is created in advance, and a photograph of the character is pasted onto the model (texture mapping). However, creating a three-dimensional model and performing texture mapping are tedious operations, and are rarely used outside applications such as movie production that can justify the considerable cost.
In order to solve these problems, for example, a technique disclosed in U.S. Ser. No. 08/953,667 (now U.S. Pat. No. 6,144,366) is known. This technique acquires a distance image by extracting a reflected light image. However, this technique cannot obtain hue information of an object since it extracts the reflected light image. For this reason, two different types of cameras, i.e., a conventional imaging camera and a camera for extracting a reflected light image, are required.