1. Field of the Invention
This invention relates generally to gesture input to computer systems, and more particularly to visually tracking a device capable of being deformed, wherein the deformation triggers an action on the part of the computer system.
2. Description of the Related Art
There has been a great deal of interest in searching for alternatives to conventional input devices for computing systems. Visual gesture input devices are becoming more popular. Generally speaking, gesture input refers to having an electronic device such as a computing system, video game console, smart appliance, etc., react to some gesture captured by a video camera that tracks an object.
Tracking of moving objects using digital video cameras, and processing of the video images to produce various displays, is known in the art. For example, one such application, for producing an animated video version of a sporting event, has been disclosed by Segen, U.S. Pat. No. 6,072,504. According to this system, the position of a tennis ball during play is tracked using a plurality of video cameras, and a set of equations relating the three-dimensional points in the court to two-dimensional points (i.e., pixels) of digital images within the field of view of the cameras is employed. Pixel positions of the ball resolved in a given digital image can be related to a specific three-dimensional position of the ball in play. Using triangulation from the respective video images, a series of image frames is analyzed by a least-squares method to fit the positions of the ball to trajectory equations describing unimpeded segments of motion of the ball.
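By way of illustration only, the least-squares fitting step mentioned above can be sketched as follows. This is a minimal sketch and is not taken from Segen: it assumes that one coordinate of the ball (here, height) has already been recovered for each frame, and that an unimpeded segment of motion is adequately modeled by a quadratic in time. The function name fit_quadratic and the sample values are hypothetical.

```python
def fit_quadratic(ts, zs):
    """Least-squares fit of z(t) = a + b*t + c*t**2; returns (a, b, c)."""
    # Normal equations (A^T A) x = A^T z for the basis [1, t, t^2].
    s = [sum(t ** k for t in ts) for k in range(5)]            # sums of t^0 .. t^4
    r = [sum(z * t ** k for t, z in zip(ts, zs)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic, noise-free heights (hypothetical values) sampled at six frame times.
ts = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
zs = [1.0 + 3.0 * t - 4.905 * t * t for t in ts]
a, b, c = fit_quadratic(ts, zs)
```

With noisy measurements the same routine returns the quadratic that minimizes the sum of squared residuals, which is the sense in which a series of frames is fitted to a trajectory equation.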
As described in some detail by Segen, once a three-dimensional description of position and motion of an object has been determined, various methods exist which are well known in the art for producing an animated representation thereof using a program which animates appropriate object movement in a video game environment. That is, Segen is concerned with determining the three-dimensional position of an object in motion from a plurality of two-dimensional video images captured at a point in time. Once the three-dimensional position of the “real” object is known, it is then possible to use this information to control a game program in any number of different ways which are generally known to game programmers.
However, the system of Segen relies on a plurality of video cameras for developing positional information about the object based on triangulation. Moreover, the detected object of Segen is a simple sphere which does not require information about the orientation (e.g. inclination) of the object in space. Thus, the system of Segen is not capable of reconstructing position and orientation of an object, whether moving or at rest, from a two-dimensional video image using a single video camera.
It is common for game programs to have virtual objects formed from a combination of three-dimensional geometric shapes, wherein during running of a game program, three-dimensional descriptions (positions and orientations) of the objects relative to each other are determined by control input parameters entered using an input device such as a joystick, game controller or other input device. The three-dimensional position and orientation of the virtual objects are then projected into a two-dimensional display (with background, lighting and shading, texture, and so forth) to create a three-dimensional perspective scene or rendition by means of the rendering processor functions of the game console.
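The projection of a three-dimensional position into a two-dimensional display described above can be illustrated, under simplifying assumptions, by an ideal pinhole model. The sketch below is not the rendering pipeline of any particular game console: the function name project_point, the camera-space coordinate convention (z pointing away from the camera), and the focal_length parameter are assumptions made for illustration, and lighting, shading, and texture are omitted.

```python
def project_point(point, focal_length=1.0):
    """Project a camera-space point (x, y, z) onto the image plane z = focal_length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective divide: image coordinates shrink as the point moves away.
    return (focal_length * x / z, focal_length * y / z)


# A hypothetical vertex of a virtual object, four units in front of the camera.
image_xy = project_point((2.0, 1.0, 4.0), focal_length=2.0)
```

Applying the same divide to every vertex of a virtual object, after the object's position and orientation have been fixed, yields the two-dimensional outline from which a perspective scene is drawn.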
As an example, there can be a “virtual object” that forms a moving image in a game display corresponding to how one moves around the “real” object. To display the virtual object, the calculated three-dimensional information is used for fixing the position and orientation of the “virtual object” in a memory space of the game console, and then rendering of the image is performed by known processing to convert the three-dimensional information into a realistic perspective display.
However, in spite of the above knowledge and techniques, problems continue to hinder successful object tracking. A particularly difficult problem is extracting precisely only those pixels of a video image which correspond unambiguously to an object of interest. For example, tracking the movement of an object having one color against a solid background of another color, where the object and background colors differ distinctly from one another, can be accomplished with relative ease. Tracking of objects, even if brightly colored, is not so easy in the case of multi-colored or non-static backgrounds. Changes in lighting also dramatically affect the apparent color of the object as seen by the video camera, and thus object tracking methods which rely on detecting a particular colored object are highly susceptible to error or require constant re-calibration as lighting conditions change. The typical home use environment for video game programs demands much greater flexibility and robustness than is possible with conventional object tracking computer vision systems.
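The sensitivity of color-based tracking to lighting can be seen in a simple illustration. The sketch below assumes a naive per-channel RGB threshold and models a lighting change as a uniform scaling of brightness; the function names matches_color and dim, the tolerance value, and the pixel values are all hypothetical and are not drawn from any particular tracking system.

```python
def matches_color(pixel, target, tolerance=40):
    """Return True if every RGB channel of pixel is within tolerance of target."""
    return all(abs(p - t) <= tolerance for p, t in zip(pixel, target))


def dim(pixel, factor):
    """Scale the brightness of an RGB pixel by factor, clamped to 0..255."""
    return tuple(min(255, max(0, int(c * factor))) for c in pixel)


target = (220, 40, 40)      # color of the tracked object at calibration time
bright = (215, 45, 38)      # same object under similar lighting
dark = dim(bright, 0.5)     # same object under half the illumination

found_when_bright = matches_color(bright, target)
found_when_dark = matches_color(dark, target)
```

In this sketch the object is detected under the calibration lighting but missed once the illumination is halved, unless the target color is re-calibrated or the tolerance is widened, and a wider tolerance in turn admits more background pixels.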
Thus, in order to become widely accepted, an alternative input device must be trackable in the home use environment by a single, relatively inexpensive camera. Additionally, the alternative input device must be convenient to use. While a glove worn on the hand of a user, where the glove includes sensors that are tracked by a camera to capture input, has been trialed, users have not embraced the glove. One of the reasons for the lack of enthusiasm for a glove is the inconvenience of having to continually remove and put on the glove.
Thus, there is a need to solve the problems of the prior art to provide an input device capable of being tracked by a single video camera, wherein the input device is convenient for the user.