1. Field of the Invention
The present invention relates to computer vision systems, and more particularly to a system in which an object is picked up by a single video camera, the camera image is analyzed to isolate the part of the image pertaining to the object, and the position and orientation of the object are mapped into a three-dimensional space. A three-dimensional description of the object is stored in memory and used for controlling action in a game program, such as rendering of a corresponding virtual object in a scene of a video display.
2. Background of the Invention
Tracking moving objects with digital video cameras and processing the resulting video images to produce various displays is known in the art. One such application, for producing an animated video version of a sporting event, has been disclosed by Segen, U.S. Pat. No. 6,072,504, the disclosure of which is incorporated in the present specification by reference. According to this system, the position of a tennis ball during play is tracked using a plurality of video cameras, and a set of equations is employed that relates three-dimensional points in the court to two-dimensional points (i.e., pixels) of digital images within the field of view of the cameras. Pixel positions of the ball resolved in a given digital image can be related to a specific three-dimensional position of the ball in play by triangulation from the respective video images, and a series of image frames is analyzed by a least-squares method to fit the positions of the ball to trajectory equations describing unimpeded segments of motion of the ball.
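By way of illustration only, the least-squares fitting step mentioned above can be sketched as follows. The sketch is not taken from Segen; it assumes a drag-free ballistic model with gravity acting along the negative z axis, positions in meters and times in seconds, and the names `fit_line` and `fit_ballistic_segment` are hypothetical.

```python
G = 9.81  # assumed gravitational acceleration, m/s^2, acting along -z


def fit_line(ts, ys):
    """Ordinary least-squares fit of y = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = s_ty / s_tt
    a = y_mean - b * t_mean
    return a, b


def fit_ballistic_segment(ts, points):
    """Fit p(t) = p0 + v0*t - 0.5*G*t^2 (z only) to triangulated 3D samples.

    ts     -- sample times, one per image frame
    points -- (x, y, z) positions already recovered by triangulation
    Returns the estimated initial position p0 and initial velocity v0.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Move the known gravity term to the left-hand side so that the
    # remaining z model is linear in t, like x and y.
    zs = [p[2] + 0.5 * G * t * t for p, t in zip(points, ts)]
    x0, vx = fit_line(ts, xs)
    y0, vy = fit_line(ts, ys)
    z0, vz = fit_line(ts, zs)
    return (x0, y0, z0), (vx, vy, vz)
```

Each unimpeded segment of motion (for example, between a bounce and the next racket contact) would be fitted separately under this model, since the initial position and velocity change at every impact.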
As described in some detail by Segen, once a three-dimensional description of position and motion of an object has been determined, various methods exist which are well known in the art for producing an animated representation thereof using a program which animates appropriate object movement in a video game environment.
Stated otherwise, Segen is concerned with determining the three-dimensional position of an object in motion from a plurality of two-dimensional video images captured at a point in time. Once the three-dimensional position of the "real" object is known, it is then possible to use this information to control a game program in any number of different ways which are generally known to game programmers.
However, the system of Segen relies on a plurality of video cameras for developing positional information about the object based on triangulation. Moreover, the detected object of Segen is a simple sphere which does not require information about the orientation (e.g. inclination) of the object in space. Thus, the system of Segen is not capable of reconstructing position and orientation of an object, whether moving or at rest, from a two-dimensional video image using a single video camera.
It is common for game programs to have virtual objects formed from a combination of three-dimensional geometric shapes, wherein during running of a game program, three-dimensional descriptions (positions and orientations) of the objects relative to each other are determined by control input parameters entered using an input device such as a joystick, game controller or other input device. The three-dimensional position and orientation of the virtual objects are then projected into a two-dimensional display (with background, lighting and shading, texture, and so forth) to create a three-dimensional perspective scene or rendition by means of the rendering processor functions of the game console.
As an example, there can be a "virtual object" that forms a moving image in a game display corresponding to how one moves the "real" object around. To display the virtual object, the calculated three-dimensional information is used for fixing the position and orientation of the "virtual object" in a memory space of the game console, and then rendering of the image is performed by known projection processing to convert the three-dimensional information into a realistic perspective display.
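The projection processing referred to above can be illustrated with a minimal sketch of a pinhole perspective projection. The sketch is an assumption made for illustration and is not the rendering pipeline of any particular game console: it assumes a camera looking along the positive z axis, an orientation reduced to a single rotation about the y axis, and a focal length and image center expressed in pixels. The names `rotate_y`, `place_object` and `project` are hypothetical.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def place_object(vertex, position, yaw):
    """Apply the object's orientation (yaw only) and then its position."""
    rx, ry, rz = rotate_y(vertex, yaw)
    px, py, pz = position
    return (rx + px, ry + py, rz + pz)


def project(point, focal_px, cx, cy):
    """Pinhole projection of a camera-space point to pixel coordinates.

    focal_px -- focal length in pixels (assumed value chosen by the caller)
    cx, cy   -- pixel coordinates of the image center
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (cx + focal_px * x / z, cy + focal_px * y / z)
```

For instance, a vertex at (1, 2, 4) in camera space with a focal length of 400 pixels and an image center of (320, 240) projects to pixel (420, 440), since 400 × 1/4 = 100 and 400 × 2/4 = 200.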
However, in spite of the above knowledge and techniques, problems continue to hinder successful object tracking, and a particularly difficult problem is extracting precisely only those pixels of a video image which correspond unambiguously to an object of interest. For example, although tracking the movement of an object having one color against a solid background of another color, where the object and background colors differ distinctly from one another, can be accomplished with relative ease, tracking of objects, even if brightly colored, is not so easy in the case of multi-colored or non-static backgrounds. Changes in lighting also dramatically affect the apparent color of the object as seen by the video camera, and thus object tracking methods which rely on detecting a particular colored object are highly susceptible to error or require constant re-calibration as lighting conditions change. The typical home use environment for video game programs demands much greater flexibility and robustness than is possible with conventional object tracking computer vision systems.
It is a principal object of the present invention to provide an object tracking system suitable for video game programs which overcomes the aforementioned disadvantages of the conventional art, enabling tracking of the position and orientation of an object or prop which serves as an input device for effecting action in the game program.
Another object of the invention is to provide alternative interfaces for games, wherein rather than using a conventional joystick, the user can stand in front of a video camera connected to the game console device, and by physically moving or manipulating an object within view of a single camera, cause a corresponding action to occur in the game.
Yet another object of the invention is to provide methods for mapping two-dimensional information of a discriminated pixel group belonging to the manipulated object to a three-dimensional space, to provide a three-dimensional description of the position and orientation of the object from a single video camera image.
Yet a further object of the invention is to provide a three-dimensional description of an object which includes a rotational component of the manipulated object.
A still further object of the invention is to provide techniques for the selection of object colors which maximize one's ability to discriminate, on the basis of color transitions, the pixel groups from a video image which belong unambiguously to the manipulated object and which provide the needed information for deriving a description of three-dimensional position and orientation of the object.
The above and other objects, features and advantages of the present invention will become apparent from the following description when taken in conjunction with the accompanying drawings in which preferred embodiments of the present invention are shown by way of illustrative example.