1. Field of the Invention
The present invention relates to computer vision systems, and more particularly to a system in which an object is picked up by a single video camera, the camera image is analyzed to isolate the part of the image pertaining to the object, and the position and orientation of the object are mapped into a three-dimensional space. A three-dimensional description of the object is stored in memory and used for controlling action in a game program, such as rendering of a corresponding virtual object in a scene of a video display.
2. Background of the Invention
Tracking of moving objects using digital video cameras and processing the video images to produce various displays has been known in the art. One such application, for producing an animated video version of a sporting event, has been disclosed by Segen, U.S. Pat. No. 6,072,504, the disclosure of which is incorporated in the present specification by reference. According to this system, the position of a tennis ball during play is tracked using a plurality of video cameras, and a set of equations is employed that relates three-dimensional points in the court to two-dimensional points (i.e. pixels) of digital images within the field of view of the cameras. Pixel positions of the ball resolved in a given digital image can thus be related to a specific three-dimensional position of the ball in play. Using triangulation from the respective video images, a series of image frames is analyzed by a least-squares method to fit the positions of the ball to trajectory equations describing unimpeded segments of motion of the ball.
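The least-squares fitting stage attributed to Segen can be illustrated with a short Python sketch. It is not Segen's actual formulation: it assumes the ball positions have already been triangulated into timestamped three-dimensional points, and it assumes a simple constant-gravity model without drag or spin, p(t) = p0 + v*t - (0, 0, g/2)*t^2, with z pointing up. Adding back the known gravity term makes every coordinate linear in t, so each axis reduces to an ordinary least-squares line fit. The names `fit_line`, `fit_ballistic_segment`, and `G` are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

G = 9.81  # assumed gravitational acceleration in m/s^2, acting along -z


def fit_line(ts: Sequence[float], vals: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of vals = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vals) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    if s_tt == 0.0:
        raise ValueError("need at least two distinct timestamps")
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vals))
    b = s_tv / s_tt
    return v_mean - b * t_mean, b


def fit_ballistic_segment(
    ts: Sequence[float], points: Sequence[Vec3]
) -> Tuple[Vec3, Vec3]:
    """Least-squares fit of p(t) = p0 + v*t - (0, 0, G/2)*t^2 to one unimpeded segment.

    Returns the estimated initial position p0 and velocity v.
    """
    if len(ts) != len(points) or len(ts) < 2:
        raise ValueError("need matching timestamps and points, at least two of each")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Adding back the known gravity term makes z linear in t, like x and y.
    zs = [p[2] + 0.5 * G * t * t for t, p in zip(ts, points)]
    x0, vx = fit_line(ts, xs)
    y0, vy = fit_line(ts, ys)
    z0, vz = fit_line(ts, zs)
    return (x0, y0, z0), (vx, vy, vz)


# Usage: noise-free samples from a known trajectory are recovered by the fit.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
samples = [(1.0 + 3.0 * t, 2.0 - 1.0 * t, 0.5 + 4.0 * t - 0.5 * G * t * t) for t in times]
p0, v = fit_ballistic_segment(times, samples)
```

Fitting each unimpeded segment separately, as the text describes, keeps this simple model valid between bounces or racket contacts, where the velocity changes abruptly.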
As described in some detail by Segen, once a three-dimensional description of the position and motion of an object has been determined, various methods well known in the art exist for producing an animated representation thereof, using a program that animates the appropriate object movement in a video game environment.
Stated otherwise, Segen is concerned with determining the three-dimensional position of an object in motion from a plurality of two-dimensional video images captured at the same point in time. Once the three-dimensional position of the “real” object is known, it is then possible to use this information to control a game program in any number of different ways which are generally known to game programmers.
However, the system of Segen relies on a plurality of video cameras for developing positional information about the object based on triangulation. Moreover, the object detected by Segen is a simple sphere, which does not require information about its orientation (e.g. inclination) in space. Thus, the system of Segen is not capable of reconstructing the position and orientation of an object, whether moving or at rest, from a two-dimensional video image using a single video camera.
It is common for game programs to have virtual objects formed from a combination of three-dimensional geometric shapes, wherein, during running of a game program, three-dimensional descriptions (positions and orientations) of the objects relative to each other are determined by control input parameters entered using an input device such as a joystick or game controller. The three-dimensional positions and orientations of the virtual objects are then projected into a two-dimensional display (with background, lighting and shading, texture, and so forth) by means of the rendering processor functions of the game console, to create a three-dimensional perspective scene or rendition.
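The projection step can be sketched in Python under assumptions that are not taken from the text: a pinhole camera at the origin looking along +z, an object orientation given only as yaw and pitch angles, and hypothetical names (`rotate`, `project_object`) and intrinsic parameters (`focal`, `cx`, `cy`). Each model-space vertex is rotated by the object's orientation, translated by its position, and then divided by its depth to obtain pixel coordinates. An actual rendering processor additionally performs clipping, rasterization, lighting, and texturing, all of which are omitted here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def rotate(p: Vec3, yaw: float, pitch: float) -> Vec3:
    """Rotate a model-space point by yaw (about y), then pitch (about x); radians."""
    x, y, z = p
    c_yaw, s_yaw = math.cos(yaw), math.sin(yaw)
    x, z = c_yaw * x + s_yaw * z, -s_yaw * x + c_yaw * z
    c_pitch, s_pitch = math.cos(pitch), math.sin(pitch)
    y, z = c_pitch * y - s_pitch * z, s_pitch * y + c_pitch * z
    return (x, y, z)


def project_object(
    vertices: Sequence[Vec3],
    position: Vec3,
    yaw: float,
    pitch: float,
    focal: float,
    cx: float,
    cy: float,
) -> List[Pixel]:
    """Place an object's vertices in camera space and project them with a pinhole model."""
    pixels: List[Pixel] = []
    for vert in vertices:
        x, y, z = rotate(vert, yaw, pitch)
        # Translate from model space into camera space.
        x, y, z = x + position[0], y + position[1], z + position[2]
        if z <= 0.0:
            raise ValueError("vertex lies behind the camera")
        # Perspective divide: farther points land closer to the principal point.
        pixels.append((focal * x / z + cx, focal * y / z + cy))
    return pixels


# Usage: a unit-length bar centred 5 units in front of the camera.
bar = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
print(project_object(bar, (0.0, 0.0, 5.0), 0.0, 0.0, 500.0, 320.0, 240.0))
# [(270.0, 240.0), (370.0, 240.0)]
```

Turning the same bar by 90 degrees of yaw points it along the viewing axis, so both ends project onto nearly the same image column; this is how a change in the stored orientation alone alters the rendered perspective.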
As an example, there can be a “virtual object” that forms a moving image in a game display corresponding to how one moves the “real” object around. To display the virtual object, the calculated three-dimensional information is used to fix the position and orientation of the “virtual object” in a memory space of the game console, and the image is then rendered by known projection processing that converts the three-dimensional information into a realistic perspective display.
However, in spite of the above knowledge and techniques, problems continue to hinder successful object tracking, and a particularly difficult problem is extracting precisely only those pixels of a video image that correspond unambiguously to an object of interest. For example, although tracking the movement of an object having one color against a solid background of another color, where the object and background colors differ distinctly from one another, can be accomplished with relative ease, tracking objects, even brightly colored ones, is not so easy against multi-colored or non-static backgrounds. Changes in lighting also dramatically affect the apparent color of the object as seen by the video camera, and thus object tracking methods that rely on detecting a particular colored object are highly susceptible to error or require constant re-calibration as lighting conditions change. The typical home environment in which video game programs are used demands much greater flexibility and robustness than is possible with conventional object tracking computer vision systems.
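The sensitivity of color-based tracking to lighting can be demonstrated with a deliberately naive Python sketch. It assumes a fixed RGB target color and a per-channel tolerance (the names `color_mask` and `dim`, the target color, and all numeric values are hypothetical) and models a lighting change as a uniform scaling of every channel. Pixels of a red object that match under the calibrated lighting stop matching once the scene is dimmed, which is the kind of failure that forces re-calibration.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def color_mask(pixels: Sequence[RGB], target: RGB, tol: int) -> List[bool]:
    """Mark a pixel as 'object' if every RGB channel is within tol of target."""
    return [all(abs(c - t) <= tol for c, t in zip(px, target)) for px in pixels]


def dim(pixels: Sequence[RGB], factor: float) -> List[RGB]:
    """Simulate a uniform lighting change by scaling each channel, clamped to 0..255."""

    def scale(c: int) -> int:
        return max(0, min(255, round(c * factor)))

    return [(scale(r), scale(g), scale(b)) for r, g, b in pixels]


target = (200, 40, 40)  # calibrated color of a hypothetical red object
frame = [(205, 45, 38), (60, 120, 200), (190, 35, 50)]  # two object pixels, one background

print(color_mask(frame, target, tol=30))            # [True, False, True]
print(color_mask(dim(frame, 0.6), target, tol=30))  # [False, False, False]
```

Under the dimmer lighting the red channel of the object pixels falls to roughly 114 to 123, far outside the tolerance band around 200, so the object disappears from the mask even though nothing about the object itself has changed.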