1. The Field of the Invention
The present invention relates to displaying video images generated by a camera on a display, and more particularly to detecting collisions or any other type of interactions between video images generated by a camera and an object depicted on a display.
2. The Relevant Art
It is common for personal computers to be equipped with a camera for receiving video images as input. Conventionally, such a camera is directed toward a user of the personal computer so as to allow the user to view himself or herself on a display of the personal computer during use. To this end, the user is permitted to view real-time images that can be used for various purposes.
One purpose for use of a personal computer-mounted camera is to display an interaction between camera-generated video images and objects generated by the personal computer and depicted on the associated display. In order to afford this interaction, multiple operations must be carried out. For example, the user's position and body parts must be identified. This may be carried out using a "blue screen."
Once the user's position and body parts are identified, the task of identifying a current position of the user image still remains. This includes identifying a current position of any body parts of the user image. Identification of an exact current location of the user image and his or her body parts is critical for affording accurate and realistic interaction with objects in the virtual computer-generated environment.
Each time the current location of the user image is identified, it is done so with some associated probability of error. In many applications, the ultimately displayed interaction may be improved if such applications were given some indication of a level of certainty that the current location of the user image has been identified correctly.
Many difficulties arise during the process of identifying the current position of the body parts of the user image. It is often very difficult to discern the user image with respect to the background image. While there are many different types of methods for accomplishing this task which have associated benefits, each such method exhibits certain drawbacks that can result in errors. These errors are often manifested in the user image being partly transparent or in flawed interaction between the user image and the objects of the virtual environment.
Until now, processes that identify current positions associated with the user image employ only a single strategy. One process focuses on identifying the location of the user image by recognizing body parts. This may be accomplished in various ways. For example, relative shapes and sizes of the body parts of the user image may play a role in recognition. Further, a history of the body parts of the user image may be employed. This strategy, however, often exhibits problems when items in the background image exhibit shapes and sizes similar to the body parts of the user image. Further, the recognition process may be extremely complicated and subject to error when the images of the user are taken from different perspectives or in combination with foreign objects, e.g., hats, etc.
Other processes that identify current positions associated with the user image rely on motion of the various body parts of the user image and motion of the user himself or herself. These methods also exhibit shortcomings. For instance, if items in the background image move for any reason, such motion may be erroneously construed to be associated with the person and therefore result in faulty interaction with the virtual computer-generated environment. Examples of such items in the background image may include a television, door, or any other device that may move for any reason. An example of the foregoing motion detection process may be found in J. K. Aggarwal and Q. Cai. Human Motion Analysis: A Review. IEEE Nonrigid and Articulated Motion Workshop Proceedings, 90-102 (1997).
As such, when used individually, the foregoing processes that identify current positions associated with the user image often produce erroneous results.
A system, method and article of manufacture are provided for detecting collisions or any other type of interactions between video images generated by a camera and an animated object or objects depicted on a display. First, video images generated by a camera are received. Upon receipt, a first collision detection operation is executed for generating a first confidence value representative of a confidence that the received video images have collided with an object depicted on a display. Further executed is a second collision detection operation for generating a second confidence value also representative of a confidence that the received video images have collided with the object depicted on the display.
The first confidence value and the second confidence value may then be made available for use by various applications. As an option, only one of the collision detection operations may be run at a time in place of both being run together. As such, related applications may depict an interaction between the video images and the object depicted on the display based on the first confidence value and/or the second confidence value. As an option, the interaction depicted on the display may include the object reacting to a collision with the video images.
By extracting a confidence value from two types of collision detection operations, an application may utilize such confidence values to determine whether a collision has actually occurred. Further, the application may assume a collision has occurred based on a higher or lower confidence in order to afford a desired level of interaction.
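By way of illustration only, the use of the two confidence values by an application may be sketched as follows. The names `CollisionReport` and `collision_occurred`, the threshold of 0.5, and the `require_both` policy are hypothetical and are not taken from the described system; they merely show how an application might assume a collision at a higher or lower confidence.

```python
from dataclasses import dataclass


@dataclass
class CollisionReport:
    """Hypothetical container for the two confidence values, each in [0, 1]."""
    first_confidence: float   # e.g., from the background subtraction operation
    second_confidence: float  # e.g., from the motion-based operation


def collision_occurred(report, threshold=0.5, require_both=False):
    """Decide whether to treat the frame as a collision.

    require_both=False is permissive (either operation suffices);
    require_both=True is strict (both operations must agree).
    """
    values = (report.first_confidence, report.second_confidence)
    score = min(values) if require_both else max(values)
    return score >= threshold


report = CollisionReport(first_confidence=0.9, second_confidence=0.2)
permissive = collision_occurred(report)
strict = collision_occurred(report, require_both=True)
```

A game favoring responsiveness might use the permissive policy, while an application that must avoid false collisions might require agreement or raise the threshold.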
In one embodiment, the first collision detection operation may include a background subtraction operation while the second collision detection operation may include an operation other than a background subtraction operation, e.g., motion-based process.
The first collision detection operation may first include subtracting a background image of the video images in order to extract a person image. Next, body parts of the person image are recognized. A speed and/or a direction of the object depicted on the display is then generated based on a collision between at least one body part of the person image of the video images and the object depicted on the display. This speed and/or direction of the object may also be used by the application for depicting the interaction between the video images and the object depicted on the display.
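The background subtraction step may be sketched, under simplifying assumptions, as a per-pixel comparison against a stored background image. The function name `subtract_background`, the grayscale nested-list representation, and the threshold of 30 are assumptions chosen for illustration and are not specified above.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary mask with 1 where the frame differs from the background.

    Both images are assumed to be equal-sized grids (lists of rows) of
    grayscale intensities in the range 0-255.
    """
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[10, 200, 10],
         [10, 190, 12]]
person_mask = subtract_background(frame, background)
```

The resulting mask marks the pixels attributed to the person image; body part recognition would then operate on this mask.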
As an option, the speed may be generated based on an overlap between the body part of the person image of the video images and the object depicted on the display. Further, the direction may be generated based on a relative position between the body part of the person image of the video images and a center of the object depicted on the display.
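One possible reading of the foregoing may be sketched as follows, assuming axis-aligned bounding boxes given as `(x_min, y_min, x_max, y_max)`. The proportionality of speed to overlap area, the `gain` parameter, and the choice to direct the object away from the body part are assumptions made for this sketch only.

```python
import math


def overlap_area(a, b):
    """Area of intersection of two boxes (x_min, y_min, x_max, y_max)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def object_response(body_box, object_box, gain=1.0):
    """Return (speed, unit direction) for the object after a collision.

    Speed is assumed proportional to the overlap area. Direction is the
    unit vector from the body part center to the object center.
    """
    area = overlap_area(body_box, object_box)
    if area == 0:
        return 0.0, (0.0, 0.0)  # no overlap, no collision response
    body_cx = (body_box[0] + body_box[2]) / 2
    body_cy = (body_box[1] + body_box[3]) / 2
    obj_cx = (object_box[0] + object_box[2]) / 2
    obj_cy = (object_box[1] + object_box[3]) / 2
    dx, dy = obj_cx - body_cx, obj_cy - body_cy
    norm = math.hypot(dx, dy)
    if norm == 0:
        return gain * area, (0.0, 0.0)  # centers coincide, direction undefined
    return gain * area, (dx / norm, dy / norm)


speed, direction = object_response((0, 0, 4, 4), (2, 0, 6, 4))
```

In this example a hand box overlapping the left side of the object yields a speed of 8.0 and a direction pointing to the right.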
As mentioned earlier, the first collision detection operation includes recognizing the body parts of the person image. This act may include first identifying a location and a number of person images in the video images. Further, a head, a torso, and limbs of the person image in the video images may be tracked. A head bounding box confidence may also be determined that is associated with a certainty that the head of the person image is correctly identified. It should be noted that the first confidence value may be based at least in part on the head bounding box confidence.
As an option, the location and the number of person images in the video images may be identified using a history of the location and the number of person images in the video images. Also, the location and the number of person images in the video images may be identified using a mass distribution.
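The term "mass distribution" is not defined further here. As one hypothetical interpretation, the foreground mask may be summed column by column, with each contiguous run of sufficiently heavy columns treated as one person image. The names `column_mass` and `find_people` and the `min_mass` parameter are assumptions of this sketch.

```python
def column_mass(mask):
    """Sum the foreground pixels in each column of a binary mask."""
    return [sum(column) for column in zip(*mask)]


def find_people(mask, min_mass=1):
    """Return (first_column, last_column) spans, one per candidate person."""
    mass = column_mass(mask)
    spans = []
    start = None
    for x, m in enumerate(mass):
        if m >= min_mass and start is None:
            start = x                      # a new run of heavy columns begins
        elif m < min_mass and start is not None:
            spans.append((start, x - 1))   # the run ended at the previous column
            start = None
    if start is not None:
        spans.append((start, len(mass) - 1))
    return spans


mask = [[0, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 1, 1]]
people = find_people(mask)
```

The number of spans gives the number of person images and each span gives a horizontal location; a history of prior spans could be used to reject spans that appear or vanish implausibly between frames.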
The head may be tracked by using a history of the head or a mass distribution similar to that used in the identification of the location and the number of person images in the video images. Further, the torso of the person image in the video images may be tracked using information relating to the tracking of the head of the person image.
The second collision detection operation may include generating a motion distribution of a person image in the video images by utilizing frame differencing. Once generated, the motion distribution may be filtered, after which a location of a head of the person image in the video images may be estimated using head tracking. A location of a torso of the person image may then be estimated based on the estimated location of the head of the person image in the video images.
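Frame differencing and a subsequent filtering step may be sketched as follows. The particular filter shown, which discards motion pixels having no 4-connected moving neighbor, is an assumption chosen for simplicity, as are the function names and the threshold of 20; the filter actually employed is not specified above.

```python
def frame_difference(previous, current, threshold=20):
    """Binary motion map: 1 where intensity changed by more than threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def filter_isolated(motion):
    """Remove motion pixels that have no 4-connected motion neighbor."""
    rows, cols = len(motion), len(motion[0])

    def has_neighbor(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and motion[rr][cc]:
                return True
        return False

    return [
        [1 if motion[r][c] and has_neighbor(r, c) else 0 for c in range(cols)]
        for r in range(rows)
    ]


previous = [[0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
current = [[100, 0, 0, 0],
           [0, 0, 100, 100],
           [0, 0, 0, 0]]
motion = frame_difference(previous, current)
filtered = filter_isolated(motion)
```

The isolated pixel in the upper-left corner is treated as noise and removed, while the two adjacent moving pixels are retained for head and torso estimation.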
The second collision detection operation may also include determining valid ranges of motion based on the estimated location of the head and the estimated location of the torso of the person image in the video images. If any detected motion resides outside of the valid ranges of motion, such motion is eliminated. Similar to the first collision detection operation, a speed and/or a direction of the object depicted on the display may be generated based on a collision between at least one body part of the person image of the video images and the object depicted on the display. Similar to the first collision detection operation, the second collision detection operation also generates a confidence of a head bounding box of the head of the person image, wherein the second confidence value is based at least in part on the head bounding box confidence.
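The elimination of motion outside the valid ranges may be sketched as follows. The rule used here, which expands the union of the head and torso boxes by a reach proportional to the torso width, is purely an assumption for illustration, as are the names `valid_motion_region`, `keep_valid_motion`, and `reach_factor`.

```python
def valid_motion_region(head_box, torso_box, reach_factor=1.0):
    """Return a box (x_min, y_min, x_max, y_max) bounding plausible motion.

    Assumes limbs can extend at most reach_factor * torso width beyond
    the union of the estimated head and torso boxes.
    """
    reach = reach_factor * (torso_box[2] - torso_box[0])
    return (
        min(head_box[0], torso_box[0]) - reach,
        min(head_box[1], torso_box[1]) - reach,
        max(head_box[2], torso_box[2]) + reach,
        max(head_box[3], torso_box[3]) + reach,
    )


def keep_valid_motion(points, region):
    """Discard motion points (x, y) that fall outside the valid region."""
    x_min, y_min, x_max, y_max = region
    return [(x, y) for x, y in points
            if x_min <= x <= x_max and y_min <= y <= y_max]


head = (4, 0, 6, 2)
torso = (3, 2, 7, 8)
region = valid_motion_region(head, torso)
kept = keep_valid_motion([(5, 5), (10, 3), (20, 5)], region)
```

Here the point at x = 20, which might correspond to a moving door or television in the background, lies beyond the assumed reach and is eliminated.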