1. The Field of the Invention
The present invention relates to displaying video images generated by a camera on a display, and more particularly to tracking a head portion of a person image in camera-generated video images.
2. The Relevant Art
It is common for personal computers to be equipped with a camera for receiving video images as input. Conventionally, such camera is directed toward a user of the personal computer so as to allow the user to view himself or herself on a display of the personal computer during use. To this end, the user is permitted to view real-time images that can be used for various purposes.
One purpose for use of a personal computer-mounted camera is to display an interaction between camera-generated video images and objects generated by the personal computer and depicted on the associated display. In order to afford this interaction, a current position of the user image must be identified. This includes identifying a current position of the body parts of the user image, including the head. Identification of an exact current location of the user image and his or her body parts is critical for affording accurate and realistic interaction with objects in the virtual computer-generated environment. In particular, it is important to track a head portion of the user image since this specific body part is often the focus of the most attention.
Many difficulties arise, however, during the process of identifying the current position of the head portion of the user image. It is often very difficult to discern the head portion when relying on a single technique. For example, when identifying the location of a head portion using shape, color, motion, etc., portions of the background image and the remaining body parts of the user image may be confused with the head. A flesh coloring of a hand, for instance, may be mistaken for features of the head.
A system, method and article of manufacture are provided for tracking a head portion of a person image in video images. Upon receiving video images, a first head tracking operation is executed for generating a first confidence value. Such first confidence value is representative of a confidence that a head portion of a person image in the video images is correctly located. Also executed is a second head tracking operation for generating a second confidence value representative of a confidence that the head portion of the person image in the video images is correctly located. The first confidence value and the second confidence value are then outputted. Subsequently, the depiction of the head portion of the person image in the video images is based on the first confidence value and the second confidence value.
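By way of illustration only, the use of the two confidence values may be sketched as follows. The sketch assumes that each head tracking operation reports a bounding box together with a confidence between 0.0 and 1.0, and that the depiction simply follows the more confident operation. The names `TrackResult` and `select_head`, the box layout, and the `min_confidence` parameter are hypothetical and do not appear in the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical box layout: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class TrackResult:
    box: Box
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def select_head(first: TrackResult, second: TrackResult,
                min_confidence: float = 0.5) -> Optional[Box]:
    """Return the box of the more confident operation.

    Returns None when neither operation reaches min_confidence
    (an assumed policy, chosen only for illustration).
    """
    best = first if first.confidence >= second.confidence else second
    return best.box if best.confidence >= min_confidence else None
```

Other combination policies, such as a confidence-weighted average of the two boxes, would fit the same description equally well.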
In one embodiment of the present invention, the first head tracking operation begins with subtracting a background image from the video images in order to extract the person image. Further, a mass-distribution histogram may be generated that represents the extracted person image. A point of separation is then identified between a torso portion of the person image and the head portion of the person image.
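The background subtraction, histogram, and separation steps may be sketched as below. This is a minimal sketch under several assumptions: frames are grayscale images stored as lists of rows, the mass-distribution histogram is taken to be the count of foreground pixels per row, and the point of separation is taken to be the row where the silhouette stops narrowing below the head. The function names, the `threshold` value, and the narrowing heuristic are illustrative and are not taken from the description.

```python
def subtract_background(frame, background, threshold=30):
    """Binary mask: 1 where the frame differs from the background by more than threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def row_mass_histogram(mask):
    """Foreground pixel count per row (one assumed form of a mass-distribution histogram)."""
    return [sum(row) for row in mask]


def find_neck_row(hist):
    """Return the row index of the assumed head/torso separation, or None.

    Heuristic: starting at the first non-empty row, walk down while the
    silhouette widens (top of head to widest part), then walk down while
    it narrows (widest part to neck). The row where narrowing stops is
    treated as the point of separation.
    """
    i = next((k for k, v in enumerate(hist) if v > 0), None)
    if i is None:
        return None
    n = len(hist)
    while i + 1 < n and hist[i + 1] >= hist[i]:
        i += 1
    while i + 1 < n and hist[i + 1] <= hist[i]:
        i += 1
    return i
```

A practical system would likely smooth the histogram first, since this heuristic is sensitive to single-row noise.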
Next, the first head tracking operation continues by identifying a top of the head portion of the person image. This may be accomplished by performing a search upwardly from the point of separation between the torso portion and the head portion of the person image. Subsequently, sides of the head portion of the person image are also identified. As an option, the first head tracking operation may track the head portion of the person image in the video images using previous video images including the head portion of the person image.
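The upward search and the identification of the sides may be illustrated with the following sketch. It assumes a binary foreground mask (lists of rows of 0 and 1) and a known row for the point of separation; the function name `find_head_box` and the returned box layout are hypothetical.

```python
def find_head_box(mask, neck_row):
    """Return (left, top, right, bottom) of the head, or None if no foreground is found.

    The top is found by searching upward from neck_row until an empty row
    is reached; the sides are the extreme foreground columns between the
    top and the neck.
    """
    top = neck_row
    while top > 0 and any(mask[top - 1]):
        top -= 1
    cols = [x for y in range(top, neck_row + 1)
            for x, v in enumerate(mask[y]) if v]
    if not cols:
        return None
    return (min(cols), top, max(cols), neck_row)
```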
In one embodiment, the second head tracking operation may begin by identifying an initial location of the head portion of the person image in the video images. Thereafter, a current location of the head portion of the person image may be tracked starting at the initial location. As an option, the initial location of the head portion of the person image may be identified upon each instance that the second confidence value falls below a predetermined amount. By this feature, the tracking is "restarted" when the confidence is low that the head is being tracked correctly. This ensures improved accuracy during tracking.
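The restart behavior may be sketched as a simple loop. In this sketch, `detect_initial` and `track_step` are hypothetical callables standing in for the initial-location and current-location steps, and `min_confidence` stands in for the predetermined amount; none of these names come from the description.

```python
def track_with_restart(frames, detect_initial, track_step, min_confidence=0.5):
    """Track across frames, re-identifying the initial location after low confidence.

    detect_initial(frame) -> location
    track_step(frame, location) -> (new_location, confidence)
    Returns a list of (location, confidence) pairs, one per frame.
    """
    location = None
    results = []
    for frame in frames:
        if location is None:
            # First frame, or the previous frame's confidence was too low.
            location = detect_initial(frame)
        location, confidence = track_step(frame, location)
        results.append((location, confidence))
        if confidence < min_confidence:
            location = None  # force re-initialization on the next frame
    return results
```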
As an option, the initial location of the head portion of the person image may be identified based on the detection of a skin color in the video images. This may be accomplished by extracting a flesh map; filtering the flesh map; identifying distinct regions of flesh color on the flesh map; ranking the regions of flesh color on the flesh map; and selecting at least one of the regions of flesh color as the initial location of the head portion of the person image based on the ranking. During such procedure, holes in the regions of flesh color on the flesh map may be filled. Further, the regions of flesh color on the flesh map may be combined upon meeting predetermined criteria.
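The flesh-map procedure may be sketched as follows. The sketch makes several assumptions that are not part of the description: pixels are (r, g, b) tuples, skin is detected with a simple RGB rule of thumb, filtering is approximated by discarding regions smaller than `min_area`, regions are 4-connected, and ranking is by area alone. Hole filling and region combining are omitted for brevity.

```python
from collections import deque


def is_flesh(rgb):
    """Rule-of-thumb RGB skin test (an assumed classifier, for illustration only)."""
    r, g, b = rgb
    return (r > 95 and g > 40 and b > 20 and r > g and r > b
            and r - min(g, b) > 15)


def flesh_map(image):
    """Binary map: 1 where a pixel passes the skin test."""
    return [[1 if is_flesh(px) else 0 for px in row] for row in image]


def label_regions(fmap):
    """Return 4-connected regions of the map as lists of (x, y) points."""
    h = len(fmap)
    w = len(fmap[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if fmap[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, region = deque([(x, y)]), []
                while queue:
                    cx, cy = queue.popleft()
                    region.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < w and 0 <= ny < h
                                and fmap[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                regions.append(region)
    return regions


def pick_head_region(image, min_area=2):
    """Rank flesh regions by area; return the largest one's box or None."""
    regions = [r for r in label_regions(flesh_map(image)) if len(r) >= min_area]
    if not regions:
        return None
    best = max(regions, key=len)
    xs = [x for x, _ in best]
    ys = [y for _, y in best]
    return (min(xs), min(ys), max(xs), max(ys))
```

A fuller ranking might also weigh shape or position, since a hand can produce a flesh region as large as the head.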
In a similar manner, the current location of the head portion of the person image may be tracked based on the detection of a skin color in the video images. Such technique includes extracting a sub-window of the head portion of the person image in the video images; forming a color model based on the sub-window; searching the video images for a color similar to the color model; and estimating the current location of the head portion of the person image based on the search.
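The color-model technique may be illustrated with a short sketch. It assumes pixels are (r, g, b) tuples, that the color model is simply the mean color of the sub-window, and that the current location is estimated as the centroid of all pixels within `max_dist` of that mean in RGB space. These choices, along with the function names, are assumptions made for illustration; a histogram-based model would be an equally valid reading.

```python
def mean_color(image, box):
    """Mean (r, g, b) over the inclusive box (left, top, right, bottom)."""
    left, top, right, bottom = box
    pixels = [image[y][x]
              for y in range(top, bottom + 1)
              for x in range(left, right + 1)]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def locate_by_color(image, model, max_dist=40.0):
    """Centroid (x, y) of pixels whose color lies within max_dist of the model, or None."""
    limit = max_dist ** 2
    hits = [(x, y)
            for y, row in enumerate(image)
            for x, px in enumerate(row)
            if sum((a - b) ** 2 for a, b in zip(px, model)) <= limit]
    if not hits:
        return None
    n = len(hits)
    return (sum(x for x, _ in hits) / n, sum(y for _, y in hits) / n)
```

In use, the model would be formed from the head sub-window of one frame and then searched for in the next frame.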
In one embodiment, the module that identifies the initial location of the head portion of the person image and the module that identifies the current location of the head portion of the person image may work together. In particular, while tracking the current location of the head portion of the person image, a flesh map may be obtained. Thereafter, the flesh map may be used during subsequent identification of an initial location of the head portion of the person image when the associated confidence level drops below the predetermined amount.
Similar to using the skin color, the initial location of the head portion of the person image may also be identified based on the detection of motion in the video images. Such identification is achieved by creating a motion distribution map from the video images; generating a histogram based on the motion distribution map; identifying areas of motion using the histogram; and selecting at least one of the areas of motion as being the initial location of the head portion of the person image.
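The motion-based identification may be sketched as below, assuming grayscale frames stored as lists of rows. In this sketch the motion distribution map is a thresholded difference between two consecutive frames, the histogram counts moving pixels per column, and an area of motion is a contiguous run of columns containing motion. The function names and the `threshold` and `min_count` parameters are hypothetical.

```python
def motion_map(prev, curr, threshold=25):
    """Binary map: 1 where a pixel changed by more than threshold between frames."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def column_histogram(mmap):
    """Number of moving pixels in each column."""
    return [sum(col) for col in zip(*mmap)]


def motion_runs(hist, min_count=1):
    """Contiguous column runs with at least min_count motion per column.

    Each run is (first_column, last_column, total_motion).
    """
    runs, start = [], None
    for x, v in enumerate(hist + [0]):  # trailing 0 closes a run at the edge
        if v >= min_count and start is None:
            start = x
        elif v < min_count and start is not None:
            runs.append((start, x - 1, sum(hist[start:x])))
            start = None
    return runs


def strongest_motion(prev, curr):
    """Select the run with the most motion as the candidate location, or None."""
    runs = motion_runs(column_histogram(motion_map(prev, curr)))
    return max(runs, key=lambda r: r[2]) if runs else None
```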
Similarly, the current location of the head portion of the person image may be tracked based on the detection of motion in the video images. This may be accomplished by determining a search window based on a previous location of the head portion of the person image; creating a motion distribution map within the search window; generating a histogram based on the motion distribution map; identifying areas of motion using the histogram; and selecting at least one of the areas of motion as being the current location of the head portion of the person image.
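The windowed variant may be sketched as follows. The sketch assumes grayscale frames stored as lists of rows, a search window formed by growing the previous head box by a fixed `margin` and clamping it to the frame, and a location estimate taken at the peaks of the column and row motion histograms inside that window. All names and parameters here are illustrative assumptions.

```python
def search_window(prev_box, margin, width, height):
    """Grow the previous box (left, top, right, bottom) by margin, clamped to the frame."""
    left, top, right, bottom = prev_box
    return (max(0, left - margin), max(0, top - margin),
            min(width - 1, right + margin), min(height - 1, bottom + margin))


def motion_in_window(prev, curr, window, threshold=25):
    """Column and row histograms of moving pixels, computed only inside the window."""
    left, top, right, bottom = window
    col_hist = [0] * (right - left + 1)
    row_hist = [0] * (bottom - top + 1)
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if abs(curr[y][x] - prev[y][x]) > threshold:
                col_hist[x - left] += 1
                row_hist[y - top] += 1
    return col_hist, row_hist


def peak_location(col_hist, row_hist, window):
    """Frame coordinates (x, y) of the histogram peaks, or None if nothing moved."""
    if not any(col_hist):
        return None
    left, top, _, _ = window
    x = left + max(range(len(col_hist)), key=col_hist.__getitem__)
    y = top + max(range(len(row_hist)), key=row_hist.__getitem__)
    return (x, y)
```

Restricting the map to the window keeps motion elsewhere in the frame, such as a waving hand, from pulling the estimate away from the head.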
These and other aspects and advantages of the present invention will become more apparent when the Description below is read in conjunction with the accompanying Drawings.