1. Field of Invention
The present invention relates generally to methods and apparatus for compositing objects from an input image to a destination image. More particularly, the present invention relates to reducing extraneous clutter and holes from the object being composited and, in addition, reducing the effect of shadows emanating from the composited object in the destination image.
2. Background
An important procedure in the field of film and video editing is taking an object in one image, e.g. an input image, and compositing that object onto another image, e.g. a destination image, with minimal distortion of the object. For example, this procedure could be used for taking an object such as a human figure in one video and placing that human figure into another video without distorting or altering the image of the person. One video image may have a person standing or moving around in a typical living room and another video can be of an outdoor scene such as a jungle or desert. The compositing procedure would take the image of the human figure in the living room and place the figure in the other video thereby providing the effect of the human figure standing or moving around in the given outdoor setting.
One well-known and widely used prior art method for compositing objects from an input image to a destination image is from the field of digital effects and chroma-keying. This method is commonly referred to as blue-screening and involves placing a blue or other fixed-color screen behind the object being composited, typically the image of a person (the color blue is a hue that contrasts strongly with all colors of human skin). In blue-screening, the system checks to see which pixels in the input image are not blue and labels those pixels as foreground pixels. Normally, the foreground pixels will only be those pixels that are part of the object being composited, since there are typically no other objects in the image and the background is solid blue. The system then composites, or blends, the object (i.e. the collection of all foreground pixels) onto a destination image. One of the disadvantages of using blue-screening for object compositing is that it requires a fixed-color screen behind the object. Another disadvantage is that if any of the colors on the object, such as an item of clothing, is blue, holes will appear in the object in the destination image. This occurs because the pixels in the blue areas on the object will not be labeled as foreground pixels and thus will not be composited with the rest of the object, resulting in the object having holes when composited onto the destination image.
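By way of illustration only, the blue-screening labeling step and its hole artifact can be sketched as follows. The function names, the RGB tuple representation, and the `margin` threshold are assumptions made for this sketch and are not taken from any particular chroma-keying system.

```python
def is_blue(pixel, margin=60):
    """Hypothetical blue test: the blue channel exceeds red and green by `margin`."""
    r, g, b = pixel
    return b - max(r, g) >= margin


def foreground_mask(image, margin=60):
    """Label non-blue pixels as foreground (1) and blue pixels as background (0)."""
    return [[0 if is_blue(p, margin) else 1 for p in row] for row in image]


screen = (10, 20, 220)      # fixed-color screen behind the object
skin = (200, 150, 120)      # part of the person
blue_shirt = (20, 30, 210)  # part of the person, but blue

image = [
    [screen, skin, screen],
    [screen, blue_shirt, screen],
]
mask = foreground_mask(image)
# The blue shirt pixel is labeled 0, which is the "hole" described above.
print(mask)
```

In this sketch the shirt pixel belongs to the object yet receives the background label, so it would be missing from the composited result.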
Other prior art background subtraction procedures, from the field of computer vision, are used to eliminate the fixed color screen requirement. One procedure involves building an average background image by taking a predetermined number of sample images of a (multi-colored) background and creating a background model. For each new sample image taken, each pixel in the new image is compared to its corresponding pixel in the background model being formed. This is done to determine whether the pixel in the current sample image is a foreground pixel, i.e. an object pixel. Pixels that are determined to be part of the foreground are then blended or composited onto the destination image. One disadvantage of this procedure is that if a foreground pixel happens to match its corresponding background model pixel color, it will not be considered a foreground pixel. This will introduce holes into the composited object. Another disadvantage is that shadows cast by the object often make the object, when composited, appear to have its original form plus extraneous appendages (as a result of the shadows). The procedure mistakenly labels the "shadow" pixels as foreground pixels. Yet another disadvantage is that if any portion of the background changes or if the camera is moved while the background model is being built, certain portions of the background (e.g. the portions that moved) will be incorrectly labeled as part of the foreground and be composited onto the destination image. Although there are prior art techniques for updating the background model to reflect changes, they cannot account for a constantly changing background such as one that includes a changing television screen or a window looking out onto a busy street.
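A minimal sketch of such a background subtraction procedure is given below, assuming grayscale frames stored as nested lists and a fixed difference threshold. The names `build_background_model` and `label_foreground` and the threshold value are hypothetical and chosen only for illustration.

```python
def build_background_model(frames):
    """Average a list of equally sized grayscale frames pixel by pixel."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


def label_foreground(frame, model, threshold=25):
    """Label a pixel 1 when it differs from the model by more than `threshold`."""
    return [
        [1 if abs(pixel - mean) > threshold else 0
         for pixel, mean in zip(frame_row, model_row)]
        for frame_row, model_row in zip(frame, model)
    ]


# Three sample images of an empty scene (one row of three pixels each).
background_frames = [[[100, 100, 50]], [[104, 96, 50]], [[96, 104, 50]]]
model = build_background_model(background_frames)

# New frame: pixel 0 is background, pixels 1 and 2 are covered by the object,
# but pixel 2 happens to be nearly the same color as the background behind it.
frame = [[100, 180, 52]]
labels = label_foreground(frame, model)
print(model, labels)
```

The third pixel illustrates the first disadvantage noted above: an object pixel that matches the background model is labeled background and becomes a hole.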
Another prior art method of compositing objects, taken from the field of computer vision and z-keying, involves the use of stereo cameras. The method involves calculating, or extracting, a depth value for each pixel. Pixels that are closer than a certain depth from the camera are labeled as foreground pixels and eventually composited onto a destination image. The image processing algorithms involved in computing the depth values in real time require immense computation making them impractical to implement on typical home personal computers. In addition, the procedure requires the use of two cameras.
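For illustration, the final labeling step of z-keying reduces to a depth threshold, sketched below under the assumption that a per-pixel depth map is already available. Computing that depth map from the two camera images is the computationally expensive part and is not shown; the function name `z_key` and the depth units are hypothetical.

```python
def z_key(depth_map, max_depth):
    """Label pixels closer to the camera than `max_depth` as foreground (1)."""
    return [[1 if depth < max_depth else 0 for depth in row] for row in depth_map]


# Assumed depths in meters: the person stands about 1.2 m from the camera,
# and the far wall is about 4 m away.
depth_map = [
    [4.0, 1.2, 1.1],
    [4.1, 1.3, 3.9],
]
mask = z_key(depth_map, max_depth=2.0)
print(mask)
```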
An important sub-function of the broader procedure of compositing objects from an input image to a destination image is reducing the effect of shadows emanating from the object in the input image in the compositing procedure. For example, if the object is a person standing in a room in which the lighting causes the person to cast shadows on the floors or walls around him, the goal is to reduce the effect of the shadow in the destination image (i.e., the shadow should not appear as part of the person in the destination image). The procedure should determine which pixels belong to the actual object and which pixels make up a shadow.
One prior art method of reducing the effect of shadows in the destination image, referred to as intensity-based shadow filtering, involves building an average model of the background image. Once this average background image is entered, the system knows the approximate brightness of each pixel in the background. Thus, if a pixel becomes somewhat darker in the input image, the system assumes that the pixel is now within a shadow. However, if a pixel in the average background image is now "covered" by the object being composited, and the object pixel happens to be darker than the "covered" background pixel, the same method will create a hole in the object once it is composited onto the destination image (the size of the hole depending on the number of pixels that are darker and are part of the object). Thus, the problem with the prior art method of reducing the undesirable effect of shadows in the destination image is that the shadow removal process itself may create more holes in the body of the object being composited.
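The intensity-based test and its failure case can be sketched as follows. The brightness-ratio bounds are assumptions introduced for this sketch to give "somewhat darker" a concrete meaning; they are not values stated for the prior art method.

```python
def is_shadow(input_brightness, background_brightness,
              min_ratio=0.5, max_ratio=0.95):
    """Treat a pixel as shadow when it is somewhat darker than the background.

    The ratio bounds are hypothetical values chosen for illustration.
    """
    if background_brightness == 0:
        return False
    ratio = input_brightness / background_brightness
    return min_ratio <= ratio <= max_ratio


background = 200
print(is_shadow(140, background))  # a real shadow on the wall
print(is_shadow(200, background))  # unchanged background pixel
print(is_shadow(130, background))  # a dark jacket covering the wall
```

The first and third calls both report a shadow, although the third pixel belongs to the object. Removing it from the foreground produces the hole described above.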
Therefore, it would be desirable to have a method and apparatus for compositing objects from an input image to a destination image such that the object is composited with the least amount of distortion from shadows or a constantly changing background, and has a reduced number of holes and gaps after being composited onto the destination image.
The present invention provides an object compositing system for compositing an object from an input image onto a destination image. In a preferred embodiment, the object is composited from an image having an arbitrary or non-uniform colored background containing some non-static elements onto a destination image with minimum effects from shadows cast by the object and with minimum gaps or holes within the object. Various improvements in the compositing procedure such as shadow reduction and hole filling, and less restrictive requirements regarding the object's surroundings are described herein.
In a preferred embodiment, an object compositing method of extracting an object from an image model and blending the object onto a destination image is described. A background model is created by examining several frames of an average background image before the object being composited enters the image. A frame of the input image containing the object is obtained after the background image model has been created. An alpha image is created in which each pixel ranges from "0" indicating it is not part of the object to "1" indicating that it is part of the object. The alpha pixel values are set according to values corresponding to input image pixels and average background pixels. The effect of shadows emanating from the object is reduced so that the composited object in the destination image contains only the object clearly outlined by the object's physical boundaries without the effect of shadows cast by the object. This is done by comparing the brightness of the input image pixels to the brightness of the average background image pixels. It is then determined whether the input image pixel hue (color) is within a predetermined hue tolerance of a corresponding pixel from the average background image. The type and extent of the pattern surrounding the input image pixel is then calculated and compared to the pattern surrounding its corresponding pixel from the average background image. A set of templates is then derived in which the templates fit within the object. The templates allow holes or gaps in the object created during the compositing process to be filled to a large extent. The templates can be configured to comprise the shape of the object. All alpha pixels of the object falling within any of the templates are switched or kept at a value of "1", ensuring that the pixels are part of the object.
The object is blended onto the destination image using the alpha image as a blending coefficient (alpha blend) wherein all input image pixels corresponding to alpha pixels with value one are blended onto the destination image.
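As an illustrative sketch of the blending step, the conventional alpha blend computes each output value as alpha times the input value plus (1 - alpha) times the destination value. The function name, the RGB tuple representation, and the use of a fractional alpha value in the example are assumptions of this sketch.

```python
def alpha_blend(input_image, destination, alpha):
    """Blend RGB input pixels onto destination pixels using per-pixel alpha in [0, 1]."""
    blended = []
    for in_row, dst_row, alpha_row in zip(input_image, destination, alpha):
        row = []
        for src, dst, a in zip(in_row, dst_row, alpha_row):
            # out = a * src + (1 - a) * dst, applied to each color channel
            row.append(tuple(round(a * s + (1 - a) * d) for s, d in zip(src, dst)))
        blended.append(row)
    return blended


room = (90, 90, 90)      # living-room background in the input image
person = (200, 100, 0)   # object pixel in the input image
jungle = (0, 100, 200)   # destination image pixel

input_image = [[room, person, person]]
destination = [[jungle, jungle, jungle]]
alpha = [[0.0, 1.0, 0.5]]  # 0 = not object, 1 = object, 0.5 = partial coverage

result = alpha_blend(input_image, destination, alpha)
print(result)
```

With alpha equal to zero the destination pixel is kept, and with alpha equal to one the input pixel replaces it.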
In another preferred embodiment, the compositing procedure bypasses the shadow reduction routine, thereby allowing the object to be composited faster in situations where shadows cast by the object are not likely to affect the outline of the object once composited. In yet another preferred embodiment, the compositing procedure bypasses the template creation and fitting routine, thereby allowing the object to be composited faster in situations where the object may not be easily amenable to being fitted by a configuration of templates, or where it is not likely that the object will contain holes or gaps once composited as a result of colors on the object and in the background, for example.
In another aspect of the present invention, a method of reducing the effect of shadows in the input image is described. An input image pixel and a corresponding average background image pixel are retrieved and the brightness of the two pixels is compared. It is then determined whether the hue of the input image pixel is within a hue tolerance of the average background image pixel. Another input image pixel close to the first pixel and another average background image pixel close to the first background pixel are retrieved. It is then determined what type of pattern surrounds the first input image pixel and what type of pattern surrounds the first average background pixel by using the second pixels retrieved from the respective images. A pixel rank scheme is then used to compare the two pattern types to determine whether the first input image pixel is part of a shadow. An alpha image pixel corresponding to the first input image pixel is then set accordingly.
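One possible reading of these steps is sketched below. The three-level rank (neighbor darker, similar, or brighter), the tolerance values, and all function names are assumptions made for illustration; the pixel rank scheme itself is not specified in this section, so the sketch should not be taken as the claimed method.

```python
import colorsys


def brightness(pixel):
    """Mean of the RGB channels (0 to 255)."""
    return sum(pixel) / 3


def hue_degrees(pixel):
    """Hue of an RGB pixel in degrees, using the standard HSV conversion."""
    r, g, b = pixel
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0] * 360


def rank(pixel, neighbor, tolerance=10):
    """Hypothetical rank: -1 if the neighbor is darker, 1 if brighter, else 0."""
    diff = brightness(neighbor) - brightness(pixel)
    if diff > tolerance:
        return 1
    if diff < -tolerance:
        return -1
    return 0


def shadow_alpha(in_px, in_nb, bg_px, bg_nb, hue_tolerance=20.0):
    """Return alpha 0 if the input pixel looks like shadowed background, else 1."""
    darker = brightness(in_px) < brightness(bg_px)
    hue_diff = abs(hue_degrees(in_px) - hue_degrees(bg_px))
    hue_diff = min(hue_diff, 360 - hue_diff)  # hue is circular
    same_hue = hue_diff <= hue_tolerance
    same_pattern = rank(in_px, in_nb) == rank(bg_px, bg_nb)
    return 0 if (darker and same_hue and same_pattern) else 1


bg_px, bg_nb = (200, 180, 160), (120, 108, 96)  # wall pixel next to a darker stripe

# Shadowed wall: darker, same hue, same darker-neighbor pattern.
shadow = shadow_alpha((120, 108, 96), (72, 65, 58), bg_px, bg_nb)
# Dark blue jacket: darker, but the hue differs from the wall.
jacket = shadow_alpha((40, 50, 120), (42, 52, 118), bg_px, bg_nb)
# Dark object with the wall's hue but a flat pattern around it.
flat = shadow_alpha((120, 108, 96), (118, 106, 95), bg_px, bg_nb)
print(shadow, jacket, flat)
```

Only the first case satisfies all three tests, so only that pixel is removed from the alpha image. Requiring the hue and pattern tests in addition to the brightness test is what keeps the dark object pixels from becoming holes.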
In another aspect of the present invention, a method of creating a set of templates in which each template fits within the object is described. A histogram is initialized and an alpha pixel from an alpha image is retrieved and its value determined. Each column in the histogram, represented by an index, corresponds to a column of pixels in the alpha image. The values of the histogram indexes are incremented based on the number of alpha pixels with value one falling in a particular index until there are no more alpha image pixels. An index and a value of the index are retrieved, until there are no more indexes in the histogram, and the index is assigned a label based on the index value. Indexes and y-coordinate values are then set to indicate the position of the left, right, top, and bottom boundaries of the object using the alpha image. Indexes are also set to indicate the position of the right and left boundaries of the object's center (i.e., left of center and right of center boundaries). These indexes and y-coordinate values are then used to compute bounding rectangles for the object. The bounding rectangles are then used to derive a set of templates where each template fits within its corresponding object part.
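A simplified sketch of the histogram and boundary steps follows. It computes a single bounding rectangle and uses that rectangle directly as a template, which is a simplification: the left-of-center and right-of-center boundaries and the per-part templates are omitted. The function names and the `min_count` labeling threshold are hypothetical.

```python
def column_histogram(alpha):
    """Count alpha pixels equal to 1 in each column of the alpha image."""
    histogram = [0] * len(alpha[0])
    for row in alpha:
        for x, value in enumerate(row):
            if value == 1:
                histogram[x] += 1
    return histogram


def bounding_rectangle(alpha, min_count=1):
    """Return (left, top, right, bottom) of the object, or None if it is absent."""
    histogram = column_histogram(alpha)
    # Label a column as an object column when its count reaches `min_count`.
    columns = [x for x, count in enumerate(histogram) if count >= min_count]
    rows = [y for y, row in enumerate(alpha) if any(row[x] == 1 for x in columns)]
    if not columns or not rows:
        return None
    return columns[0], rows[0], columns[-1], rows[-1]


def fill_template(alpha, rectangle):
    """Set every alpha pixel inside the rectangular template to 1."""
    left, top, right, bottom = rectangle
    return [
        [1 if left <= x <= right and top <= y <= bottom else value
         for x, value in enumerate(row)]
        for y, row in enumerate(alpha)
    ]


alpha = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],  # hole in the middle of the object
    [0, 1, 1, 1, 0],
]
histogram = column_histogram(alpha)
rectangle = bounding_rectangle(alpha)
filled = fill_template(alpha, rectangle)
print(histogram, rectangle, filled)
```

In this example the hole at column 2, row 2 lies inside the template and is switched to one, while pixels outside the rectangle are left unchanged.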
The advantages of the methods and systems described and claimed are cleaner, well-defined, and complete objects shown on the destination image after being composited. In addition, an object, whether animate or inanimate, is composited from an image with dynamic or changing background