The present invention relates to the field of digital video coding technology and, more particularly, to a method and apparatus for providing an improved chroma-key shape representation of video objects of arbitrary shape.
A variety of protocols for communication, storage and retrieval of video images are known. Invariably, the protocols are developed with a particular emphasis on reducing signal bandwidth. With a reduction of signal bandwidth, storage devices are able to store more images and communications systems can send more images at a given communication rate. Reduction in signal bandwidth increases the overall capacity of the system using the signal.
However, bandwidth reduction may be associated with particular disadvantages. For instance, certain known coding systems are lossy because they introduce errors which may affect the perceptual quality of the decoded image. Others may achieve significant bandwidth reduction for certain types of images but may not achieve any bandwidth reduction for others. Accordingly, the selection of coding schemes must be carefully considered.
The Moving Picture Experts Group (MPEG) has successfully introduced two standards for coding of audiovisual information, known by the acronyms MPEG-1 and MPEG-2. MPEG is currently working on a new standard, known as MPEG-4. MPEG-4 video aims at providing standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. A detailed proposal for MPEG-4 is set forth in MPEG-4 Video Verification Model (VM) 5.0, hereby incorporated by reference.
MPEG-4 considers a scene to be a composition of video objects. In most applications, each video object represents a semantically meaningful object. Each uncompressed video object is represented as a set of Y, U, and V components (luminance and chrominance values) plus information about its shape, stored frame after frame in predefined temporal intervals. Each video object is separately coded and transmitted with other objects. As described in MPEG-4, a video object plane (VOP) is an occurrence of a video object at a given time. For a video object, two different VOPs represent snapshots of the same video object at two different times. For simplicity, the term video object is often used herein to refer to its VOP at a specific instant in time.
As an example, FIG. 1(A) illustrates a frame for coding that includes a head and shoulders of a narrator, a logo suspended within the frame and a background. FIGS. 1(B)-1(D) illustrate the frame of FIG. 1(A) broken into three VOPs. By convention, a background generally is assigned VOP0. The narrator and logo may be assigned VOP1 and VOP2 respectively. Within each VOP, all image data is coded and decoded identically.
The VOP encoder for MPEG-4 separately codes shape information and texture (luminance and chrominance) information for the video object. The shape information is encoded as an alpha map that indicates whether or not each pixel is part of the video object. The texture information is coded as luminance and chrominance values. Thus, the VOP encoder for MPEG-4 employs explicit shape coding because the shape information is coded separately from the texture information (luminance and chrominance values for each pixel). While an explicit shape coding technique can provide excellent results at high bit rates, explicit shape coding requires additional bandwidth for carrying shape information separate from texture information. Moreover, explicit shape coding gives unimpressive results at low coding bit rates because the explicit shape information occupies a significant share of the available bandwidth, leaving too few bits for texture and resulting in low quality texture reconstruction for the object.
As an alternative to explicitly coding shape information, implicit shape coding techniques have been proposed in which shape information is not explicitly coded. Rather, in implicit shape coding, the shape of each object can be ascertained based on the texture information. Implicit shape coding techniques provide a simpler design than explicit techniques and a reasonable performance, particularly at lower bit rates. Implicit shape coding reduces signal bandwidth because shape information is not explicitly transmitted. As a result, implicit shape coding can be particularly important for low bit rate applications, such as mobile and other wireless applications.
However, implicit shape coding generally does not perform as well as explicit shape coding, particularly for more demanding scenes. For example, objects often contain color bleeding artifacts on object edges when using implicit shape coding. Also, it can be difficult to obtain lossless shapes using the implicit techniques because shape coding quality is determined by texture coding quality and is not provided explicitly. Therefore, a need exists for an improved implicit shape coding technique.
The system of the present invention can include an encoding system and a decoding system that overcome the disadvantages and drawbacks of prior systems.
In one embodiment of the present invention, shape information for an object is implicitly encoded by using a chroma-key color. According to this embodiment, a bounding box is created around the object and the pixels that are in the bounding box but outside the object are identified and replaced with a key color. The object is coded and a first bitstream is output that includes the coded data for the pixels in the bounding box. A scene description bitstream is sent that includes a node containing the key color and chroma-key thresholds for the object. In one embodiment, the node is a MaterialKey node. In a further embodiment, the node comprises a transparency field, an isKeyed field, an isRGB field, a keyColor field, a lowThreshold field, and a highThreshold field. A decoding system decodes the object and bounding box containing the object, and assigns a value signifying transparent to each pixel for which the difference between the color of each of said decoded pixels and the key color is below or equal to a low threshold, and assigns a value signifying opaque to each pixel for which said difference is greater than a high threshold.
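By way of illustration only, the key-color replacement at the encoder and the two-threshold test at the decoder may be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the function names (`bounding_box`, `key_fill`, `color_distance`, `decode_alpha`) are hypothetical, the color difference is assumed to be a sum of absolute component differences because the passage above does not define the metric, and the linear ramp applied between the two thresholds is likewise an assumption, since the passage above specifies only the transparent and opaque cases. Texture coding itself is omitted.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]      # three components, e.g. (Y, U, V) or (R, G, B)
TRANSPARENT, OPAQUE = 0, 255      # assumed 8-bit alpha values


def bounding_box(mask: List[List[int]]) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the tightest box around the object."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def key_fill(image: List[List[Color]], mask: List[List[int]],
             key_color: Color) -> List[List[Color]]:
    """Crop to the bounding box and replace non-object pixels with the key color."""
    top, left, bottom, right = bounding_box(mask)
    return [[image[r][c] if mask[r][c] else key_color
             for c in range(left, right + 1)]
            for r in range(top, bottom + 1)]


def color_distance(a: Color, b: Color) -> int:
    """Assumed metric: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def decode_alpha(pixels: List[List[Color]], key_color: Color,
                 low: int, high: int) -> List[List[int]]:
    """Recover an alpha map from decoded pixels using the two thresholds."""
    alpha = []
    for row in pixels:
        out = []
        for p in row:
            d = color_distance(p, key_color)
            if d <= low:
                out.append(TRANSPARENT)       # at or below the low threshold
            elif d > high:
                out.append(OPAQUE)            # above the high threshold
            else:
                # Assumption: linear ramp between the thresholds.
                out.append(OPAQUE * (d - low) // (high - low))
        alpha.append(out)
    return alpha
```

In this sketch, a decoded pixel that differs slightly from the key color because of lossy texture coding still falls at or below the low threshold and is treated as transparent, which is why thresholds are used rather than an exact match against the key color.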