The present invention relates to a method and apparatus for efficient chroma key-based coding for digital video with an optimized switching threshold.
Object manipulation is one of the desirable features for multimedia applications. This functionality is available in the developing digital video compression standards, such as H.263+ and MPEG-4. For H.263+, refer to ITU-T Study Group 16, Contribution 999, Draft Text of Recommendation H.263 Version 2 ("H.263+") for Decision, September 1997, incorporated herein by reference. For MPEG-4, refer to ISO/IEC 14496-2 Committee Draft (MPEG-4), "Information Technology--Coding of audio-visual objects: visual," October 1997, incorporated herein by reference.
MPEG-4 uses a shape coding tool to process an arbitrarily shaped object known as a Video Object Plane (VOP). With shape coding, shape information, referred to as alpha planes, is obtained. Binary alpha planes are encoded by modified Content-based Arithmetic Encoding (CAE), while grey-scale alpha planes are encoded by a motion compensated Discrete Cosine Transform (DCT), similar to texture coding. An alpha plane is bounded by a rectangle that includes the shape of the VOP. The bounding rectangle of the VOP is extended on the right-bottom side to multiples of 16×16 blocks, and the extended alpha samples are set to zero. The extended alpha plane is partitioned into blocks of 16×16 samples (e.g., alpha blocks) and the encoding/decoding process is performed on each alpha block.
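By way of illustration only, the extension and partitioning of an alpha plane described above may be sketched as follows. This is a minimal sketch and not the MPEG-4 reference procedure; the function names `extend_alpha_plane` and `alpha_blocks`, and the representation of the alpha plane as a list of rows, are assumptions made for the example.

```python
BLOCK = 16  # alpha block size in samples, as stated in the text


def extend_alpha_plane(alpha):
    """Pad a 2-D alpha plane (list of rows) with zeros on the right and bottom
    so that both dimensions become multiples of BLOCK."""
    height = len(alpha)
    width = len(alpha[0]) if height else 0
    ext_h = -(-height // BLOCK) * BLOCK  # round up to a multiple of BLOCK
    ext_w = -(-width // BLOCK) * BLOCK
    extended = [list(row) + [0] * (ext_w - width) for row in alpha]
    extended += [[0] * ext_w for _ in range(ext_h - height)]
    return extended


def alpha_blocks(extended):
    """Yield (block_row, block_col, block) for each BLOCK x BLOCK alpha block."""
    for top in range(0, len(extended), BLOCK):
        for left in range(0, len(extended[0]), BLOCK):
            block = [row[left:left + BLOCK] for row in extended[top:top + BLOCK]]
            yield top // BLOCK, left // BLOCK, block
```

For example, a bounding rectangle of 20 samples by 10 lines would be extended to 32 by 16 samples and processed as two alpha blocks.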
However, such a shape coding tool is complex and not suitable for use in a low bit rate environment. Specifically, the processing and transmission of explicit alpha plane data consume computational resources and channel bandwidth.
Accordingly, it would be desirable to provide a system for encoding shape information of VOPs and other video objects that does not require the use of explicit alpha planes.
Chroma key-based coding is a simpler alternative for processing video objects, and requires significantly less processing effort and overhead than shape coding, especially in the encoder. Currently, this particular technique is included in H.263+.
Keying is a process where one video signal is mixed into another to replace part of a frame (e.g., picture) or field with a different image. For example, in a news broadcast, a foreground video object such as an announcer's head and upper body may be overlaid on a background object, such as a neutral-colored backdrop. To achieve the effect that the announcer is actually in front of the backdrop, the displayed image must switch between the foreground object and the background object during each horizontal scan line on a television screen.
Such an effect can be achieved by a switch that wipes from one input to another using one of a variety of different switching patterns. For example, a binary switching pattern may be used, wherein the displayed image abruptly transitions from one image to another in a step change. Alternatively, soft keying techniques may be used, where the switching occurs in a relatively gradual ramp-like manner such that a blended or cross-fade region is created between the two different images.
Soft keying generally provides a more realistic effect than binary switching. However, image features may be attenuated or lost if the blending region is too large.
Moreover, difficulties arise in determining a color difference threshold for switching. If the threshold is too high or too low, switching between the backdrop image and the foreground image will not occur at the proper time.
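By way of illustration only, binary and soft switching driven by a color difference may be sketched as follows. This is a minimal sketch under stated assumptions: the Euclidean distance between (Y, Cb, Cr) triples, the linear ramp, and the names `color_distance`, `binary_alpha`, `soft_alpha`, `composite`, `t_low` and `t_high` are hypothetical choices for the example and are not taken from H.263+ or MPEG-4.

```python
import math


def color_distance(pixel, key):
    """Euclidean distance between two (Y, Cb, Cr) triples (an assumed metric)."""
    return math.sqrt(sum((p - k) ** 2 for p, k in zip(pixel, key)))


def binary_alpha(d, threshold):
    """Step change: background (0.0) at or below the threshold, foreground (1.0) above."""
    return 0.0 if d <= threshold else 1.0


def soft_alpha(d, t_low, t_high):
    """Linear ramp from 0.0 at t_low to 1.0 at t_high, giving a blended region."""
    if d <= t_low:
        return 0.0
    if d >= t_high:
        return 1.0
    return (d - t_low) / (t_high - t_low)


def composite(fg, bg, alpha):
    """Blend one foreground pixel and one background pixel with weight alpha."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))
```

In this sketch, a pixel whose distance from the key color lies between `t_low` and `t_high` is cross-faded, whereas `binary_alpha` switches abruptly. Moving either threshold changes which pixels are treated as foreground, which illustrates why the choice of threshold affects the keyed result.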
Accordingly, it would be desirable to provide a chroma keying system that provides an optimized threshold for switching between background and foreground objects in a video picture (e.g., frame).
The chroma keying system should be compatible with existing video standards such as MPEG-4 and H.263+, and other frame-based video compression standards including MPEG-2.
The chroma keying system should be computationally efficient and use minimal overhead.
The chroma keying system should be compatible with VOPs and other video objects and images.
It would further be desirable to provide a chroma key technique to represent the shape of a video object, where the shape information (alpha plane) of a foreground object is embedded in the keyed output, so there is no need to carry an explicit alpha plane, or use alpha plane coding.
The chroma key shape representation technique should provide a smooth transition at the boundary between objects without the need for special switching patterns, such as a general grey-scale shape coding tool, or post-processing, e.g., using feathering filters.
Moreover, the chroma key technique should work in conjunction with any frame-based compression standard with minimal overhead.
The chroma key technique should be compatible with constant or variable rate encoding.
The chroma keying technique should provide an encoder which pre-processes key color data (e.g., prior to encoding) for subsequent use in determining an optimum keying threshold. The chroma keying technique should alternatively provide an encoder which determines an optimum keying threshold in real-time (e.g., as an image is encoded).
The present invention provides a chroma keying system having the above and other advantages.