The invention relates to an apparatus and a concomitant method for shape or object mask coding. More particularly, the invention relates to a method for increasing the efficiency of scalable shape coding by deriving shape information for the chrominance components from the luminance component.
Shape or object mask encoding (also known as object-oriented image and video coding) has gained acceptance and is presently being promoted to various multimedia standards, e.g., the MPEG-4 (Moving Picture Experts Group) international standard. Unlike traditional frame encoding methods, shape encoding treats each frame or picture as consisting of one or more flexible objects (objects having arbitrary shapes) that may undergo changes such as translations, rotations, scaling, brightness and color variations and the like. Using shape or object mask encoding, functionalities are provided not only at the frame level, but also at the object level.
One functionality is scalability, i.e., providing an image at different spatial resolutions. In general, shape encoding starts by segmenting an image frame into a plurality of objects or video object planes (VOPs), e.g., a speaker in an image is one object and the background is a second object. The resulting "shape information" can be represented as a "binary mask". The binary shape information provides an object mask with only two values: transparent or opaque (where transparent means the corresponding pixel is outside of an object and opaque means the corresponding pixel is within the object). A mask can be broadly defined as the information that defines the shape of an object or the pixels associated with an object. More specifically, the shape information or object mask is used to indicate the arbitrary shape of an image or video object and the region in which the texture of this object needs to be coded. Since the object mask is tracked and encoded along with the texture information into the bitstream, it is possible to provide various functionalities on a per-object basis. An example of a novel shape or object mask coding is disclosed in the patent application entitled "Method And Apparatus For Generic Scalable Shape Coding" filed on May 17, 1999 with Ser. No. 09/312,797, which is commonly owned by the assignee and is incorporated herein by reference.
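As a minimal illustration of the binary mask described above, the following sketch turns a segmentation label map into one mask per object. The function name `object_mask`, the label map, and the convention that 1 means opaque and 0 means transparent are assumptions made for this example only; they are not taken from the referenced application.

```python
def object_mask(labels, object_id):
    """Return a binary mask: 1 (opaque) where the pixel belongs to
    object_id, 0 (transparent) everywhere else."""
    return [[1 if value == object_id else 0 for value in row]
            for row in labels]


# Hypothetical 3x3 segmentation: 1 marks the speaker, 0 the background.
labels = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
]

speaker_mask = object_mask(labels, 1)
background_mask = object_mask(labels, 0)
```

With only two objects in the frame, each pixel is opaque in exactly one of the two masks.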
Additionally, subband based coding (e.g., wavelet based coding as discussed below) is also supported by the MPEG-4 standard. One of the advantages of a wavelet based coding scheme is that it can provide flexible spatial scalability. Thus, the coding standard should support both spatial scalability of texture coding and spatial scalability of shape coding.
Due to the nature of the shape-adaptive wavelet transform, the wavelet transforms of texture information within a region depend on the shape information of that region. Thus, in order to perfectly reconstruct the texture information at the decoder end, the shape of the region used at the decoder end for the inverse wavelet transform must be exactly the same as the shape used at the encoder end for the wavelet transform. This is true for each spatial layer for the luminance and chrominance components. Unfortunately, this leads to the requirement that both luminance and chrominance shape information be encoded into the bitstream, thereby reducing coding efficiency.
It should be noted that if no spatial scalability of shape coding is required at the decoder end, then one can always start from the full resolution of the shape information for luminance and then derive from that the exact shape for chrominance at whatever spatial level according to the same decomposition order of the encoder. However, when the scalability of shape coding is required, it is desirable to be able to derive the exact shape of the chrominance information from the shape of the luminance at the same spatial level, without having to start from the full resolution of the luminance shape information, which may not be available or may not be computationally practical for a particular application.
Therefore, a need exists in the art for a generic spatially-scalable shape encoding method and apparatus that is capable of deriving the exact shape of the chrominance components from the shape of the luminance component for each spatial layer.
In the present invention, an embodiment of a generic spatially-scalable shape encoding apparatus and concomitant method for deriving the exact shape of the chrominance components from the shape of the luminance component for each spatial layer is disclosed. The present generic spatially-scalable shape encoding applies a series of subband (e.g., wavelet) filters, T1, T2, . . . TN, to obtain N levels of wavelet decomposition for both texture and shape information of both luminance and chrominance components.
More specifically, a series of subband filters, T1, T2, . . . TN, are applied to an input image to obtain N levels of subband decomposition for both texture and shape information of the luminance component. It should be noted that subband filtering comprises two distinct processing steps: spatial filtering and subsampling. In the preferred embodiment, the corresponding subsampling function (T1) of said subband filter T1 is applied to the full resolution luminance shape information or mask to obtain full resolution chrominance shape information. In turn, the corresponding series of subsampling filters ((T2), . . . (TN)) can be applied to the full resolution chrominance shape information to obtain additional levels of chrominance shape information. Next, the corresponding series of subband filters, T2, . . . TN, and said plurality of levels of chrominance shape information are applied to the full resolution chrominance texture information to obtain a plurality of levels of chrominance texture information.
Finally, the plurality of levels of resolution of luminance texture and shape information are encoded into the bitstream. However, only the plurality of levels of resolution of chrominance texture information are encoded into the bitstream. Namely, coding efficiency is enhanced by not having to encode the plurality of levels of resolution of chrominance shape information into the bitstream, while still providing the desirable feature of scalability.
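The chrominance shape derivation described above can be sketched as follows. This is only an illustration under stated assumptions: the subsampling function (Ti) of each subband filter is modeled as plain decimation that keeps even-indexed rows and columns, whereas the actual subsampling depends on the filter used; the names `subsample`, `luma_shape_pyramid`, and `chroma_shape_pyramid` are hypothetical.

```python
def subsample(mask):
    """Assumed stand-in for the subsampling step (Ti) of a subband filter:
    keep every other row and every other column of the mask."""
    return [row[::2] for row in mask[::2]]


def luma_shape_pyramid(luma_mask, levels):
    """Luminance shape at each spatial level, starting at full resolution."""
    pyramid = [luma_mask]
    for _ in range(levels - 1):
        pyramid.append(subsample(pyramid[-1]))
    return pyramid


def chroma_shape_pyramid(luma_mask, levels):
    """Chrominance shape at each level, derived only from luminance shape.
    (T1) on the full resolution luminance mask gives the full resolution
    chrominance mask; (T2)..(TN) give the lower levels."""
    pyramid = [subsample(luma_mask)]
    for _ in range(levels - 1):
        pyramid.append(subsample(pyramid[-1]))
    return pyramid


# Hypothetical 8x8 luminance mask: an opaque 5x5 block in the top-left corner.
luma = [[1 if r < 5 and c < 5 else 0 for c in range(8)] for r in range(8)]
luma_levels = luma_shape_pyramid(luma, 3)
chroma_levels = chroma_shape_pyramid(luma, 3)

# At every level, the chrominance mask can be rebuilt from the luminance
# mask of that same level, without going back to full resolution.
same_level_ok = all(
    chroma_levels[k] == subsample(luma_levels[k]) for k in range(3)
)
```

Because a decoder holding the luminance shape of a given layer can repeat the same subsampling, the chrominance shape of that layer does not have to be carried in the bitstream in this sketch.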
Additionally, the present invention also discloses a novel method of interpolating a missing chrominance component corresponding to a luminance pixel within an object. Namely, the missing chrominance component is interpolated by taking the average of its three neighboring values in the chrominance plane.
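A minimal sketch of the averaging step follows. The passage above does not state which three neighbors are used, so the default offsets here (right, below, and diagonal) are purely an illustrative assumption, as are the function name `interpolate_missing_chroma` and the use of a non-rounded average.

```python
def interpolate_missing_chroma(chroma, row, col,
                               neighbor_offsets=((0, 1), (1, 0), (1, 1))):
    """Estimate the missing chrominance value at (row, col) as the mean of
    three neighboring values in the chrominance plane.
    The default neighbor offsets are an assumption for illustration."""
    if len(neighbor_offsets) != 3:
        raise ValueError("exactly three neighbors are expected")
    values = [chroma[row + dr][col + dc] for dr, dc in neighbor_offsets]
    return sum(values) / 3


# Hypothetical 2x2 chrominance patch whose top-left value is missing (None).
patch = [
    [None, 30],
    [60, 90],
]
patch[0][0] = interpolate_missing_chroma(patch, 0, 0)
```

Passing different `neighbor_offsets` lets the same helper cover a missing value at another corner of the patch.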