Shape or object mask encoding (also known as object-oriented image and video coding) has gained acceptance and is presently being promoted in various multimedia standards, e.g., the MPEG-4 (Moving Picture Experts Group) international standard. Unlike traditional frame encoding methods, shape encoding treats each frame or picture as consisting of one or more flexible objects (objects having arbitrary shapes) that may undergo changes such as translations, rotations, scaling, brightness and color variations, and the like. Using shape or object mask encoding, functionalities are provided not only at the frame level, but also at the object level.
One functionality is scalability, i.e., providing an image at different spatial resolutions. In general, shape encoding starts by segmenting an image frame into a plurality of objects or video object planes (VOPs), e.g., a speaker in an image is one object and the background is a second object. The resulting “shape information” can be represented as a “binary mask”. A mask can be broadly defined as the information that defines the shape of an object or the pixels associated with an object. Since the object mask is tracked and encoded into the bitstream, it is possible to provide various functionalities on a per-object basis.
More specifically, the shape information or object mask is used to indicate the arbitrary shape of an image or video object and the region in which the texture of this object needs to be coded. The binary shape information provides an object mask with only two values: transparent or opaque (where transparent means the corresponding pixel is outside of an object and opaque means the corresponding pixel is within the object). FIG. 1(a) shows such an arbitrarily shaped object 100 and FIG. 1(b) shows the corresponding binary object mask 110 that identifies the shape and texture region of the object. Although the arbitrarily shaped object 100 contains specific texture information, illustrated by the different grayscale shading, such specific texture information is not captured by the object mask. The object mask provides only the shape information, i.e., whether a pixel is inside or outside an object.
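The relationship between an object, its binary mask, and its texture region can be illustrated with a minimal sketch. The label map, the texture values, and the function names `binary_mask` and `texture_region` are hypothetical and chosen only for illustration; they are not part of any standard or of the method described herein.

```python
OPAQUE, TRANSPARENT = 1, 0

# Hypothetical output of a segmentation step: 1 marks the object, 0 the background.
LABELS = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

# Hypothetical grayscale texture values for the same frame.
TEXTURE = [
    [10, 12, 180, 190, 11, 10],
    [12, 170, 200, 210, 185, 13],
    [11, 165, 205, 215, 175, 12],
    [10, 11, 160, 170, 12, 10],
]


def binary_mask(labels, object_id):
    """Return a two-valued mask: OPAQUE inside the object, TRANSPARENT outside."""
    return [
        [OPAQUE if label == object_id else TRANSPARENT for label in row]
        for row in labels
    ]


def texture_region(texture, mask):
    """List (row, col, value) for every pixel whose texture needs to be coded."""
    return [
        (r, c, texture[r][c])
        for r, mask_row in enumerate(mask)
        for c, m in enumerate(mask_row)
        if m == OPAQUE
    ]


if __name__ == "__main__":
    mask = binary_mask(LABELS, object_id=1)
    for mask_row in mask:
        print("".join("#" if m == OPAQUE else "." for m in mask_row))
    print(len(texture_region(TEXTURE, mask)), "pixels need texture coding")
```

Note that the mask itself carries no grayscale values; the texture is looked up separately, and only at the positions the mask marks as opaque.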
When scalability is required, the mask (i.e., shape of the object) is typically decomposed into a plurality of different spatial resolutions (levels of resolution), such that the encoder will also encode the mask at different spatial resolutions into the encoded bitstream. However, since numerous decomposition methods are available, the encoding efficiency of the encoder varies depending on the decomposition method that is employed. Additionally, if the decomposition method is modified, it is often necessary or desirable to also alter the mask encoding method to better match the decomposition method, thereby increasing overall coding efficiency. Unfortunately, such modifications to the encoder are costly, complex, and time-consuming.
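To make the dependence on the decomposition method concrete, the following sketch decomposes one mask into several resolution levels using two simple, illustrative filters: plain subsampling, and an OR rule that marks a low-resolution pixel opaque if any pixel in its 2x2 block is opaque. Both filters, the sample mask, and all function names are assumptions made for illustration; they are not asserted to be the decomposition methods of any particular standard or encoder.

```python
# Hypothetical 8x8 binary mask (1 = opaque, 0 = transparent).
MASK = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]


def decompose_subsample(mask):
    """Halve the resolution by keeping the top-left pixel of each 2x2 block."""
    return [row[::2] for row in mask[::2]]


def decompose_or(mask):
    """Halve the resolution; a pixel is opaque if any pixel in its 2x2 block is."""
    height, width = len(mask), len(mask[0])
    return [
        [
            max(
                mask[r + dr][c + dc]
                for dr in (0, 1)
                for dc in (0, 1)
                if r + dr < height and c + dc < width
            )
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def pyramid(mask, decompose, levels):
    """Return [full resolution, 1/2 resolution, 1/4 resolution, ...]."""
    result = [mask]
    for _ in range(levels):
        result.append(decompose(result[-1]))
    return result


if __name__ == "__main__":
    for name, method in (("subsample", decompose_subsample), ("or", decompose_or)):
        print(name)
        for level in pyramid(MASK, method, levels=2):
            print("\n".join("".join("#" if m else "." for m in row) for row in level))
            print()
```

The two filters produce different half-resolution masks from the same input, so a mask encoder tuned to the statistics of one decomposition will not necessarily perform as well on the other.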
Therefore, a need exists in the art for a generic spatially-scalable shape encoding method and apparatus that is capable of handling different decomposition methods, while maximizing coding efficiency of the encoder.