Major discontinuities in shapes are generally represented by shape images, each of which indicates the amount of foreground and background defined by the major discontinuities. Such an image is also called a matte or soft segmentation image, and is frequently used to define the amount of foreground at a particular pixel location in blue screen techniques. It can be an 8-bit image with values ranging from 0 to 255, each indicating the soft membership or opacity of the corresponding pixel, with 0 meaning no contribution (i.e. fully transparent) and 255 meaning full contribution (i.e. fully opaque). If only the geometric shape of the object is needed, the shape image can be simplified to a binary image whose pixels take a value of 0 or 1, which is also referred to as a binary shape or binary alpha plane. The binary form is of interest in data compression when the available bit rate is limited.
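As an illustration of the two representations described above, the following minimal Python sketch blends a pixel using an 8-bit opacity value and collapses a small matte into a binary alpha plane. The function names, the sample values, and the threshold of 128 are assumptions made for the example only; the text does not prescribe how the binary image is derived from the matte.

```python
def composite(alpha, foreground, background):
    """Blend one grey-level pixel using an 8-bit opacity value (0..255)."""
    return (alpha * foreground + (255 - alpha) * background) / 255


def binarize_matte(matte, threshold=128):
    """Collapse an 8-bit matte to a binary alpha plane.

    The threshold is an illustrative assumption, not a value from the text.
    """
    return [[1 if a >= threshold else 0 for a in row] for row in matte]


# Hypothetical 3x4 matte: 0 = fully transparent, 255 = fully opaque.
matte = [
    [0,   0,  64,   0],
    [0, 128, 255, 192],
    [0, 255, 255, 255],
]

binary_alpha = binarize_matte(matte)
for row in binary_alpha:
    print(row)

# Opacity 0 shows only the background; opacity 255 shows only the foreground.
print(composite(0, 200, 50), composite(255, 200, 50))
```

The soft matte keeps partial opacities such as 64 and 192, whereas the binary alpha plane retains only the geometric shape.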
To date, there is no shape codec specifically tailored for coding matte and soft segmentation images. They are generally treated as grey-scale images and are coded using image compression algorithms. Such an approach is unable to exploit the structural redundancies of matte and soft segmentation images in compression. For binary shapes, there are two state-of-the-art coding approaches, i.e., contour-based methods and block-based methods. In contour-based methods, the contour of the shape is first traced clockwise (or counterclockwise) and segmented into multiple line pieces, which serve as the smallest processing units. Encoding and decoding processes are applied sequentially to each unit such that a contour is formed. This is followed by a filling process to reconstruct the original shape information. However, a major shortcoming of contour-based methods is that they require substantial pre-processing, and yet their compression ratio in lossless mode is lower than that of block-based methods. As a result, block-based methods are more popular.
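The sequential encode/decode idea behind contour-based coding can be sketched in Python as follows. Note that this sketch substitutes a simple 8-direction chain code for the line-piece segmentation described above, and it omits the filling step. The direction table, function names, and sample contour are all assumptions chosen for illustration, not the method of any particular codec.

```python
# 8-connected direction table: code -> (row step, column step).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]


def encode_contour(points):
    """Encode an ordered, closed, 8-connected contour as (start, codes)."""
    codes = []
    # Pair each point with its successor, wrapping around to close the loop.
    for (r0, c0), (r1, c1) in zip(points, points[1:] + points[:1]):
        codes.append(DIRECTIONS.index((r1 - r0, c1 - c0)))
    return points[0], codes


def decode_contour(start, codes):
    """Rebuild the contour points by applying the codes one after another."""
    points = [start]
    for code in codes[:-1]:  # the final code only returns to the start point
        d_row, d_col = DIRECTIONS[code]
        row, col = points[-1]
        points.append((row + d_row, col + d_col))
    return points


# Hypothetical contour of a 3x3 square, traced clockwise in image
# coordinates (rows increase downwards).
contour = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

start, codes = encode_contour(contour)
print(start, codes)
print(decode_contour(start, codes) == contour)
```

A decoder in a real contour-based scheme would then fill the interior of the reconstructed contour to recover the shape.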
In block-based approaches, the binary shape is bounded by rectangles of the same size, a.k.a. bounding boxes, that enclose the shapes of the video object plane (VOP). Each such rectangle is then divided into regular macroblocks, a.k.a. micro-processing units, in each of which the alpha values of the pixels are encoded/decoded using entropy coding methods. However, a major limitation of block-based approaches is that the blocks are all of the same size and are all aligned in the same direction. Consequently, some of the blocks inevitably contain no information about the contour, yet they still consume storage space (redundant blocks). These redundancies generally limit the compression ratio of block-based methods, especially for high-resolution images.
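The redundancy described above can be made concrete with a short Python sketch that computes a bounding box, tiles it with equal-sized, axis-aligned blocks, and counts how many blocks contain no contour information. The helper names, the block size of 4, the zero padding beyond the image border, and the triangular test shape are assumptions for illustration; the entropy coding of the alpha values is not shown.

```python
def bounding_box(shape):
    """Return (top, left, bottom, right) enclosing all 1-pixels, or None."""
    rows = [r for r, row in enumerate(shape) if any(row)]
    cols = [c for c in range(len(shape[0])) if any(row[c] for row in shape)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def classify_blocks(shape, block=16):
    """Tile the bounding box with block x block squares and classify each."""
    counts = {"transparent": 0, "opaque": 0, "boundary": 0}
    box = bounding_box(shape)
    if box is None:
        return counts
    top, left, bottom, right = box
    height, width = len(shape), len(shape[0])
    for r0 in range(top, bottom, block):
        for c0 in range(left, right, block):
            # Pixels beyond the image border are treated as 0 (an assumption).
            values = {
                shape[r][c] if r < height and c < width else 0
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            }
            if values == {0}:
                counts["transparent"] += 1
            elif values == {1}:
                counts["opaque"] += 1
            else:
                counts["boundary"] += 1  # only these blocks touch the contour
    return counts


# Hypothetical 12x12 binary shape: a filled lower-left triangle.
triangle = [[1 if c <= r else 0 for c in range(12)] for r in range(12)]

counts = classify_blocks(triangle, block=4)
print(counts)
```

In this example only the blocks along the diagonal contain part of the contour; the uniformly transparent and uniformly opaque blocks correspond to the redundant blocks mentioned above.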