In various applications, a video signal is transmitted in digital form. In many such applications the available bandwidth is limited, so some form of compression is required. In response to this requirement, various video compression standards or processes have been established, including MPEG-1, MPEG-2, H.26X and MPEG-4. For example, the MPEG-4 video compression process extends conventional block-based video codecs (e.g., H.263 and MPEG-2) to object-based coding of segmented video objects (VOs).
Such standard techniques include the transmission of still images. One of the most frequently used techniques for compression of an image is DCT-based block transform coding, which converts a block of digital image data (for example, a block of 8×8 pixels) into a set of transform coefficients. The original image is partitioned into such blocks, each block is subjected to the DCT, and the transformed block is compressed (normally by a process which involves quantization of each transform coefficient). The DCT provides a good compromise between energy-packing ability and computational complexity.
The DCT transform matrix can be written as C = [c(p, q)]_{N×N}, with:

    c(p, q) = √(2/N) · c_p · cos( p(2q + 1)π / (2N) )    (1)

where N is the number of pixels along each side of the N×N block, and p, q = 0, . . . , N−1. Here, c_p = √(1/2) if p = 0 and c_p = 1 otherwise.
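Equation (1) can be checked numerically. The sketch below (using numpy, as an illustration rather than any particular codec's implementation) builds the N×N matrix C and verifies that it is orthogonal, so the inverse transform is simply its transpose:

```python
import numpy as np

def dct_matrix(N=8):
    """Build the N x N DCT transform matrix C = [c(p, q)] of equation (1)."""
    C = np.zeros((N, N))
    for p in range(N):
        cp = np.sqrt(0.5) if p == 0 else 1.0  # c_p of equation (1)
        for q in range(N):
            C[p, q] = np.sqrt(2.0 / N) * cp * np.cos(p * (2 * q + 1) * np.pi / (2 * N))
    return C

C = dct_matrix(8)
# C is orthogonal: C @ C.T is the identity, so the IDCT matrix is C.T.
assert np.allclose(C @ C.T, np.eye(8))
# A 2-D block is transformed separably as T = C @ block @ C.T.
```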
One advantageous feature of the MPEG-4 video compression standard is the ability to encode arbitrarily-shaped video objects (VOs). The separation of video content into segments or objects has become an emerging key technique for improving video quality in (very) low bit-rate applications. By transmitting object or segment contour information, annoying coding errors inherent in block-oriented hybrid DPCM/DCT coding schemes, such as mosquito noise and blocking artefacts, can be avoided to a certain extent, or sometimes to a great extent. (DPCM stands for Differential Pulse Coded Modulation; in most video coding standards, the DC component of a transformed block is coded with a DPCM technique.)
An efficient DCT is desirable for coding objects of arbitrary shapes (note that the term “object” is used herein to refer also to a single segment of a larger object, e.g. a segment of an image). Generally, an object may contain a number of complete blocks (which may be coded using the conventional DCT) and a number of blocks consisting of both pixels which are in the object and pixels which are not. The latter type of block often provides the boundary of the object, and such blocks are accordingly sometimes referred to as boundary blocks. The object pixels in a boundary block may be at any positions within the block, and thus the set of object pixels may be of any shape. Computationally complex shape-adaptive DCT (SA-DCT) algorithms have been proposed in the literature, based either on the calculation of shape-adaptive orthogonal sets of DCT basis functions (see, for example, U.S. Pat. No. 5,666,212) or on DCT coefficient zeroing (H. H. Chen, M. R. Civanlar, and B. G. Haskell, “A block transform coder for arbitrarily-shaped image segments”, Proc. Int. Conf. on Image Processing (ICIP), vol. 1, 1994, pp. 85-89). While the former method relies on expensive calculation of DCT basis functions for every different segment shape, the latter employs the normal N×N DCT (N = 8 typically) with additional numerical optimisation so as to minimise the number of DCT coefficients to be coded. A more efficient algorithm, based on pre-defined sets of separable DCT basis functions, is proposed in T. Sikora, “Low complexity shape-adaptive DCT for coding of arbitrarily shaped image segments”, Signal Processing: Image Communication, vol. 7, no. 4-6, Nov. 1995, pp. 381-395, and P. Kauff and K. Schuur, “Shape-adaptive DCT with block-based DC separation and Δ-DC correction”. This algorithm is denoted the standard SA-DCT hereafter.
A given row (or column) of the block is transformed using a DCT matrix corresponding generally to equation (1), but in which the order of the DCT matrix is not N but a value K, the number of object pixels in the row (or column). In other words, a different transformation matrix is used for each value of K. The standard SA-DCT can be viewed as an approximation of the method outlined in U.S. Pat. No. 5,666,212. One important feature of the standard SA-DCT is that it results in exactly the same number of non-zero transformed coefficients as the number of pixels within the original input data block (of arbitrary shape). Furthermore, after the SA-DCT, the coefficients are generally located at the low-frequency corner, which is quite desirable for subsequent processing such as zig-zag scanning and non-uniform quantization. Note that there are some DCT-domain positions left undefined by the standard SA-DCT. A modified shape-adaptive zig-zag scan method which skips these undefined positions (if necessary) is adopted in the framework of the MPEG-4 verification model to increase the coding efficiency.
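A simplified sketch of the forward standard SA-DCT may clarify the two-pass structure: object pixels in each column are shifted to the top and transformed with a K-point DCT, then the resulting coefficients in each row are shifted to the left and transformed again. This omits the DC separation and Δ-DC correction of Kauff and Schuur, so it is an illustration of the packing behaviour only, not a conforming implementation:

```python
import numpy as np

def dct_mat(K):
    """K-point orthonormal DCT matrix (equation (1) with N = K)."""
    p, q = np.meshgrid(np.arange(K), np.arange(K), indexing="ij")
    c = np.where(p == 0, np.sqrt(0.5), 1.0)
    return np.sqrt(2.0 / K) * c * np.cos(p * (2 * q + 1) * np.pi / (2 * K))

def sa_dct(block, mask):
    """Forward SA-DCT sketch for one boundary block.

    block: N x N pixel array; mask: boolean N x N array, True for object pixels.
    Returns (coeffs, coeff_mask): coefficients packed toward the low-frequency
    (top-left) corner, and which DCT-domain positions are defined.
    """
    N = block.shape[0]
    tmp = np.zeros((N, N))
    col_len = np.zeros(N, dtype=int)
    # Vertical pass: shift each column's object pixels to the top and apply
    # a K-point DCT, K being the number of object pixels in that column.
    for j in range(N):
        vals = block[mask[:, j], j]
        K = len(vals)
        col_len[j] = K
        if K:
            tmp[:K, j] = dct_mat(K) @ vals
    # Horizontal pass: shift each row's defined coefficients to the left
    # and apply a K-point DCT of the new row length.
    out = np.zeros((N, N))
    out_mask = np.zeros((N, N), dtype=bool)
    for i in range(N):
        vals = tmp[i, col_len > i]
        K = len(vals)
        if K:
            out[i, :K] = dct_mat(K) @ vals
            out_mask[i, :K] = True
    return out, out_mask
```

Note that the number of defined coefficients always equals the number of object pixels, and a full mask reduces the procedure to the ordinary separable N×N DCT.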
A disadvantage of standard SA-DCT is that since standard SA-DCT employs not just an 8×8 DCT matrix, but also DCT matrices for each value of K, it cannot be implemented using known chipsets which implement 8×8 DCT in a highly efficient manner.
It is clear that to employ eqn. (1) directly, an N×N image block has to be fully defined before the transform can take place. However, since for all boundary blocks, only part of an N×N block belongs to the object, some kind of padding has to be performed so as to pad an arbitrary shape back to the normal block of size N×N.
Intuitively, the simplest padding technique is perhaps to repeat the boundary pixels to fill all undefined positions. In the MPEG-4 video standards, a sophisticated padding scheme has been developed. This scheme is basically a low-pass extrapolation (LPE), performed in three steps as summarized in T. Ebrahimi, “MPEG-4 video verification model version 11.0”, ISO/IEC JTC1/SC29/WG11, MPEG98/N2172, March 1998, Tokyo:
1. Calculate the arithmetic mean value m of all block pixels situated within the object region.
2. Assign m to each block pixel outside the object region.
3. Apply the following transform to each block pixel outside the object region in a recursive manner, starting from the top-left of the block and proceeding row by row to the bottom:
    f̃(i, j) = [ f(i−1, j) + f(i, j−1) + f(i, j+1) + f(i+1, j) ] / 4    (2)
In Step 3, if one or more pixels used for filtering are outside the block, the corresponding pixels are not included into the filtering operation and the divisor 4 is reduced accordingly.
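The three LPE steps, including the reduced divisor at the block border, can be sketched as follows (an illustrative numpy rendering of the scheme described above, not the verification-model source):

```python
import numpy as np

def lpe_pad(block, mask):
    """Low-pass extrapolation (LPE) padding of one boundary block.

    block: N x N pixels (only positions where mask is True are meaningful);
    mask:  boolean N x N array, True inside the object region.
    """
    N = block.shape[0]
    f = block.astype(float).copy()
    # Step 1: arithmetic mean m of the object pixels.
    m = f[mask].mean()
    # Step 2: assign m to every pixel outside the object region.
    f[~mask] = m
    # Step 3: recursive 4-neighbour average (equation (2)) over the exterior
    # pixels, scanning row by row from the top-left; neighbours falling
    # outside the block are dropped and the divisor 4 reduced accordingly.
    for i in range(N):
        for j in range(N):
            if mask[i, j]:
                continue
            s, n = 0.0, 0
            for di, dj in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < N and 0 <= jj < N:
                    s += f[ii, jj]
                    n += 1
            f[i, j] = s / n
    return f
```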
Another padding method has recently been proposed in J.-W. Yi, S.-J. Cho, W.-J. Kim, S.-D. Kim, and S.-J. Lee, “A new coding algorithm for arbitrarily shaped image segments”, Signal Processing: Image Communication, vol. 12, no. 3, June, 1998, pp. 231-242. It is based on an extension-interpolation (EI) scheme and is thus denoted here as the EI method.
The idea is as follows: (1) A K-point DCT is done for each column or row (of length K) in a boundary block, (2) N−K zeros are added to the rear of the DCT coefficient vector, and (3) an N-point IDCT is performed on the new transformed coefficient vector.
In practice, these three steps can be implemented together via a single multiplication matrix of dimension N×K in the pixel domain, as derived in the above-cited paper of J.-W. Yi et al.
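The three EI steps, and their collapse into one pixel-domain matrix, can be sketched as below. This is an illustrative reading of the scheme as described above (in particular, no energy-rescaling of the coefficients is applied beyond the plain zero extension):

```python
import numpy as np

def dct_mat(K):
    """K-point orthonormal DCT matrix (equation (1) with N = K)."""
    p, q = np.meshgrid(np.arange(K), np.arange(K), indexing="ij")
    c = np.where(p == 0, np.sqrt(0.5), 1.0)
    return np.sqrt(2.0 / K) * c * np.cos(p * (2 * q + 1) * np.pi / (2 * K))

def ei_pad_vector(x, N=8):
    """Extension-interpolation padding of one row/column segment x, len(x) = K <= N."""
    K = len(x)
    X = dct_mat(K) @ x                             # (1) K-point DCT
    X_ext = np.concatenate([X, np.zeros(N - K)])   # (2) append N - K zeros
    return dct_mat(N).T @ X_ext                    # (3) N-point IDCT (transpose)

def ei_matrix(K, N=8):
    """Combined N x K pixel-domain matrix: ei_pad_vector(x, N) == ei_matrix(K, N) @ x."""
    return dct_mat(N).T[:, :K] @ dct_mat(K)
```

When K = N the extension is empty and the padded vector reproduces the input exactly.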
Since the human visual system is not as sensitive to high-frequency details as to low-frequency components, it is known to increase the quantization step-size from low-frequency positions to high-frequency positions.
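As a minimal illustration of such frequency-dependent quantization (the base step of 4 and the linear growth law are arbitrary choices for this sketch, not the MPEG quantization matrices):

```python
import numpy as np

N = 8
base_step = 4.0  # illustrative base step-size (not a standard value)
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
# Step-size grows with the sum of the frequency indices, so high-frequency
# coefficients are represented more coarsely than the DC/low-frequency ones.
step = base_step * (1.0 + i + j)

def quantize(coeffs):
    return np.round(coeffs / step)

def dequantize(levels):
    return levels * step
```

The round-trip error of any coefficient is then bounded by half its local step-size, largest in the high-frequency corner.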
Unlike the SA-DCT, the LPE scheme adopted in MPEG-4 always produces N² DCT coefficients, which are generally non-zero before quantization, for each padded data block (of size N×N). Similarly, the number of non-zero transformed coefficients generated by the EI method will also be greater than the number of pixels within the original pixel block. This is undesirable: the data set has been expanded by the DCT, leading either to large reconstruction errors if coarse quantization steps are used to force some of the high-frequency components to zero, or to a high bit rate if fine quantization steps are used to keep the reconstruction error small.