The present invention relates to digital video and, more particularly, to a motion compensation system for wavelet compressed digital video.
To make digital video transmission and storage practical, the quantity of data required to describe the sequence of video frames must be reduced. Digital video compression is dominated by hybrid coding techniques in which the data describing an image within a frame of the video sequence is typically transform coded and the images of certain frames are predicted from other frames of the sequence. Neighboring pixels of natural images tend to be highly correlated, or spatially redundant. Transform coding decorrelates the image pixels, facilitating a reduction in the spatial redundancy of the image data. In addition, there is often very little difference between the images in adjacent frames of the video sequence; that is, there is substantial temporal redundancy in the sequence. Temporal redundancy is typically reduced by predicting the images in certain frames of the video sequence from the motion-compensated image of a reference frame. Interframe predictive coding generally comprises motion estimation to determine the displacement of image content between frames, followed by motion compensation to isolate the part of the image content of a “current” frame that differs from that of the displaced image of the reference frame. Only the content differences, or residual, and the motion vectors describing the content displacement are coded, transmitted, and stored. At the decoder, the predicted (“current”) frame is reconstructed by displacing the image content of the decoded reference frame as described by the motion vectors and adding the residual.
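The relationship between the reference frame, the motion vector, and the residual can be illustrated with a minimal sketch. The function names (displace, encode_residual, decode_frame), the use of a single motion vector for the whole frame, and the zero fill for uncovered pixels are assumptions made for illustration only and are not taken from any standard.

```python
def displace(frame, dx, dy, fill=0):
    """Move the content of `frame` by (dx, dy); uncovered pixels get `fill` (an assumption)."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def encode_residual(current, reference, mv):
    """Encoder side: residual = current frame minus the displaced reference frame."""
    predicted = displace(reference, mv[0], mv[1])
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]


def decode_frame(reference, mv, residual):
    """Decoder side: displace the decoded reference frame and add the residual."""
    predicted = displace(reference, mv[0], mv[1])
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, residual)]
```

In this sketch the decoder needs only the reference frame, the motion vector, and the residual; when the displaced reference closely resembles the current frame, most residual values are zero.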
Block-based transform coding has been used extensively for coding still images and for intraframe coding of video and is specified by a number of digital image coding standards. For example, block-based transform coding utilizing the Discrete Cosine Transform (DCT) is the underlying mechanism of the JPEG (ISO/IEC 10918) still image compression standard and the intraframe coding process of the MPEG-2 (ISO/IEC 13818) video compression standard. For block-based intraframe coding methods, the image is divided into a plurality of contiguous pixel blocks and the transformation method is applied to the pixels on a block-by-block basis.
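By way of illustration, block-by-block transformation may be sketched as follows. The sketch assumes an orthonormal DCT-II, an 8 by 8 block size, and image dimensions that are multiples of the block size; the function names are hypothetical, and quantization and entropy coding are omitted.

```python
import math


def dct_1d(x):
    """Orthonormal one-dimensional DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def dct_2d(block):
    """Separable two-dimensional DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


def split_blocks(image, size=8):
    """Divide the image into contiguous size-by-size pixel blocks keyed by (top, left)."""
    h, w = len(image), len(image[0])
    return {(top, left): [row[left:left + size] for row in image[top:top + size]]
            for top in range(0, h, size)
            for left in range(0, w, size)}


def block_dct(image, size=8):
    """Apply the two-dimensional DCT to the image on a block-by-block basis."""
    return {pos: dct_2d(block) for pos, block in split_blocks(image, size).items()}
```

Because each block is transformed independently of its neighbors, coarse quantization of the coefficients can make the block boundaries visible in the reconstructed image.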
For block-based video compression, the interframe motion of the image content is typically estimated by a block matching process. The compressed image is expanded in the encoder to reconstruct the image as it would appear at a decoder. A block of pixels (search block) from the current frame is isolated and compared to arrays of pixels of a reference frame in a search range around the spatial location of the search block in the current frame. The block of reference frame pixels that best matches the search block is typically determined by either a cross-correlation function or by minimization of an error criterion. When the block of pixels of the reference frame that best matches the search block is identified, a motion vector representing the motion of the pixels of the search block between its position in the current frame and the position of the best matching block in the reference frame is determined.
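The block matching process described above may be sketched as an exhaustive search. The sketch assumes the sum of absolute differences (SAD) as the error criterion, which is one common choice, and reports the motion vector as the offset (dx, dy) from the position of the search block to the position of the best matching block in the reference frame. The function names and parameters are hypothetical.

```python
def sad(current, reference, cy, cx, ry, rx, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(current[cy + j][cx + i] - reference[ry + j][rx + i])
               for j in range(size) for i in range(size))


def block_match(current, reference, top, left, size, search_range):
    """Exhaustively search the reference frame around (top, left).

    Returns ((dx, dy), error) for the candidate with the smallest SAD.
    Candidates that fall partly outside the reference frame are skipped.
    """
    h, w = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue
            err = sad(current, reference, top, left, ry, rx, size)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    err, dx, dy = best
    return (dx, dy), err
```

The cost of the exhaustive search grows with the square of the search range, which is why practical encoders often substitute faster search patterns.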
While block-based compression underlies a number of successful still image and video compression standards, the process has limitations. Images compressed with block-based compression are vulnerable to compression artifacts, particularly at high compression ratios. The most common artifact is the blocking effect, in which the pixel blocks used for transformation are visible in the reconstructed image. In addition, the reception bandwidth of data processing networks, such as the Internet, is often heterogeneous. For example, one receiver may have a 10 Mbps Ethernet connection to the network, another a 1.5 Mbps T1 connection, and another a 54 Kbps modem connection. In a network characterized by heterogeneous reception bandwidth, a scalable bitstream is highly desirable to enable the production of the highest quality images and video at each receiver of the system. However, block-based compression is not well suited to encoding as a scalable bitstream. The desire for scalable transmission and higher quality images has motivated interest in wavelet transform based image compression methods.
Wavelet transform based video compression is a hybrid compression technique that can produce a multi-resolution representation of the video frames that is well suited to scalable transmission. Intraframe compression is accomplished by quantization of a set of wavelet transforms representing the rectangular array of pixels comprising an image. Typically, wavelet transformation is applied serially to each of the horizontal rows of pixels and then applied serially to each of the vertical columns of pixels making up the image. Referring to FIG. 1, a basic wavelet transformation unit 10 generally comprises a low-pass analysis filter 12, a high-pass analysis filter 14, and down-samplers 16 and 18, one for the output of each of the filters.
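A transformation unit of this kind may be sketched as follows. The sketch assumes two-tap Haar averaging and differencing filters with periodic extension of the signal, and assumes that the even-indexed output of each filter is retained by the down-samplers; the filters and retained phases used in practice may differ. The function names are hypothetical.

```python
def analyze(signal):
    """Low-pass and high-pass filter the signal, then down-sample each output by two."""
    n = len(signal)
    low = [(signal[i] + signal[(i + 1) % n]) / 2 for i in range(n)]
    high = [(signal[i] - signal[(i + 1) % n]) / 2 for i in range(n)]
    return low[0::2], high[0::2]  # keep even-indexed coefficients (an assumption)


def synthesize(low, high):
    """Invert analyze() for the Haar filters assumed above."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def transform_rows_then_columns(image):
    """One decomposition level: transform every row, then every column.

    Each transformed row or column stores its low-pass coefficients first,
    followed by its high-pass coefficients.
    """
    rows = [l + h for l, h in (analyze(r) for r in image)]
    cols = [l + h for l, h in (analyze(list(c)) for c in zip(*rows))]
    return [list(r) for r in zip(*cols)]
```

For example, under these assumptions analyze([2, 4, 6, 8]) yields the low-pass coefficients [3.0, 7.0] and the high-pass coefficients [-1.0, -1.0], from which synthesize recovers the original four samples.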
While wavelet-based image compression has a number of advantages over block-based image compression, it is not well suited to the use of the block matching technique for interframe motion estimation and compensation. If the image is reconstructed from the set of wavelet transform coefficients and block matching is performed on the pixels of the reconstructed image, the desirable scalable nature of the data stream is sacrificed and the coding efficiency is severely impacted because the reference frame, at full resolution, is required to decode the motion compensated predicted frame.
On the other hand, if block matching is applied to the set of wavelet transforms representing the image, the scalable nature of the compressed bit stream is preserved but phase uncertainty resulting from the wavelet transformation substantially reduces coding efficiency. FIG. 2 schematically illustrates the wavelet transformation of a horizontal row of pixels from a current frame 20 and a reference frame 22. Each pixel is represented by its intensity (Y) as a function of its index location (0-5). For example, the intensities of the pixels Y(0) are the same in the current and reference frames. In fact, the pixels of the two rows 20 and 22 are identical, except that the pixels of the reference frame 22 are shifted one pixel, or index position, to the right of the corresponding pixels in the current frame 20. Following filtration by the low-pass 12 and high-pass 14 analysis filters of the transformation unit 10, the pixels of each row are represented by two sets of filter coefficients, a low-pass sub-band and a high-pass sub-band. For example, the row of pixels 20 of the current frame is represented by low-pass filter coefficients 24 and high-pass filter coefficients 26. The filter coefficients for the current and reference frames reflect the translation of pixels between the two frames. The wavelet transformation is completed by down-sampling the filter coefficients 28. In the down-sampling operation 28, every other filter coefficient is decimated. Typically, the odd-indexed coefficients of the low-pass sub-band 24 and the even-indexed coefficients of the high-pass sub-band 26 are decimated to create a complete set of low-pass 30 and high-pass 32 wavelet transform coefficients representing the pixels of the image. While the image can be reconstructed from the set of wavelet transform coefficients, the decimation of the filter coefficients creates a phase uncertainty in the sets of wavelet transform coefficients. As illustrated in FIG. 2, translation of image pixels shifts the phase of the wavelet transform coefficients, with the result that two sets of wavelet transform coefficients 34 and 36 (indicated by brackets) representing two nearly identical rows of pixels 20 and 22 include no common coefficients. The translation of the pixels of the two exemplary rows cannot be detected by matching blocks of corresponding transform coefficients. When applied to images, wavelet transform coding is applied to the horizontal rows and vertical columns of image pixels, producing phase uncertainty along each axis of the image. As a result, the accuracy of motion estimation and the interframe coding efficiency are substantially compromised when block matching is performed in the wavelet transform domain.
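The phase uncertainty can be reproduced numerically with a minimal sketch. The pixel values below are hypothetical and are not those of FIG. 2, and the two-tap Haar filters with periodic extension are assumptions made for illustration. Consistent with the description above, the odd-indexed low-pass coefficients and the even-indexed high-pass coefficients are decimated.

```python
def filter_pair(row):
    """Low-pass and high-pass filter a row of pixels without down-sampling."""
    n = len(row)
    low = [(row[i] + row[(i + 1) % n]) / 2 for i in range(n)]
    high = [(row[i] - row[(i + 1) % n]) / 2 for i in range(n)]
    return low, high


def rotate(seq, k):
    """Shift a sequence k positions to the right with wrap-around."""
    k %= len(seq)
    return seq[-k:] + seq[:-k]


def is_rotation(a, b):
    """True if some circular shift of a equals b."""
    return any(rotate(a, k) == b for k in range(len(a)))


current = [0, 0, 2, 6, 8, 4, 0, 0]
reference = rotate(current, 1)  # same content, one pixel to the right

low_c, high_c = filter_pair(current)
low_r, high_r = filter_pair(reference)

# Before down-sampling, the filter coefficients are shifted copies of each other.
shift_visible_before = (low_r == rotate(low_c, 1) and high_r == rotate(high_c, 1))

# Down-sample: keep even-indexed low-pass and odd-indexed high-pass coefficients.
kept_low_c, kept_low_r = low_c[0::2], low_r[0::2]
kept_high_c, kept_high_r = high_c[1::2], high_r[1::2]

# After down-sampling, no shift of one coefficient set reproduces the other.
shift_visible_after = (is_rotation(kept_low_c, kept_low_r)
                       or is_rotation(kept_high_c, kept_high_r))
```

In this sketch the low-pass coefficients retained for the reference row are the ones that were decimated for the current row, so a search over shifted blocks of the retained coefficients finds no exact match even though the two rows of pixels differ only by a one-pixel translation.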
What is desired, therefore, is a motion estimation system that provides efficient interframe coding and preserves the scalable nature of wavelet transform coding when encoding digital video.