A discrete wavelet transform (DWT) provides a multi-resolution representation of a signal. The DWT can be used for a variety of applications such as denoising, restoration and enhancement. The DWT can also be used for compression, particularly the compression of video signals. For video compression, the DWT achieves high compression efficiency and also enables a scalable representation of the video in spatial resolution, temporal resolution and quality, i.e., signal-to-noise ratio (SNR).
One of the most successful applications of the DWT for image compression is the JPEG 2000 compression standard, see “ISO/IEC 15444-1:2000: Information technology—JPEG 2000 Image Coding System—Part 1: Core Coding System,” 2000.
As shown in FIG. 1, the encoding system includes a forward DWT 110, a quantization encoder 120, and an entropy encoder 130 to compress an input image 101 into an output bitstream 102. These operations are performed in reverse in a decoder. This image encoding system achieves both spatial and SNR scalability.
According to the JPEG 2000 standard, the transform can be irreversible or reversible. The default irreversible filter is a Daubechies 9/7 filter described by Antonini, et al. in “Image coding using the wavelet transform,” IEEE Trans. Image Processing, April 1992, while the default reversible filter is a Le Gall 5/3 filter described by Le Gall et al., in “Subband coding of digital images using symmetric short kernel filters and arithmetic coding techniques,” Proc. IEEE Int'l Conf. Acoustics, Speech and Signal Processing, 1988.
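The reversible property of the 5/3 filter can be illustrated with an integer lifting sketch. The following is a minimal example, assuming a one-dimensional, even-length integer signal and whole-sample symmetric extension at the boundaries; the function names are hypothetical and the code is not taken from the standard's reference software.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of index i for a signal of length n."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def legall53_forward(x):
    """One level of integer 5/3 lifting on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: each odd sample minus the floored mean of its even neighbors.
    d = [x[2 * k + 1] - ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
         for k in range(half)]
    # Update step: each even sample plus a rounded quarter of the adjacent details.
    s = [x[2 * k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2) for k in range(half)]
    return s, d  # low-pass band, high-pass band


def legall53_inverse(s, d):
    """Undo legall53_forward exactly by reversing the lifting steps."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for k in range(half):  # undo the update step first
        x[2 * k] = s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
    for k in range(half):  # then undo the predict step
        x[2 * k + 1] = d[k] + ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
low, high = legall53_forward(signal)
restored = legall53_inverse(low, high)
```

Because every lifting step adds or subtracts an integer that the inverse can recompute from data it already has, the reconstruction is exact, which is what makes lossless coding possible with this filter.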
During encoding, an image is decomposed into rectangular tiles, where each tile-component is a basic unit of the original or reconstructed image. The DWT is applied to each tile, and the tile is decomposed into multiple resolution levels. The resolution levels are made up of subbands of coefficients that describe the frequency characteristics of the tile components. The subband coefficients are quantized and collected into rectangular arrays of code blocks, which are then entropy coded with a bit plane coding technique.
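The decomposition of a tile into resolution levels, followed by quantization, can be sketched as follows. This is an illustrative example only: it substitutes an averaging Haar filter for the 9/7 and 5/3 filters named above to keep the code short, assumes a square tile whose side is divisible by two at every level, and omits code blocks and bit plane entropy coding. All names are hypothetical.

```python
def haar_1d(v):
    """Split an even-length sequence into low-pass then high-pass halves."""
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def haar_2d_level(tile, size):
    """Transform the top-left size x size region in place: rows, then columns."""
    for r in range(size):
        tile[r][:size] = haar_1d(tile[r][:size])
    for c in range(size):
        col = haar_1d([tile[r][c] for r in range(size)])
        for r in range(size):
            tile[r][c] = col[r]


def decompose(tile, levels):
    """Return a multi-level decomposition; each level re-splits the low-low band."""
    out = [row[:] for row in tile]
    size = len(out)
    for _ in range(levels):
        haar_2d_level(out, size)
        size //= 2
    return out


def quantize(coeffs, step):
    """Dead-zone scalar quantizer: sign(c) * floor(|c| / step)."""
    return [[(1 if c >= 0 else -1) * int(abs(c) / step) for c in row]
            for row in coeffs]


tile = [[8, 8, 8, 8] for _ in range(4)]      # a flat 4x4 tile
subbands = decompose(tile, levels=2)         # energy collapses into one coefficient
indices = quantize(subbands, step=2)
```

For a flat tile, all detail subbands are zero and only the coarsest low-pass coefficient survives, which is the energy compaction that the subsequent entropy coder exploits.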
For video compression, there exist several codecs that use the DWT. The prior art codecs can be classified into two distinct categories: a first that is strictly transform-based and does not use any motion compensation techniques, and a second that attempts to exploit temporal redundancy in the video using motion compensation. Codecs in both categories are scalable in the spatial and temporal domains, and are also SNR scalable.
An example of the transform-based wavelet video codec is described by Kim et al., in “Low Bit-Rate Scalable Video Coding with 3D Set Partitioning in Hierarchical Trees (3D SPIHT),” IEEE Trans. Circuits and Systems for Video Technology, December 2000. That method is referred to as 3D-SPIHT and applies a separable 1D wavelet transform in each dimension to obtain the 3D subband decomposition. That encoding technique has the following properties: (1) partial ordering of the magnitudes of the 3D wavelet-transformed video with a 3D set partitioning algorithm, (2) ordering of the bit-planes of refinement bits for transmission, and (3) exploiting self-similarity across spatio-temporal orientation trees.
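The separable 3D subband decomposition can be sketched by applying the same 1D transform along the horizontal, vertical and temporal axes in turn. The example below is a minimal sketch that assumes a Haar filter and a video stored as nested lists indexed as video[t][y][x] with even dimensions; it covers only the transform stage, not the set partitioning or bit-plane coding of 3D-SPIHT, and its function names are hypothetical.

```python
def haar_1d(v):
    """Split an even-length sequence into low-pass then high-pass halves."""
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def dwt3d_level(video):
    """One level of a separable 3D transform on video[t][y][x]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    out = [[row[:] for row in frame] for frame in video]
    # Horizontal direction: transform every row of every frame.
    for t in range(T):
        for y in range(H):
            out[t][y] = haar_1d(out[t][y])
    # Vertical direction: transform every column of every frame.
    for t in range(T):
        for x in range(W):
            col = haar_1d([out[t][y][x] for y in range(H)])
            for y in range(H):
                out[t][y][x] = col[y]
    # Temporal direction: transform every pixel trajectory across frames.
    for y in range(H):
        for x in range(W):
            line = haar_1d([out[t][y][x] for t in range(T)])
            for t in range(T):
                out[t][y][x] = line[t]
    return out


static_video = [[[4, 4], [4, 4]], [[4, 4], [4, 4]]]   # two identical 2x2 frames
coeffs = dwt3d_level(static_video)
```

Note that the temporal pass filters along fixed pixel positions, so any motion between frames spreads energy into the temporal high-pass subbands.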
While the 3D-SPIHT codec is free from the computational burden of motion compensation, it has a fundamental problem in that different orientations are mixed. To explain this mixing problem, consider applying the DWT to a 2D signal such as an image, where we apply the 1D transform in both the horizontal and vertical directions. There are three wavelets associated with this 2D transform.
FIGS. 2A-2C show their respective impulse responses. The wavelet in FIG. 2C does not have a dominant direction; instead, its impulse response has a checkerboard pattern. This checkerboard artifact indicates that the 2D DWT is poor at isolating diagonal orientations.
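The lack of a dominant direction can be reproduced numerically. The sketch below assumes an unnormalized Haar filter pair rather than the filters used to generate the figures, and builds the three separable 2D kernels as outer products of the 1D filters; the helper names are hypothetical.

```python
def outer(a, b):
    """Separable 2D kernel: a runs along the vertical axis, b along the horizontal."""
    return [[ai * bj for bj in b] for ai in a]


def respond(kernel, patch):
    """Inner product of a kernel with an image patch of the same size."""
    return sum(k * p for kr, pr in zip(kernel, patch) for k, p in zip(kr, pr))


low, high = [1, 1], [1, -1]          # unnormalized Haar low-pass and high-pass

lh = outer(high, low)                # high-pass vertically: horizontal edges
hl = outer(low, high)                # high-pass horizontally: vertical edges
hh = outer(high, high)               # high-pass in both directions

main_diag = [[1, 0], [0, 1]]         # structure along one diagonal
anti_diag = [[0, 1], [1, 0]]         # structure along the other diagonal
horiz_edge = [[1, 1], [0, 0]]
vert_edge = [[1, 0], [1, 0]]

r_main = respond(hh, main_diag)
r_anti = respond(hh, anti_diag)
```

The kernel `hh` is the checkerboard [[1, -1], [-1, 1]], and it responds with equal magnitude to both diagonal patches, so the two diagonal orientations land in the same subband. By contrast, `lh` responds to the horizontal edge but not the vertical one, and `hl` does the opposite.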
For 3D signals such as video, this problem becomes much worse because the third dimension is time, and the mixing of different motion orientations is a much more severe issue that leads to significant inefficiencies for coding.
An example of a motion compensating wavelet video codec is described by Hsiang et al., in “Embedded video coding using invertible motion compensated 3-D subband/wavelet filter bank,” Signal Processing: Image Communications, May 2001. That method is referred to as MC-EZBC, which stands for motion compensation with embedded zero-tree block coding.
FIG. 3 shows a block diagram of that codec. An input video is first subject to a motion compensated temporal filter (MCTF) 310 that filters the video in the temporal direction using motion vectors computed by a motion estimation unit 320. The filter 310 is half-pixel accurate with perfect reconstruction, which allows for higher coding gains compared to the full-pixel temporal filters.
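The idea of motion compensated temporal filtering with perfect reconstruction can be sketched with a Haar lifting pair. This is a simplified illustration under stated assumptions, not the filter of the cited codec: frames are one-dimensional lists, motion is a single hypothetical global integer displacement with circular wrap-around, and no half-pixel interpolation or block-level motion estimation is performed.

```python
def shift(frame, mv):
    """Circularly shift a 1D frame right by mv samples (assumed global motion)."""
    n = len(frame)
    return [frame[(i - mv) % n] for i in range(n)]


def mctf_forward(a, b, mv):
    """Haar temporal lifting of frame pair (a, b) along the motion mv."""
    # Predict: temporal high band is b minus the motion-compensated a.
    pred = shift(a, mv)
    h = [bi - pi for bi, pi in zip(b, pred)]
    # Update: temporal low band is a plus half the residual mapped back along the motion.
    back = shift(h, -mv)
    l = [ai + 0.5 * hi for ai, hi in zip(a, back)]
    return l, h


def mctf_inverse(l, h, mv):
    """Reverse the lifting steps to recover both frames exactly."""
    back = shift(h, -mv)
    a = [li - 0.5 * hi for li, hi in zip(l, back)]
    pred = shift(a, mv)
    b = [hi + pi for hi, pi in zip(h, pred)]
    return a, b


frame_a = [0, 0, 9, 9, 0, 0, 0, 0]
frame_b = shift(frame_a, 2)                    # the object moves by two samples

l_mc, h_mc = mctf_forward(frame_a, frame_b, 2)   # filtering along the true motion
l_no, h_no = mctf_forward(frame_a, frame_b, 0)   # filtering with no compensation
```

When the filter follows the true motion, the high band `h_mc` is all zeros; without compensation, `h_no` carries the moving edges as residual energy. In both cases `mctf_inverse` recovers the original pair because each lifting step is individually invertible.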
The output of the MCTF is subject to a spatial analysis 330 to complete the 3D decomposition. The resulting 3D subband coefficients are entropy encoded 340. The motion vectors are also entropy coded by an MV encoder 350, which utilizes traditional prediction and entropy coding techniques. The outputs of the encoders 340 and 350 can be buffered 360 before an output bitstream 302 is produced.
Because the above codec performs motion estimation locally at the block level and applies the temporal filtering accordingly, the problem of mixing different motion orientations is less of an issue than in the 3D-SPIHT codec. However, a 2D spatial analysis filter is still used. Therefore, this codec is still susceptible to the mixing of spatial orientations. Regardless of the mixed orientations problem, the major drawback of this codec is the computational cost of motion compensation, which is performed locally at the block level.
Given the above prior art in this area, there exists a need for a wavelet-based video codec that avoids the mixing of different orientations that occurs in conventional multi-dimensional DWTs, and that also avoids the need for motion estimation.