1. Technical Field
This invention is directed toward a system and method for encoding and decoding image data. More specifically, the invention is directed toward an improved coding and decoding method that exploits spatial correlations within an image by use of hybrid directional prediction and lifting wavelet techniques. The coding method may also be used to code intra frames of video sequences.
2. Background Art
Image compression plays an important role in modern life, driven by the rapid increase in the number of digital cameras. Many compression schemes have been developed in past decades. These include early Differential Pulse Code Modulation (DPCM)-based schemes [1], Discrete Cosine Transform (DCT)-based compression schemes [1]-[4] and wavelet-based compression techniques [5]-[20]. The DCT-based schemes, like JPEG [1], usually offer a low-complexity solution, but they have difficulty in achieving the desired scalabilities.
In comparison to the DCT-based compression schemes, the wavelet-based schemes typically require more computational power. On the other hand, the wavelet transform [21] provides a multi-scale representation of images in the space-frequency domain. Aside from the energy compaction and de-correlation properties that facilitate compression, a major advantage of the wavelet transform is its inherent scalability. For example, the wavelet-based JPEG2000 standard [7] not only presents superior compression performance over the DCT-based JPEG standard, but also offers scalabilities in rate, quality and resolution that are very desirable for consumer and network applications.
Natural images often contain rich directional attributes, which can commonly be approximated as linear edges on a local level. These edges may be neither vertical nor horizontal. However, most mainstream image coding schemes do not take this fact into account [1], [5]-[7]. Two-dimensional (2D) DCT or wavelet transforms are always performed in the horizontal and vertical directions. For edges that are not aligned with these directions, this results in large magnitudes in the high-frequency coefficients. In addition, at low bit-rates, the quantization effects can be observed clearly at image edges as the notorious Gibbs artifacts. This problem has been recognized by many researchers [3], [4], [8]-[20]. Feig et al. introduced spatial prediction into a JPEG-like codec in a manner similar to fractal-based image compression [3]. Their scheme does not outperform the pure DCT-based one in terms of PSNR/bit-rate trade-off. However, at very low bit-rates, it results in far fewer block artifacts and markedly better visual quality. Kondo et al. performed directional prediction on DCT blocks, each of which can be predicted from one of four coded neighboring DCT blocks [4]. The new video coding standard H.264 has also successfully applied the block-based spatial prediction technique to intra frame coding. It has shown significant gain in coding efficiency over coding without spatial prediction [22].
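The effect of edge orientation on the high-frequency residual can be illustrated with a small sketch. The code below is only an illustration under stated assumptions and is not taken from any of the cited schemes: the synthetic 45-degree step edge, the function names `diagonal_edge_image` and `residual_magnitude`, and the simple two-tap averaging predictor are all hypothetical choices made for this example. It predicts each pixel of the odd rows from the rows above and below, once along the vertical direction and once along the edge direction, and sums the absolute prediction residuals.

```python
def diagonal_edge_image(size):
    """Synthetic image with a 45-degree step edge (hypothetical test input)."""
    return [[255 if c >= r else 0 for c in range(size)] for r in range(size)]


def residual_magnitude(img, shift):
    """Sum of absolute residuals when each odd-row pixel is predicted as the
    average of one sample in the row above (column c - shift) and one sample
    in the row below (column c + shift).

    shift = 0 predicts vertically; shift = 1 predicts along the 45-degree
    diagonal of the synthetic edge. Only |shift| <= 1 is supported here.
    """
    height, width = len(img), len(img[0])
    total = 0.0
    for r in range(1, height - 1, 2):      # odd rows that have a row below
        for c in range(1, width - 1):      # interior columns only
            pred = (img[r - 1][c - shift] + img[r + 1][c + shift]) / 2
            total += abs(img[r][c] - pred)
    return total


img = diagonal_edge_image(8)
vertical = residual_magnitude(img, 0)   # prediction across the edge
diagonal = residual_magnitude(img, 1)   # prediction along the edge
print(vertical, diagonal)
```

For this synthetic edge, the vertical predictor leaves nonzero residuals along the edge, while the predictor aligned with the edge leaves none. Real images are far less regular, so the sketch shows only the qualitative tendency described above.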
Many researchers have investigated this problem in wavelet/sub-band coding schemes. Ikonomopoulos et al. proposed a fixed set of directional filters to adapt to texture correlation at different directions [8]. Li et al. incorporated subband decomposition into Ikonomopoulos' scheme [9]. Bamberger et al. used a filter bank based on rectangular image sampling [10]-[12]. It can resolve images into many different directional components. The Ridgelet and Curvelet transforms, recently developed by Candes et al., are another kind of transform, based on polar sampling [13][14]. Mahesh et al. decomposed hexagonally sampled images into sub-bands that are selective in both frequency and orientation [15]. Taubman et al. proposed a scheme in which the input image is first re-sampled before the wavelet transform [16]. The re-sampling process can rotate image edges to the horizontal or vertical direction. Wang et al. used an idea similar to that of Taubman et al., but further proposed an overlapped extension to prevent coding artifacts around the boundaries of regions with different directions [17]. Similar work on wavelet packets has also been reported in [18][19].
Few authors, however, have proposed incorporating directional prediction into the lifting-based wavelet transform. The wavelet transform can be implemented in two ways: convolution-based and lifting-based. The lifting structure developed by Daubechies et al. is an efficient and popular implementation of the wavelet transform, in which every Finite Impulse Response (FIR) wavelet filter can be factored into several lifting stages [23]. Spatial prediction can be integrated into the convolution implementation of the wavelet transform only with great difficulty, while the lifting technique potentially allows for the incorporation of spatial prediction. However, the technique proposed by Daubechies et al. does not use any spatially directional information. Boulgouris et al. proposed an adaptive lifting technique to minimize the prediction error variance [20]. Similar to the idea of Ikonomopoulos et al., it derives several directional filters from quincunx sampling and selects one of them with a median operation. However, it does not show significant gain in lossless image coding.
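As an illustration of what a lifting stage looks like, the following is a minimal sketch of one level of a one-dimensional lifting transform with a predict step followed by an update step. It assumes the commonly used 5/3 lifting coefficients (1/2 in the predict step, 1/4 in the update step) and symmetric extension at the signal boundaries. The function names `lifting_53_forward` and `lifting_53_inverse` are hypothetical, and the sketch is not the specific factorization described in [23].

```python
def lifting_53_forward(x):
    """One level of a 1D 5/3-style lifting transform on an even-length list."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    even, odd = x[0::2], x[1::2]
    half = n // 2

    # Predict step: each odd sample minus the average of its two even
    # neighbors (the right neighbor is mirrored at the boundary).
    high = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        high.append(odd[i] - (even[i] + right) / 2)

    # Update step: each even sample plus a quarter of the sum of its two
    # neighboring high-pass samples (the left neighbor is mirrored).
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[i]
        low.append(even[i] + (prev + high[i]) / 4)
    return low, high


def lifting_53_inverse(low, high):
    """Invert the transform by undoing the update, then the predict step."""
    half = len(low)
    even = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[i]
        even.append(low[i] - (prev + high[i]) / 4)
    odd = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        odd.append(high[i] + (even[i] + right) / 2)
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


ramp = [10, 12, 14, 16, 18, 20, 22, 24]
low, high = lifting_53_forward(ramp)
restored = lifting_53_inverse(low, high)
```

Because the inverse simply reverses each step with the opposite sign, the structure is invertible regardless of which samples the predict step draws on. This is the property that makes it plausible to substitute a different, for example directional, predictor within a lifting stage.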
Therefore, what is needed is a system and method for encoding or decoding image data, such as, for example, video data, wherein the bit stream can be encoded using a method that takes advantage of spatial correlations within an image and that does not result in large high-frequency coefficients. This system and method should also be computationally efficient.
It is noted that in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.