1. Field of the Invention
The present invention relates to image and video coding and decoding and more particularly, it relates to a method for optimizing a wavelet packet structure for subsequent tree-structured coding.
2. Description of the Related Art
Current technology is leading toward the transmission of higher quality images to the consumer public. An example of such transmissions is digital video transmission via satellites and the evolution of High-definition television (HDTV). The transmission of images at such high resolution requires compression (coding), on the senders end, and decompression on the receiving end.
Generally, two classes of image coding algorithms typically provide excellent performance on images: (i) tree-structured wavelet based algorithms (Class I); and (ii) frequency and space-frequency adaptive algorithms (Class II). Both the Class I and Class II algorithms are very good at exploiting interband correlation in wavelet decomposition by efficiently representing strings of insignificant coefficients (referred to as zerotrees or spatial hierarchical trees).
In Class I algorithms, the transmission of information regarding significant coefficients is progressive (i.e., the encoder sends coefficient information one bit plane at a time, from highest to lowest). Therefore, these algorithms produce fully embedded bitstreams which can be partially decoded up to any rate or distortion. An example of a tree-structured coding algorithm is the Set Partitioning in Hierarchical Trees (SPIHT) algorithm (See U.S. Pat. No. 5,764,807). A limiting feature of these Class I algorithms is that they use a fixed logarithmic (octave-band) wavelet decomposition which does not always de-correlate image signals efficiently.
Class II algorithms, however, focus on adapting a wavelet packet transform to images by optimizing the frequency decomposition with respect to a rate distortion metric, thus optimizing the frequency decomposition with the objective of signal compression. These traditional wavelet packet optimization techniques neglect information about the structure of the lossy part of the compression scheme. They also result in a loss of the bitstream embedding property which is one feature that makes the use of tree-structured algorithms desirable. This neglected information can help guide the optimization procedure so as to result in beneficial wavelet packet (WP) structures.
Although wavelet packet optimization can be applied to an image prior to executing a tree-structured algorithm (e.g., SPIHT), unconstrained optimization can result in a transformed image with complicated or incoherent tree-shaped spatial relations, as will be described below. Thus, strict (unconstrained) optimization of a wavelet packet structure prior to coding using a tree-structured algorithm is not always efficient.
While the classical (octave-band) wavelet transform usually provides an efficient frequency representation for images, its fixed space-frequency tiling does not always match the spectrum of input images. In fact, it has been shown that only those images with exponentially decaying power spectra are efficiently de-correlated by wavelet decomposition. Wavelet packets provide a rich library of transforms (i.e., frequency decompositions) from which, transforms adapted to images can be chosen. An algorithm for the choice of a best wavelet packet basis was first proposed K. Ramchandran and M. Vetterli, xe2x80x9cBest wavelet packet bases in a rate-distortion sensexe2x80x9d IEEE Transactions on Image Processing, vol. 2 no. 2, pp. 160-175, April 1993, which is incorporated herein by reference. The Ramchandran and Vetterli algorithm optimizes the wavelet packet selection in a rate-distortion sense. The wavelet transform of an image produces a two-dimensional array of wavelet coefficients. These coefficients require quantization (i.e., assignment of a discrete integer value within a set of values). For example, for a linear quantizer with step size 16, the set of quantized coefficient values could be (xe2x88x9264, xe2x88x9248, xe2x88x9232, xe2x88x9216, 0, 16, 32, 48), corresponding to quantization indexes (0, 1, 2, . . . , 7). Thus, rather than sending the quantized value itself, its index xe2x80x9cixe2x80x9d is sent representative of the particular quantized value. An example of such quantization is shown by applying a simple fixed-step scalar quantizer. The rate assigned to a quantized coefficient is estimated by its first order entropy, rc=xe2x88x92log2 (pi), where pi denotes the probability of the index i assigned to coefficient c. The rate R for a subband is the sum of the rates of all its quantized coefficients.
Distortion is measured in the mean-squared-error (MSE) sense in the transform domain. MSE is the mean of squared differences between a coefficient and its quantized value. The MSE is a reasonable approximation of distortion in the image domain, provided that the wavelet basis is close to orthonormal, which is the case for the widely used 7/9 tap wavelet, as disclosed in M. Antonini, M. Barlaud, P.Mathieu and I. Daubechies, xe2x80x9cImage coding using wavelet transform,xe2x80x9d IEEE Transactions on Image, vol. 1, pp. 205-221, April 1992, which is incorporated herein by reference.
Referring to FIG. 1, the Ramchandran and Vetterli algorithm starts by growing a balanced frequency tree of depth L, where L equals the number of levels. FIG. 1 illustrates the one-dimensional case, with L=3. Starting at level k=L, pruning of each pair of level k nodes in the tree is conditionally performed based on the rate-distortion criterion:
Prune if: D+xcexR less than (D1+xcexR1)+(D2+xcexR2),xe2x80x83xe2x80x83(1)
where D1, D2, R1, R2 indicate distortion and rate for two sub-bands at level k, and D, R indicate the distortion and rate for the synthesized sub-band at level k-1, and xcex is an empirically chosen parameter. In the 2D case, four sets of distortion and rate need to be computed at level k. The algorithm then proceeds to attempt pruning at the next level and so on all the way to the root of the frequency tree.
In the frequency tree of FIG. 1, a parent node and its children exist in the same frequency band. Thus, the relation of a parent to its children is referred to as frequency co-location. It is important to note that at any one time in the design of the optimal transform, either a parent exists or its children, but never both. This is opposite to the spatial parent-children relationships that are described hereinbelow.
The concept of tree-structured coding presented in J. Shapiro, xe2x80x9cEmbedded image coding using zerotrees of wavelet coefficients,xe2x80x9d IEEE Transactions on Signal Processing, vol. 41, pp. 3445-3462, December 1993 (which is incorporated herein by reference), relies on a classical sub-band/wavelet decomposition of an image to level L. Wavelet coefficients are then organized in spatial hierarchical trees, as illustrated in FIG. 2a, where a parent coefficient in a sub-band at level k gives rise to four children (or offspring) coefficients at level k-1. Similarly, the frequency band Rk in FIG. 2a is referred to as the parent region to the spatial offspring region Rkxe2x88x921.
In this case, the parent and its set of children coefficients correspond to the same spatial region in the original image. This property of two sets of coefficients is referred to as spatial co-location. The concept of zerotrees as proposed by Shapiro offers an efficient way to encode insignificant coefficients by exploiting both the predictability of spatially co-located significant coefficients, and the tendency of insignificant coefficients to exist in trees.
The efficiency of Shapiro""s algorithm has since been improved by introducing more flexibility in the definition of zerotrees and finding a way to encode the location of these trees without a need for side information (See, A. Said, W. A. Perlman, xe2x80x9cA new, fast, and efficient image codec based on set partitioning in hierarchical trees,xe2x80x9d IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 243-250, June 1996, which is incorporated herein by reference). The SPIHT algorithm is widely recognized as one of the best still image coders available.
SPIHT assumes an orthonormal or bi-orthonormal wavelet transform decomposition of an image which ensures that mean-squared-error (MSE) distortion due to quantization is very similar in the transform and image domains. This results in the fact that the nth most significant bits of any two coefficients in the transformed image contribute equally to MSE reduction. The SPIHT algorithm produces fully embedded bitstreams, i.e., bitstreams that can be partially decoded up to any specified rate lower than the encoding rate, by progressively transmitting individual coefficient values from most significant to least significant bit.
It is an object of the present invention to preserve coherent relationships between parent coefficients and their respective four spatial offspring at each step in the optimization of the wavelet packet structure. This and other objects are achieved in accordance with the present invention for optimizing a wavelet packet structure for tree-structured coding in which a valid frequency tree representing an image is defined. The tree has at least one level, parent regions and offspring regions. The frequency tree is sequentially pruned by forcing a selection of valid frequency trees at each step of the construction of the wavelet packet structure. A tree-structured coding algorithm is applied to the sequentially pruned final frequency tree to code the image.