General
The requirements for MPEG-4 and JPEG2000 include concepts such as graceful degradation and scalability, implemented in progressive transmission schemes. When transmitting image information over a communication channel, the sender is often not aware of the properties of the output devices, such as display size and resolution, or of the present requirements of the user, for example when the user is browsing through a large image database. To support the large spectrum of image and display sizes and resolutions, the coded bit stream must be formatted in such a way that whenever the user or the receiving device interrupts the bit stream, a maximal display quality is achieved for the given bit rate. The progressive transmission paradigm requires that the data stream be interruptible at any stage and still deliver at each breakpoint a good trade-off between reconstruction quality and compression ratio.
The classic block-based compression techniques (JPEG, MPEG-1, MPEG-2, H.263, ...), based on the decorrelating discrete cosine transform, hardly meet these requirements, since information is transmitted on a block basis. Interrupting the bit stream results in a partially reconstructed image, leaving the non-transmitted part of the image undefined. Additionally, since these techniques are strictly block-based, disturbing block artifacts reduce the perceived visual quality.
Wavelet transform based compression schemes avoid the above-mentioned problems, since their data path is such that several image resolutions are obtained throughout the coding process (FIG. 1). Successively transmitting all the subbands (also denoted subband images or sub-images), starting with the average sub-image in the top left corner, already fulfils the graceful degradation requirement (FIG. 2). Of course, the relation between the compression ratio and the image reconstruction quality in this subband-by-subband image scanning approach can hardly be called optimal, since all subbands, each with its own weight, contribute to the final quality of the decoded image. For example, the least significant bit-plane of the average subband might, by this criterion, be less important than, for instance, the third bit-plane of the first LH-subband.
Note that although the wavelet transform is one particular method that can be exploited for subband image encoding, and although the invention is illustrated for wavelet-transformed images, the invention is not limited thereto.
State-of-the-art image compression techniques based on subband coding typically exploit, in their quantization and entropy coding steps, the preservation of frequency and spatial information. Subband transforms (e.g. the wavelet transform) deliver in the transform domain information concerning the frequencies present at a specific spatial location. Hence, the quantization and entropy coding parts can exploit, in a rate-distortion sense, both the dependency between spatially neighboring pixels in one subband and the dependency between pixels in different subbands at corresponding spatial locations. Typically the first approach is referred to as intra-subband coding (e.g. quad-tree-based [J. Cornelis, A. Munteanu, A. Salomie, P. Schelkens, R. Deklerck, Y. Christophe and V. Enescu, “Medical Image Compression: Options for the Future”, Proceedings of Biosignal '98, pp. 1-12, 1998]), the latter as inter-subband coding (embedded zero-tree coding [J. M. Shapiro, “Embedded Image Coding Using Zero-trees of Wavelet Coefficients”, IEEE Transactions on Signal Processing, Vol. 41, no. 12, pp. 3445-3462, 1993] and SPIHT [A. Said and W. Pearlman, “A new fast and efficient image codec based on set partitioning in hierarchical trees”, IEEE Trans. on Circuits and Systems Video Technology 6 (1996) 243-250]). Both intra-subband and inter-subband approaches successively approximate the subband coefficients by starting with the coarsest refinement level (i.e. the most significant bit-plane) and ending at the refinement level needed to obtain the requested compression ratio (or required image quality). Refinement levels are also denoted quantization levels, and the quantization approach is referred to as successive approximation quantization (SAQ). Note that each bit-plane is completely scanned before the processing of another one is started. These methods thus use a quantization level-by-quantization level scanning method as opposed to a subband-by-subband scanning approach.
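The bit-plane ordering of SAQ can be illustrated with a minimal Python sketch. The function name `successive_approximations` and the sample coefficients are hypothetical and serve only as an illustration. The sketch assumes integer coefficients and reconstructs by simple truncation to the bit-planes received so far; practical decoders may instead reconstruct at the midpoint of the remaining uncertainty interval, and no entropy coding is modeled.

```python
def successive_approximations(coeffs, num_passes):
    """Return the reconstruction after each bit-plane pass, coarsest first.

    Illustrative only: integer coefficients, truncation-based reconstruction.
    """
    max_mag = max(abs(c) for c in coeffs)
    if max_mag == 0:
        return [[0] * len(coeffs) for _ in range(num_passes)]
    top = max_mag.bit_length() - 1  # index of the most significant bit-plane
    results = []
    for p in range(num_passes):
        shift = max(top - p, 0)  # lowest bit-plane received after pass p
        approx = []
        for c in coeffs:
            # Keep bit-planes top..shift of the magnitude, drop the rest.
            magnitude = (abs(c) >> shift) << shift
            approx.append(magnitude if c >= 0 else -magnitude)
        results.append(approx)
    return results


if __name__ == "__main__":
    for step, approx in enumerate(successive_approximations([37, -9, 4, 0], 6)):
        print(step, approx)
```

Every coefficient is refined by one bit-plane per pass, regardless of the subband it belongs to; this is the quantization level-by-quantization level scanning described above.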
With the quantization level-by-quantization level scanning method, the optimal rate-distortion behavior can be coarsely approximated when all required bits are transmitted and an appropriate thresholding is performed to obtain the required compression quality. However, if the bit-stream is interrupted at an earlier stage in the transmission, the obtained rate-distortion behavior is far from satisfactory due to the poor weighting of the information in the different subbands: relatively speaking, too much information of one subband has been transmitted compared to another subband. Thus, a correct thresholding is not obtained.
Embedded Zero-tree Coding
Since the invention is further illustrated for embedded zero-tree encoding, we will first discuss this coding approach. However, we have to stress that the invention is not limited thereto and is applicable to all SAQ-based compression schemes.
A careful study of the redundancy between spatially corresponding pixels of the different subbands reveals a remarkable coherence. J. Shapiro showed that exploiting this property enhances the compression performance [J. M. Shapiro, “Embedded Image Coding Using Zero-trees of Wavelet Coefficients”, IEEE Transactions on Signal Processing, Vol. 41, no. 12, pp. 3445-3462, 1993]. He considered the relation between spatially related pixels of different subbands as parent-children links. In the context of an N-level discrete wavelet transform, this means that a pixel in a level l sub-image (either LH, HL or HH) corresponds spatially to four pixels in the level l−1 sub-image with the same type of frequency constellation, i.e. LH, HL or HH. The inter-subband correlation is exploited using the fact that the probability is rather high that the magnitudes of the child pixel values are smaller than a certain threshold whenever the parent's magnitude is smaller than that threshold. This means that all the pixels of the same spatial locality can be coded in one step, i.e. zero-tree coding. Within this scalar quantization method the comparison is performed with a threshold corresponding to the bit-planes (also denoted quantization levels). Thus, progressive transmission capabilities are an inherent property of the coding scheme, obtained by gradually refining the threshold levels (e.g. coding from the most significant bit-plane towards the least significant bit-plane) and respecting the order of importance of the subbands (FIG. 3). This way of image scanning rules out the objections against classic schemes (FIG. 2).
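The parent-children links and the zero-tree test can be sketched in Python as follows. The names `children` and `is_zerotree`, the dictionary layout of the pyramid, and the sample values are hypothetical. The sketch assumes the usual dyadic correspondence in which a pixel at (row, col) in level l maps to the 2×2 block starting at (2·row, 2·col) in level l−1, with level 1 as the finest level, and it only tests whether a whole tree is insignificant; the symbol alphabet and scan order of a full embedded zero-tree coder are not modeled.

```python
def children(row, col):
    """Four spatially corresponding pixels one level finer (assumed dyadic mapping)."""
    return [(2 * row, 2 * col), (2 * row, 2 * col + 1),
            (2 * row + 1, 2 * col), (2 * row + 1, 2 * col + 1)]


def is_zerotree(pyramid, level, row, col, threshold):
    """True if the pixel and all its descendants are below the threshold.

    pyramid maps a level number to a 2-D list of coefficients for one
    orientation (LH, HL or HH); level 1 is the finest level.
    """
    if abs(pyramid[level][row][col]) >= threshold:
        return False
    if level == 1:
        return True
    return all(is_zerotree(pyramid, level - 1, r, c, threshold)
               for r, c in children(row, col))


if __name__ == "__main__":
    hl = {
        2: [[3, 20],
            [1, 2]],
        1: [[1, 2, 0, 0],
            [0, 3, 0, 0],
            [1, 1, 9, 0],
            [0, 2, 0, 1]],
    }
    print(is_zerotree(hl, 2, 0, 0, 16))  # parent and all four children below 16
    print(is_zerotree(hl, 2, 1, 1, 8))   # one child (9) exceeds 8
```

When the test succeeds, a single symbol can stand for the parent and all of its descendants at the current threshold, which is what allows the pixels of one spatial locality to be coded in one step.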
Since the embedded zero-tree wavelet encoding technique proposed by J. Shapiro utilizes scalar quantization, it partially fails to recognize spatial structures within the subbands, i.e. spatial redundancy. To overcome this shortcoming, a progressive vector quantization based embedded zero-tree coding was introduced by E. da Silva [E. A. B. da Silva, “Wavelet Transforms for Image Coding”, PhD Dissertation, University of Essex, England, 1995]. The significance of an image vector, composed of a set of neighboring pixels, is evaluated by comparing the magnitude of that vector with a yardstick value, i.e. a vector threshold. This value allows layering the subbands similarly to the bit-plane concept in the scalar case (FIG. 3). As such, this vector quantization also defines quantization levels. The vector codebook consists of a set of normalized lattice-based directional code vectors.
As stated in the introduction, a straightforward implementation of the above-mentioned quantization level-by-quantization level scanning methods ([J. M. Shapiro, “Embedded Image Coding Using Zero-trees of Wavelet Coefficients”, IEEE Transactions on Signal Processing, Vol. 41, no. 12, pp. 3445-3462, 1993], [E. A. B. da Silva, “Wavelet Transforms for Image Coding”, PhD Dissertation, University of Essex, England, 1995]) does not lead to maximal coding performance, since both methods give the same weight to the information present in the different subbands. Thus, thresholding has to be introduced.
In practice, the subbands differ in energy and bit-range. The energy differences suggest that the information content of the different subbands is heterogeneous. It is therefore advisable to privilege the subbands with the highest energy. Applying subband-dependent hard thresholding in a lossy compression scheme allows privileging the subband with the highest energy (FIG. 4). To find suitable threshold levels for each subband, we have to minimize the quantization error $D(b)$ subject to the total bit-rate $R_q(b)$, where the vector $b$ represents the bit-rates allocated to the different subbands [G. Strang, T. Nguyen, “Wavelets and Filter Banks”, Wellesley-Cambridge Press, Wellesley, USA, 1996]. The bit-rate is an indication of the number of bits that is being considered for compression; it does not reflect the effective rate obtained after arithmetic encoding, which is part of further coding steps.

$$D(b) = \sum_{k=1}^{M} \alpha_k \, \omega_k \, 2^{-2 b_k} \, \sigma_k^2 \tag{1}$$

$$R_q(b) = \sum_{k=1}^{M} \alpha_k \, b_k \tag{2}$$
$M$ represents the total number of subbands, $\alpha_k$ is the relative subband size, $\omega_k$ is the perceptual weighting factor, and $\sigma_k^2$ the subband variance. The latter is a good representative of the subband energy. Note that while the low pass image at the highest wavelet level has a Gaussian distribution, the high pass images have a Laplacian distribution. Assuming we want to obtain a certain fixed bit-rate $R_{q,c}$, minimizing $D(b)$ can be solved by applying a method based on Lagrange multipliers:

$$\sum_{k=1}^{M} \frac{\partial}{\partial b_k} \left[ D(b) + \lambda \left( R_q(b) - R_{q,c} \right) \right] = \sum_{k=1}^{M} \frac{\partial}{\partial b_k} \left[ \alpha_k \left( \omega_k \, 2^{-2 b_k} \, \sigma_k^2 + \lambda \, b_k \right) \right] = 0 \tag{3}$$
The differentiation with respect to $b_k$ then delivers:

$$b_k = \frac{1}{2} \log_2 \frac{(2 \ln 2) \, \omega_k \, \sigma_k^2}{\lambda} \tag{4}$$
The fixed bit-rate constraint $R_{q,c}$ imposes:

$$\sum_{k=1}^{M} \alpha_k \, b_k = \frac{1}{2} \sum_{k=1}^{M} \alpha_k \log_2 \frac{(2 \ln 2) \, \omega_k \, \sigma_k^2}{\lambda} = R_{q,c} \tag{5}$$
Since the relative subband sizes $\alpha_k$ sum to one, this yields the Lagrange multiplier $\lambda$:

$$\lambda = 2^{\, \sum_{k=1}^{M} \alpha_k \log_2 \left[ (2 \ln 2) \, \omega_k \, \sigma_k^2 \right] \; - \; 2 R_{q,c}} \tag{6}$$
Equations (4) and (6) provide the bit lengths $b_k$. For small variances the results can be negative, in which case they are truncated to zero. The calculation then has to be repeated for a reduced number of subbands, ignoring the insignificant ones, until all $b_k$ are greater than or equal to zero, while adjusting the fixed bit-rate to:

$$R_{q,c} = \sum_{k=1}^{M} \alpha_k \, b_k \tag{7}$$
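The iterative allocation can be sketched in Python as follows. The function name `allocate_bits` and the sample values are hypothetical. One point is an interpretation rather than something stated above: when subbands are dropped, the sketch divides the exponent of equation (6) by the total relative size of the remaining subbands, so that the remaining subbands still meet the target rate exactly. With all subbands active and relative sizes summing to one, this reduces to equation (6). A non-negative target rate and positive variances are assumed.

```python
import math


def allocate_bits(alpha, omega, variance, target_rate):
    """Bit lengths b_k from equations (4) and (6), with negative results
    truncated to zero and the calculation repeated on the remaining subbands.
    """
    # c_k = (2 ln 2) * omega_k * sigma_k^2, the numerator inside equation (4).
    c = [2.0 * math.log(2.0) * w * v for w, v in zip(omega, variance)]
    bits = [0.0] * len(alpha)
    active = list(range(len(alpha)))
    while active:
        mass = sum(alpha[k] for k in active)
        # log2(lambda); equals the exponent of equation (6) when mass == 1.
        log2_lambda = (sum(alpha[k] * math.log2(c[k]) for k in active)
                       - 2.0 * target_rate) / mass
        trial = {k: 0.5 * (math.log2(c[k]) - log2_lambda) for k in active}
        if all(b >= 0.0 for b in trial.values()):
            for k, b in trial.items():
                bits[k] = b
            break
        # Drop the insignificant subbands (they keep b_k = 0) and repeat.
        active = [k for k in active if trial[k] >= 0.0]
    return bits


if __name__ == "__main__":
    alpha = [0.25, 0.25, 0.25, 0.25]
    omega = [1.0, 1.0, 1.0, 1.0]
    variance = [1000.0, 100.0, 10.0, 0.001]
    bits = allocate_bits(alpha, omega, variance, target_rate=1.0)
    print([round(b, 3) for b in bits])
    print(sum(a * b for a, b in zip(alpha, bits)))  # achieved rate, equation (2)
```

In this example the two low-variance subbands receive zero bits after two repetitions, and the rate is redistributed over the two high-variance subbands.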
The perceptual weighting factors $\omega_k$ weight the different subbands in such a way that the visual perception of the image is optimized.
Although the described subband dependent hard thresholding (FIG. 4) allows privileging the subband with the highest energy, it still works in a quantization level-by-quantization level scanning manner.