Part I of JPEG2000 was issued as an international standard for image coding in December 2000. (The standard is ISO 15444|ITU-T Recommendation T.800.) Based on the discrete wavelet transform, JPEG2000 provides several advantages over the previous discrete cosine transform (DCT)-based JPEG standard, including improved compression efficiency, joint lossy to lossless compression in a single bitstream, and region of interest coding. The fundamental building blocks of a typical JPEG2000 encoder are shown in FIG. 1.
The first encoder stage 102 provides pre-processing of the original image data 101. This can include partitioning of the data into tiles, each of which is compressed independently using its own set of specified compression parameters. The tiled data can also be subjected to an intercomponent transform to decorrelate color data.
At the next stage 104, each tile-component 103 undergoes a wavelet decomposition, converting the spatial domain image data into frequency domain subband coefficients 105. A 3-level, 2-dimensional wavelet decomposition is depicted in FIG. 2. The first stage of the wavelet decomposition converts the image data into four subbands of coefficients. Each subband is denoted by two letters (‘H’ and ‘L’), the first indicating high- or low-pass filtering in the horizontal direction and the second indicating high- or low-pass filtering in the vertical direction, together with a number indicating the decomposition level. The decomposition is applied recursively to the LL subband, so that each level contributes three detail subbands and the final level also retains its LL subband, resulting in a total of 10 subbands for a 3-level transform.
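The subband count above can be checked with a short sketch. The function name `subband_labels` and the label format (letters followed by the level number) are assumptions made for illustration; FIG. 2 may order the label differently.

```python
def subband_labels(levels):
    """List the subbands of a dyadic 2-D wavelet decomposition.

    Label format (e.g. 'HL1') is an assumption for illustration only.
    """
    labels = []
    for level in range(1, levels + 1):
        # Each level splits the current LL band; HL, LH and HH are kept as-is.
        labels += [f"{name}{level}" for name in ("HL", "LH", "HH")]
    # Only the LL band of the deepest level survives as a subband.
    labels.append(f"LL{levels}")
    return labels


if __name__ == "__main__":
    bands = subband_labels(3)
    print(len(bands), bands)
```

In general an L-level decomposition yields 3L + 1 subbands, which gives 10 for L = 3.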
The subband coefficients 105 are then quantized using a uniform deadzone quantizer 106. For each subband j, a basic quantizer step-size Δj is selected by the user to quantize all the samples in that subband. For a given coefficient c in subband j, the quantization formula is given by

$$q(c) = \operatorname{sgn}(c)\left\lfloor \frac{|c|}{\Delta_j} \right\rfloor,$$

where q(c) represents the quantizer index associated with coefficient c. This corresponds to a quantizer with step-size Δj and a deadzone of size 2Δj, as depicted in FIG. 3.
At the decoder, the reconstructed value, ĉ, associated with c, is obtained by the following formula:

$$\hat{c} = \begin{cases} \left(q(c) + \alpha\right)\Delta_j & \text{if } q(c) > 0 \\ \left(q(c) - \alpha\right)\Delta_j & \text{if } q(c) < 0 \\ 0 & \text{otherwise,} \end{cases}$$

where 0 ≤ α < 1, and typically α = 1/2, corresponding to midpoint reconstruction. With midpoint reconstruction and a step-size of Δj, any coefficient with a quantizer index of 0 has an error less than Δj, while any coefficient with quantizer index not equal to 0 has an error no greater than Δj/2.
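The quantization and reconstruction rules can be illustrated with a minimal sketch. The function names `quantize` and `dequantize` and the sample values are hypothetical, chosen only to show the deadzone behaviour and the error bounds stated above.

```python
import math


def quantize(c, step):
    """Uniform deadzone quantizer: sgn(c) * floor(|c| / step)."""
    sign = (c > 0) - (c < 0)
    return sign * math.floor(abs(c) / step)


def dequantize(q, step, alpha=0.5):
    """Reconstruct a coefficient; alpha = 0.5 is midpoint reconstruction."""
    if q > 0:
        return (q + alpha) * step
    if q < 0:
        return (q - alpha) * step
    return 0.0


if __name__ == "__main__":
    step = 2.0  # hypothetical step-size for one subband
    for c in (7.3, -7.3, 1.9, -0.4):
        q = quantize(c, step)
        c_hat = dequantize(q, step)
        print(f"c={c:+.1f}  q={q:+d}  c_hat={c_hat:+.1f}  error={abs(c - c_hat):.1f}")
```

Coefficients with magnitude below the step-size fall into the deadzone and map to index 0, so their error can approach the full step-size, while all other coefficients are reconstructed to within half a step.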
One of the features of quantization with a deadzone equal to twice the step-size is its optimal embedded structure. This means that if an Mj-bit quantizer index (associated with coefficient c in subband j with quantizer step-size Δj) is transmitted progressively starting with the most significant bit (MSB) and proceeding to the least significant bit (LSB), the resulting index after decoding only Nj bits is identical to that obtained by using a similar quantizer with a step-size of $\Delta_j 2^{M_j - N_j}$. Thus the effective quantization step-size associated with a coefficient c in subband j is not restricted to the value Δj, but can be altered at the decoder based on how many of the quantizer index bits are finally decoded. Similarly, the effective quantizer step-size can be altered at the encoder by adjusting how many of the quantizer index bits are included in the final compressed bitstream.
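The embedded property can be verified numerically. The sketch below assumes a sign-magnitude index in which dropping the Mj − Nj least significant magnitude bits corresponds to decoding only the Nj most significant bits; the names `quantize` and `truncate_index` and the test values are hypothetical.

```python
import math


def quantize(c, step):
    """Uniform deadzone quantizer: sgn(c) * floor(|c| / step)."""
    sign = (c > 0) - (c < 0)
    return sign * math.floor(abs(c) / step)


def truncate_index(q, dropped_bits):
    """Discard the least significant magnitude bits, keeping the sign."""
    sign = -1 if q < 0 else 1
    return sign * (abs(q) >> dropped_bits)


if __name__ == "__main__":
    step = 0.5          # hypothetical base step-size
    dropped_bits = 2    # M_j - N_j, the number of undecoded bitplanes
    coarse_step = step * 2 ** dropped_bits
    for c in (-13.7, -1.2, 0.3, 5.9, 21.4):
        fine = quantize(c, step)
        # Both routes should give the same index.
        print(c, truncate_index(fine, dropped_bits), quantize(c, coarse_step))
```

Truncating the fine index and quantizing directly with the coarser step agree because floor(floor(x) / 2^p) equals floor(x / 2^p) for non-negative x.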
Quantizer step-sizes are often chosen so as to minimize perceived error in the reconstructed image, based on properties of the human visual system. In the case of visually lossless compression, the quantization step-sizes can be interpreted as the maximum error allowed in the subbands without incurring any visual artifacts.
Referring again to FIG. 1, in the JPEG2000 encoder, quantized subband coefficients 107 are partitioned into small rectangular blocks referred to as codeblocks. Each codeblock is encoded independently using an adaptive binary arithmetic coder 108. Codeblocks are encoded bitplane by bitplane, starting with the most significant bitplane. The encoding of a codeblock bitplane is further subdivided into three coding passes, each one containing information for only a subset of the coefficients of the codeblock. The product of each coding pass can be referred to as a fractional bitplane or partial-bitplane. The generation of compressed coding pass data is referred to in JPEG2000 as Tier 1 coding.
Finally, the compressed coding pass data 109 is organized by a bitstream organization module 110 into the output compressed bitstream 111. The arrangement of the compressed coding pass data into the final bitstream is referred to in JPEG2000 as Tier 2 coding.
The human visual system has varying sensitivity to signals of different spatial frequency, orientation and color. The properties of the human visual system can be modeled to derive an appropriate quantization step-size for every wavelet subband. The optimal quantization step-size for a particular wavelet coefficient, however, is also a function of image content. Many studies have shown that regions of an image containing sharp edges are much less perceptually forgiving of quantization error than smooth or detailed regions. Thus wavelet coefficients corresponding to sharp edges require fine quantization, while coarser quantization is acceptable for coefficients associated with smooth or detailed regions.
Given an image with regions of text, line art, background and photographic content, it is desirable to be able to quantize these regions differently. Fine quantization should be used in regions of text and line art to retain sharp edges, while coarser quantization is visually acceptable in background and photographic regions.
Alternatively, if a single quantization scheme must be applied uniformly throughout the entire image, one of two trade-offs occurs. If the finer quantization step-sizes associated with text are used to encode the entire image, regions of photographic content are represented with higher fidelity and bit-rate than is visually necessary, at the expense of an increased overall compressed file size. If the coarser quantization step-sizes associated with photographic content are used to encode the entire image, regions of text are not encoded with sufficient fidelity and suffer visual artifacts. Typically, these textual visual artifacts are considered unacceptable, and thus the quantization step-sizes are designed to ensure textual fidelity, at the expense of over-coding of the photographic regions.
Adaptive quantization within the original DCT-based JPEG standard is disclosed in U.S. Pat. No. 6,252,994, to Nafarieh, entitled “Adaptive Quantization Compatible with the JPEG Baseline Sequential Mode”.
Many fundamental differences exist between adaptive quantization for JPEG and JPEG2000. Baseline JPEG utilizes discrete cosine transform blocks, quantization without an extended deadzone, and encoding in a non-progressive manner. JPEG2000 utilizes wavelet coefficients, quantization with an extended deadzone, and bitplane encoding. These different characteristics require different adaptive quantization techniques with JPEG2000.
JPEG2000 offers flexibility toward achieving adaptive quantization. One method is by initially dividing the image spatially into tiles. Each tile is wavelet transformed and quantized independently, and thus each tile can be classified and quantized accordingly. The main drawback of this solution is the granularity of the classification. Tiles are typically 1024×1024 or 512×512, with smaller tiles resulting in an overall performance decrease. Any tile containing any text or line art information must be quantized finely, and with large tiles it becomes difficult to identify tiles completely free of text and line art information that can be quantized more aggressively.
Ideally, each wavelet coefficient is treated individually, and effectively quantized according to its individual type classification. Unfortunately, current JPEG2000 encoder algorithms have no mechanism by which to reach this result. A partial solution is a rate-distortion approach.
The nominal rate-distortion approach to JPEG2000 encoding is described in David Taubman, “High performance scalable compression with EBCOT,” IEEE Transactions on Image Processing, 9(7), pp. 1158-1170 (July 2000). In this method, each coding pass is assigned a rate value according to the size of the compressed data comprising the coding pass, and a distortion value according to the reduction in distortion achieved by including the coding pass data in the final bitstream. Mean squared error (MSE) or weighted MSE is used as the distortion metric. Given an overall rate constraint, a rate-distortion optimization algorithm chooses for inclusion in the final bitstream those coding passes that yield the greatest rate-distortion performance (greatest reduction in distortion per bit of compressed data). While this approach yields optimal rate-distortion performance, it cannot ensure any specific effective quantization step-size for any codeblock. If no rate constraint is specified, all codeblock data is included in the final bitstream. In this case, the effective quantizer step-size of a codeblock is that specified by the user for the corresponding subband, and there is no adaptivity from one codeblock to another within a subband. This algorithm is also restricted to decisions at the codeblock level, and does not evaluate coefficients individually.
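The selection of coding passes under a rate constraint can be sketched as a Lagrangian search. This is a simplified illustration, not the procedure as published in the cited paper: the function names, the bracket for the multiplier, and the (rate, distortion) figures are all hypothetical. Each codeblock is assumed to offer a list of truncation points, given as cumulative rate and remaining distortion, starting with the point that includes no coding passes.

```python
def best_truncation(points, lam):
    """Index of the truncation point minimising distortion + lam * rate."""
    return min(range(len(points)),
               key=lambda i: points[i][1] + lam * points[i][0])


def allocate(blocks, budget, iterations=50):
    """Bisect on the multiplier until the total rate fits the budget."""
    lo, hi = 0.0, 1e6                 # hypothetical bracket for the multiplier
    choice = [0] * len(blocks)        # point 0 (nothing included) always fits
    for _ in range(iterations):
        lam = (lo + hi) / 2
        trial = [best_truncation(points, lam) for points in blocks]
        rate = sum(points[i][0] for points, i in zip(blocks, trial))
        if rate <= budget:
            choice, hi = trial, lam   # feasible: try spending more bits
        else:
            lo = lam                  # over budget: penalise rate more
    return choice


if __name__ == "__main__":
    # (cumulative rate, remaining distortion) per truncation point; made-up data.
    blocks = [
        [(0, 100), (10, 40), (20, 25), (30, 20)],
        [(0, 50), (10, 42), (20, 41)],
    ]
    print(allocate(blocks, budget=30))
```

Note that the search only decides how many coding passes each codeblock keeps. Nothing in it targets a particular effective step-size for a codeblock, which is the limitation described above.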
In U.S. Pat. No. 6,668,090, entitled “Producing a Compressed Digital Image Organized into Layers Corresponding to Increasing Visual Quality Levels and Providing Rate-control of such Compressed Digital Image,” filed by Joshi and Jones, a visually weighted MSE term is calculated for each coding pass. This technique allows the bitstream to be optimized from a visual perspective, but provides no mechanism by which to ensure adaptive quantization.
In U.S. patent application Ser. No. 09/898,230, entitled “A Method for Utilizing Subject Content Analysis for Rate-control in a Lossy Image Compression System,” filed by Luo and Joshi, the distortion reduction calculation is modified to also be a function of the probability that image pixels correspond to the main subject. This technique can be used to weight the rate-distortion values of coding passes corresponding to certain regions of an image, but again cannot ensure that a specific effective quantization step-size will be achieved for any particular codeblock or coefficient.
JPEG2000 Part I also allows region of interest (ROI) coding, by which text regions can be identified and the corresponding encoded data placed first in the final bitstream. This method can be used to ensure that a specific collection of coefficients corresponding to the ROI are included at the desired quantization step-size in the final bitstream. However, there is no mechanism to ensure that the remaining coefficients are subsequently included at the desired effective quantization step-size.
It would thus be desirable to provide encoding methods, computer program products, and image encoders, which allow adaptive quantization of the wavelet coefficients at the coefficient level based on a classification of portions of an image into different image types and optionally allow JPEG2000 Part I compliance.