Due to the huge size of the raw data of digital signals, compression must be applied to the raw signals so that they may be transmitted and stored. The digital signals can be video, image, graphics, audio, speech, etc. In particular, digital image signals can be very large in size. Digital cameras can be used to capture high resolution images that can easily have a resolution of 10 mega-pixels or higher. Many remote sensing images or map images can have very high resolution as well. Image compression is very important for the storage, transmission and representation of such digital images.
One important international standard for image compression is ISO/IEC 10918, commonly known as the JPEG (Joint Photographic Experts Group) standard [G. K. Wallace, “The JPEG Still Picture Compression Standard,” IEEE Trans. on Consumer Electronics, vol. 38, no. 1, February 1992.]. The JPEG standard was completed in the early 1990s and has since been widely used on the Internet and in digital cameras. In 2000, ISO/IEC produced a new standard, ISO/IEC 15444, commonly known as the JPEG2000 standard [ISO/IEC, ISO/IEC 15444-1: Information technology—JPEG 2000 image coding system—Part 1: Core coding system, 2000: ISO/IEC, ISO/IEC 15444-2: Information technology—JPEG 2000 image coding system—Part 2: Extensions, 2000: C. Christopoulos, et al., “The JPEG2000 Still Image Coding System: An Overview,” IEEE Trans. on Consumer Electronics, vol. 46, no. 4, November 2000.], which can give both objective and subjective image quality superior to JPEG.
JPEG relies mainly on the discrete cosine transform (DCT), scalar quantization and entropy coding such as run-length coding and Huffman coding, with arithmetic coding available as an option beyond the baseline. JPEG2000, on the other hand, comprises the discrete wavelet transform (DWT), scalar quantization, combined bit-plane and arithmetic coding, and optimal rate control. Rate control, or rate allocation, is an algorithm or strategy for controlling the bit-rate of the coded signal so that it meets the target bandwidth, end-to-end delay and/or storage requirement. The ultimate goal of rate control is to allocate the target bit-rate in the encoding of the signal such that the overall distortion is minimized. In JPEG, the bit-rate is controlled by a single global quantization factor (or quality factor). As a result, the bit-rate control is not accurate and the visual quality may vary from one region of the image to another. By using bit-plane coding, JPEG2000 can meet the bit-rate requirement precisely and easily. The bit-rate is also controlled locally and can thus be adapted to the local image characteristics.
The basic encoding algorithm of JPEG2000 is based on Embedded Block Coding with Optimized Truncation, or EBCOT [D. Taubman, “High Performance Scalable Image Compression with EBCOT,” IEEE Trans. on Image Processing, vol. 9, no. 7, July 2000.]. The EBCOT algorithm partitions the wavelet coefficients into non-overlapping rectangular blocks called code-blocks. The code-block data are then entropy encoded by bit-plane coding. A rate-distortion optimization (optimal bit allocation) process is applied after all the quantized wavelet coefficients have been entropy encoded (compressed), and is referred to as post-compression rate-distortion (PCRD) optimization [D. Taubman, “High Performance Scalable Image Compression with EBCOT,” IEEE Trans. on Image Processing, vol. 9, no. 7, July 2000: H. Everett, “Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources,” Oper. Res., vol. 11, pp. 399-417, 1963.]. By utilizing the actual rate-distortion functions of all the compressed data, the PCRD technique achieves minimum image distortion for any given bit-rate. However, it requires all the data to be encoded and the entire encoded bit-stream to be stored, even though a large portion of the data need not be sent out, so much of the computation and memory usage can be redundant. PCRD is also an off-line process: the whole image must be completely encoded before any data can be sent out, and hence a long delay is possible.
Another technique for optimal rate allocation in JPEG2000 is coefficient modelling. Kasner et al. [J. H. Kasner, M. W. Marcellin and B. R. Hunt, “Universal Trellis Coded Quantization,” IEEE Trans. on Image Processing, vol. 8, no. 12, pp. 1677-1687, December 1999.] assumed that the wavelet coefficients could be modelled by a memoryless generalized-Gaussian density (GGD). By estimating the GGD parameter, the rate-distortion function can be approximated as required for the optimal rate allocation. This approach is included in Part 2 of JPEG2000 [ISO/IEC, ISO/IEC 15444-2: Information technology—JPEG 2000 image coding system—Part 2: Extensions, 2000.] and is called Lagrangian rate allocation (LRA). In this approach, both the rate and the distortion are estimated before the wavelet coefficients are actually encoded. A quantization step size is selected for each sub-band based on the estimate, and the quantized wavelet coefficients are encoded without any truncation. This approach avoids the redundant computational cost and redundant memory usage. However, the accuracy of the rate control depends heavily on how well the coefficients follow the GGD assumption. An iterative technique is often required to converge on the target bit-rate. In each iteration, the quantization step sizes must be re-estimated, and the wavelet coefficients are then quantized and entropy encoded again. The repeated quantization and entropy encoding greatly increase the complexity of this approach. In practice, the complexity of LRA is comparable to that of the PCRD approach.
Other than the empirical PCRD approach and the analytical LRA approach, Masuzaki et al. [T. Masuzaki, et al., “JPEG2000 Adaptive Rate Control for Embedded Systems,” Proc. IEEE Int. Sym. on Circuits and Systems, vol. 4, pp. 333-336, May 2002.] first proposed a non-optimal, training-image-based fast rate control method for JPEG2000. By training on a set of test images using the PCRD method, the proposed fast method obtains the relationship between the number of coding passes (coding points) and the corresponding number of bytes within a sub-band. The relationship is then approximated by a linear function. Given a target bit-rate, the fast method uses the linear model to predict the number of coding passes to be included in the final output. However, the results of the paper show that this method can suffer a significant PSNR loss (more than 1 dB at 0.25 bpp). The loss could be much greater, since a single linear function cannot approximate different kinds of images well.
Model-based rate allocation is an attractive approach to fast rate control because it can provide optimal quality when the coefficients follow the model assumption. Its major drawback is the accuracy of the model: it is unlikely that an accurate model can be found for highly varied images. We therefore turn to a non-model-based fast rate control method.
JPEG2000, as noted previously, is the new international standard for still image coding. JPEG2000 is based on the discrete wavelet transform (DWT), scalar quantization, coefficient bit modelling, arithmetic coding and rate control. The DWT decomposes an image (or a sub-image called a tile) into sub-bands at different levels of decomposition. FIG. 1 shows an example of a two-level DWT decomposition. The sub-bands consist of coefficients that represent the horizontal and vertical spatial frequency characteristics of the image/tile. Each sub-band is then quantized by a scalar quantizer and divided into non-overlapping rectangular blocks (called code-blocks in JPEG2000) with a typical size of 64×64. The quantized code-block data are entropy encoded (compressed) to form a code-block bit-stream. Each code-block bit-stream can be truncated by rate control to meet the target bit-rate and is finally output to the channel in packet format.
After transformation, the wavelet coefficients are quantized using scalar quantization. Each coefficient $a_b(x,y)$ of the sub-band $b$ is quantized to the value $q_b(x,y)$ by

$$q_b(x,y) = \operatorname{sign}\bigl(a_b(x,y)\bigr) \cdot \left\lfloor \frac{\lvert a_b(x,y) \rvert}{\Delta_b} \right\rfloor \tag{1}$$

where $\Delta_b$ is the quantization step size.
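As an illustration of Eq. (1), the following minimal Python sketch applies the quantizer to a few coefficients. It assumes the floor is taken on the coefficient magnitude, so that values smaller than the step size map to zero; the function name `quantize` and the sample values are hypothetical and chosen only for this example.

```python
import math


def quantize(a: float, step: float) -> int:
    """Scalar quantizer of Eq. (1): sign(a) * floor(|a| / step)."""
    if step <= 0:
        raise ValueError("quantization step size must be positive")
    sign = (a > 0) - (a < 0)          # -1, 0 or +1
    return sign * math.floor(abs(a) / step)


# Hypothetical wavelet coefficients of one sub-band, quantized with step size 2.
coefficients = [7.9, -7.9, 1.5, -0.4, 0.0]
indices = [quantize(a, 2.0) for a in coefficients]
print(indices)  # [3, -3, 0, 0, 0]
```

Coefficients whose magnitude is below the step size are mapped to zero regardless of sign, which is why both 1.5 and -0.4 produce the index 0 here.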
In lossless compression, the value of $\Delta_b$ must be one for all sub-bands. In lossy compression, however, the standard does not require any particular choice of quantization step size. One effective way of selecting the quantization step size is to scale a default (or pre-defined) step size $\Delta_d$ by an energy weight parameter $\gamma_b$ [J. W. Woods, J. Naveen, “A Filter Based Bit Allocation Scheme for Subband Compression of HDTV,” IEEE Trans. on Image Processing, vol. 1, no. 3, pp. 436-440, July 1992.] by

$$\Delta_b = \frac{\Delta_d}{\sqrt{\gamma_b}} \tag{2}$$
This selection of quantization step size is recommended in the standard and is implemented in the standard reference software [M. D. Adams and F. Kossentini, “JasPer: A Software-based JPEG-2000 Codec Implementation,” Proc. IEEE Int. Conf. on Image Processing, vol. 2, pp. 53-56, October 2000: M. D. Adams, “JasPer project home page,” http://www.ece.uvic.ca/~mdadams/jasper, 2000.] with the default step size $\Delta_d$ equal to two for all sub-bands.
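The step-size selection of Eq. (2) can be sketched as follows. The sketch assumes that Eq. (2) divides the default step size by the square root of the energy weight; the sub-band names and energy weight values are hypothetical placeholders, not values taken from the standard or from JasPer.

```python
import math


def step_size(default_step: float, energy_weight: float) -> float:
    """Step size of one sub-band, assuming Eq. (2) is default_step / sqrt(energy_weight)."""
    if energy_weight <= 0:
        raise ValueError("energy weight must be positive")
    return default_step / math.sqrt(energy_weight)


# Hypothetical energy weights for three sub-bands (illustration only).
energy_weights = {"LL2": 4.0, "HL2": 1.0, "HH1": 0.25}
default_step = 2.0  # default step size stated in the text

steps = {band: step_size(default_step, w) for band, w in energy_weights.items()}
print(steps)  # {'LL2': 1.0, 'HL2': 2.0, 'HH1': 4.0}
```

Under this assumption, a sub-band with a larger energy weight receives a finer step size, so its coefficients are quantized more precisely.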
The quantized wavelet coefficients in the code-blocks are encoded using coefficient bit modelling and arithmetic coding. This process is called tier-1 coding in JPEG2000. Tier-1 coding is essentially a bit-plane coding technique of the kind commonly used in wavelet-based image coders [J. M. Shapiro, “Embedded Image Coding using Zerotrees of Wavelet Coefficients,” IEEE Trans. on Signal Processing, vol. 41, no. 12, pp. 3445-3462, December 1993: A. Said, W. A. Pearlman, “A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees,” IEEE Trans. on Circuits and Systems for Video Tech., vol. 6, no. 3, pp. 243-250, June 1996.]. In tier-1 coding, code-blocks are encoded independently of one another using exactly the same coding algorithm. For each code-block, coefficients are encoded starting from the most significant bit-plane (MSB) that contains a non-zero element and proceeding towards the least significant bit-plane (LSB). Using coefficient bit modelling, each coefficient bit in a bit-plane is assigned to exactly one of three coding passes, called the significance pass, the refinement pass and the cleanup pass. The coding pass data are then arithmetic encoded by a context-based adaptive binary arithmetic coder, called the MQ coder in JPEG2000.
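The bit-plane ordering described above can be sketched in a few lines of Python. This is only an illustration of the scan from the most significant non-zero bit-plane down to the least significant one; it does not implement the assignment of bits to the three coding passes or the MQ coder, and the helper name `bit_planes` is hypothetical.

```python
def bit_planes(magnitudes):
    """Yield (plane_index, bits) from the top non-zero bit-plane down to plane 0.

    `magnitudes` holds the non-negative integer magnitudes of the quantized
    coefficients of one code-block (signs are handled separately in a real coder).
    """
    top = max(magnitudes, default=0).bit_length() - 1
    for plane in range(top, -1, -1):
        yield plane, [(m >> plane) & 1 for m in magnitudes]


# Hypothetical magnitudes of four quantized coefficients.
for plane, bits in bit_planes([5, 2, 0, 7]):
    print(plane, bits)
# 2 [1, 0, 0, 1]
# 1 [0, 1, 0, 1]
# 0 [1, 0, 0, 1]
```

A code-block whose coefficients are all zero has no non-zero bit-plane, so the generator yields nothing in that case.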
Rate control in JPEG2000 is achieved partly by quantization and partly by selecting the coding pass data to be included in the final output (code-stream). Quantization, as mentioned before, is applied only once and controls the rate only roughly, generally leaving it far from the target bit-rate. Accurate rate control is achieved by selecting part of the coding pass data to be included in the final code-stream. JPEG2000 does not mandate any particular rate control method. However, an optimal rate control process called post-compression rate-distortion (PCRD) optimization is recommended in the standard. This process is described in detail in D. Taubman, “High Performance Scalable Image Compression with EBCOT,” IEEE Trans. on Image Processing, vol. 9, no. 7, July 2000, and is summarized as follows.
Let $\{B_i\}_{i=1,2,\ldots}$ denote the set of all the code-blocks that cover the whole image/tile. For each code-block, tier-1 coding forms an embedded bit-stream with a set of allowable truncation points, each located at the end of a coding pass. Thus there are at most three truncation points for each bit-plane. For any code-block $B_i$, the bit-stream can be truncated to different discrete lengths with bit-rates $R_i^{1}, R_i^{2}, \ldots$. The corresponding distortion incurred by reconstructing from the truncated bit-stream is denoted by $D_i^{n_i}$ at truncation point $n_i = 1, 2, \ldots$. The optimal rate control process selects the truncation points that minimize the overall reconstructed image distortion $D$, where

$$D = \sum_i D_i^{n_i} \tag{3}$$

subject to the rate constraint

$$R = \sum_i R_i^{n_i} \le R_{\mathrm{budget}} \tag{4}$$

where $R_{\mathrm{budget}}$ denotes the target bit-rate.
Using the Lagrange multiplier technique [D. Taubman, “High Performance Scalable Image Compression with EBCOT,” IEEE Trans. on Image Processing, vol. 9, no. 7, July 2000: H. Everett, “Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources,” Oper. Res., vol. 11, pp. 399-417, 1963.], the optimization process is equivalent to minimizing the cost function

$$J = D + \lambda R = \sum_i \left( D_i^{n_i(\lambda)} + \lambda R_i^{n_i(\lambda)} \right) \tag{5}$$

Therefore, if a value of $\lambda$ can be found such that the set of truncation points $\{n_i(\lambda)\}$ that minimizes (5) yields the maximum achievable rate satisfying the rate constraint in (4), then this set gives the optimal truncation points for the target bit-rate.
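Because the cost function in (5) is a sum over code-blocks, it can be minimized for a fixed λ by choosing the truncation point of each code-block independently. The minimal Python sketch below illustrates this by brute force. The rate and distortion tables are hypothetical, and index 0 is an added assumption representing an empty bit-stream (nothing sent for that code-block).

```python
def best_truncation(rates, distortions, lam):
    """Index n that minimizes D^n + lam * R^n for a single code-block."""
    costs = [d + lam * r for r, d in zip(rates, distortions)]
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical per-block tables; entry 0 is the assumed "nothing sent" point.
blocks = [
    {"rates": [0, 100, 180, 300], "distortions": [1000.0, 400.0, 200.0, 120.0]},
    {"rates": [0, 80, 200], "distortions": [500.0, 260.0, 140.0]},
]

lam = 2.0
choice = [best_truncation(b["rates"], b["distortions"], lam) for b in blocks]
total_rate = sum(b["rates"][n] for b, n in zip(blocks, choice))
total_distortion = sum(b["distortions"][n] for b, n in zip(blocks, choice))

print(choice, total_rate, total_distortion)  # [2, 1] 260 460.0
```

Raising λ penalizes rate more heavily and moves the chosen truncation points earlier; lowering it does the opposite. This is the handle used to meet the rate constraint in (4).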
A simple algorithm for finding the optimal truncation points is described in Taubman. At any truncation point $n_i$, the R-D “slope” is given by

$$S_i^{n_i} = \frac{\Delta D_i^{n_i}}{\Delta R_i^{n_i}} = \frac{D_i^{n_i - 1} - D_i^{n_i}}{R_i^{n_i} - R_i^{n_i - 1}} \tag{6}$$
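The slope of Eq. (6) is the distortion reduction per unit of rate gained by extending the bit-stream from one truncation point to the next. The sketch below computes it for one code-block. The rate and distortion values are hypothetical, and entry 0 of each list is an assumed starting point with nothing sent.

```python
def rd_slopes(rates, distortions):
    """Return [S^1, S^2, ...] per Eq. (6); entry 0 of the inputs is the empty bit-stream."""
    slopes = []
    for n in range(1, len(rates)):
        delta_r = rates[n] - rates[n - 1]
        delta_d = distortions[n - 1] - distortions[n]
        if delta_r <= 0:
            raise ValueError("rates must be strictly increasing")
        slopes.append(delta_d / delta_r)
    return slopes


# Hypothetical cumulative rates and remaining distortions of one code-block.
rates = [0, 100, 180, 300]
distortions = [1000.0, 400.0, 200.0, 120.0]

slopes = rd_slopes(rates, distortions)
print(slopes[:2])  # [6.0, 2.5]

# True when each slope is no larger than the previous one.
is_decreasing = all(a >= b for a, b in zip(slopes, slopes[1:]))
print(is_decreasing)  # True
```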
In the rest of the paper, the term R-D slope always refers to Eqn. (6). Let $N_i$ be the set of available truncation points for code-block $B_i$. The truncation point $n_i(\lambda)$ for a given value of $\lambda$ is found such that

$$n_i(\lambda) = \max\{\, j \in N_i \mid S_i^{j} \ge \lambda \,\} \tag{7}$$

where $j = 1, 2, \ldots$ is the truncation point index. However, this equation holds only when the R-D slope is monotonically decreasing ($S_i^{n_i+1} \le S_i^{n_i}$), so this property is assumed in the optimization algorithm. Under this assumption, the optimal value of $\lambda$, denoted $\lambda_{\mathrm{optimal}}$, is the minimum value of $\lambda$ that satisfies the rate constraint in (4). In practice, an iterative approach with fast convergence is often used to search for $\lambda_{\mathrm{optimal}}$. Once $\lambda_{\mathrm{optimal}}$ is found, the optimal truncation points are given by (7) with $\lambda = \lambda_{\mathrm{optimal}}$.
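The search described above can be sketched in Python as follows. Bisection is used here as one possible iterative search; the text does not specify which iteration is used in practice, so this choice is an assumption. The tables are hypothetical, slopes are assumed to be non-negative and monotonically decreasing, and truncation point 0 is an added convention meaning that nothing is kept for a code-block.

```python
def truncation_point(slopes, lam):
    """Eq. (7): largest j with S^j >= lam, where slopes[j - 1] holds S^j; 0 if none."""
    n = 0
    for j, s in enumerate(slopes, start=1):
        if s >= lam:
            n = j
    return n


def total_rate(blocks, lam):
    """Total rate of Eq. (4) when every block is truncated according to Eq. (7)."""
    return sum(b["rates"][truncation_point(b["slopes"], lam)] for b in blocks)


def find_lambda(blocks, budget, iterations=50):
    """Approximate the smallest lam whose total rate fits the budget (bisection)."""
    lo = 0.0
    if total_rate(blocks, lo) <= budget:
        return lo                      # everything fits, no truncation needed
    hi = max(max(b["slopes"]) for b in blocks) + 1.0   # above all slopes: rate 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, mid) <= budget:
            hi = mid                   # feasible: try a smaller lam
        else:
            lo = mid                   # over budget: lam must grow
    return hi                          # hi always satisfies the rate constraint


# Hypothetical code-blocks; rates[0] = 0 is the assumed "nothing kept" point.
blocks = [
    {"rates": [0, 100, 180, 300], "slopes": [6.0, 2.5, 2.0 / 3.0]},
    {"rates": [0, 80, 200], "slopes": [3.0, 1.0]},
]

lam = find_lambda(blocks, budget=270)
points = [truncation_point(b["slopes"], lam) for b in blocks]
print(points, total_rate(blocks, lam))  # [2, 1] 260
```

The total rate is a non-increasing step function of λ, which is what makes a bisection search valid. In this example the achieved rate (260) falls short of the budget (270) because only a discrete set of truncation points is available.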
In the PCRD algorithm, the R-D slope information of all the available truncation points must be pre-computed and stored in memory. This requires tier-1 encoding of all the quantized coefficients, and the whole encoded bit-stream must be kept in memory even though a large portion of it will not be included in the final output after the optimal truncation. A significant portion of the computational power and working memory is therefore spent on computing and storing unused data. We refer to these as the redundant computational cost and the redundant memory usage, respectively. The PCRD method is also a non-causal, or off-line, process because the entire image/tile must be completely encoded before any data can be sent out, and hence a long transmission delay is possible. Since the PCRD method requires tier-1 encoding of all the quantized coefficients, its computational cost is high: tier-1 encoding can account for about 40% to 60% of the total CPU execution time [M. D. Adams and F. Kossentini, “JasPer: A Software-based JPEG-2000 Codec Implementation,” Proc. IEEE Int. Conf. on Image Processing, vol. 2, pp. 53-56, October 2000: K. F. Chen, C. J. Lian, H. H. Chen and L. G. Chen, “Analysis and Architecture Design of EBCOT for JPEG-2000,” Proc. IEEE Int. Sym. on Circuits and Systems, vol. 2, pp. 765-768, May 2001.].