JPEG, as described in W. Pennebaker and J. Mitchell, “JPEG still image data compression standard,” Kluwer Academic Publishers, 1993 (hereinafter “reference [1]”), and G. Wallace, “The JPEG still-image compression standard,” Commun. ACM, vol. 34, pp. 30-44, April 1991 (hereinafter “reference [2]”), is a popular DCT-based still image compression standard. It has led to wide-ranging use of the JPEG format, for example on the World Wide Web and in digital cameras.
The popularity of the JPEG coding system has motivated the study of JPEG optimization schemes; see, for example, J. Huang and T. Meng, “Optimal quantizer step sizes for transform coders,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 2621-2624, April 1991 (hereinafter “reference [3]”); S. Wu and A. Gersho, “Rate-constrained picture-adaptive quantization for JPEG baseline coders,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol. 5, pp. 389-392, 1993 (hereinafter “reference [4]”); V. Ratnakar and M. Livny, “RD-OPT: An efficient algorithm for optimizing DCT quantization tables,” in Proc. Data Compression Conf., pp. 332-341, 1995 (hereinafter “reference [5]”); V. Ratnakar and M. Livny, “An efficient algorithm for optimizing DCT quantization,” IEEE Trans. Image Processing, vol. 9, pp. 267-270, February 2000 (hereinafter “reference [6]”); K. Ramchandran and M. Vetterli, “Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility,” IEEE Trans. Image Processing, vol. 3, pp. 700-704, September 1994 (hereinafter “reference [7]”); M. Crouse and K. Ramchandran, “Joint thresholding and quantizer selection for decoder-compatible baseline JPEG,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 2331-2334, 1995 (hereinafter “reference [8]”); and M. Crouse and K. Ramchandran, “Joint thresholding and quantizer selection for transform image coding: Entropy constrained analysis and applications to baseline JPEG,” IEEE Trans. Image Processing, vol. 6, pp. 285-297, February 1997 (hereinafter “reference [9]”). The schemes described in all of these references remain faithful to the JPEG syntax. Since such schemes optimize only the JPEG encoder and leave the standard JPEG decoder unchanged, they not only further reduce the size of JPEG compressed images but also have the advantage of being easily deployable.
This feature makes them attractive in applications where the receiving terminals are not sophisticated enough to support new decoders, such as in wireless communications.
Quantization Table Optimization
JPEG's quantization step sizes largely determine the rate-distortion tradeoff in a JPEG compressed image. Using the default quantization tables is suboptimal, since these tables are image-independent. The purpose of any quantization table optimization scheme is therefore to obtain an efficient, image-adaptive quantization table for each image component. The problem of quantization table optimization can be formulated as follows. (Without loss of generality, only one image component is considered in the following discussion.) Given an input image with a target bit rate $R_{\text{budget}}$, one wants to find a set of quantization step sizes $\{Q_k : k = 0, \ldots, 63\}$ that minimizes the overall distortion
$$D = \sum_{n=1}^{\text{Num\_Blk}} \sum_{k=0}^{63} D_{n,k}(Q_k) \tag{1}$$
subject to the bit rate constraint
$$R = \sum_{n=1}^{\text{Num\_Blk}} R_n(Q_0, \ldots, Q_{63}) \le R_{\text{budget}} \tag{2}$$
where Num_Blk is the number of blocks, $D_{n,k}(Q_k)$ is the distortion of the kth DCT coefficient in the nth block if it is quantized with the step size $Q_k$, and $R_n(Q_0, \ldots, Q_{63})$ is the number of bits generated in coding the nth block with the quantization table $\{Q_0, \ldots, Q_{63}\}$.
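As an illustration of the distortion term in (1), the following minimal Python sketch evaluates D for a given quantization table. It assumes squared error as the distortion measure and uniform quantization with rounding to the nearest index; neither choice is specified above. The names `quantize`, `coeff_distortion` and `total_distortion` are hypothetical. The rate term in (2) is not modeled here because it depends on the JPEG entropy coder.

```python
def quantize(c, q):
    """Quantization index of coefficient c for step size q (round half away from zero)."""
    sign = 1 if c >= 0 else -1
    return sign * int(abs(c) / q + 0.5)


def coeff_distortion(c, q):
    """D_{n,k}(Q_k) for one coefficient, assuming squared error (an assumption)."""
    return (c - q * quantize(c, q)) ** 2


def total_distortion(blocks, table):
    """Overall distortion D of equation (1).

    blocks: list of blocks, each a list of DCT coefficients (64 per block in JPEG).
    table:  list of step sizes Q_k, one per coefficient position.
    """
    return sum(coeff_distortion(c, q)
               for block in blocks
               for c, q in zip(block, table))
```

A block length of 64 is not enforced, so the functions can be tried on short toy blocks.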
Since JPEG uses zero run-length coding, which combines zero coefficient indices from different frequency bands into one symbol, the bit rate is not simply the sum of bits contributed by coding each individual coefficient index. Therefore, it is difficult to obtain an optimal solution to (1) and (2) with classical bit allocation techniques. Huang and Meng (see reference [3]) proposed a gradient descent technique to solve for a locally optimal solution to the quantization table design problem, based on the assumption that the probability distributions of the DCT coefficients are Laplacian. Wu and Gersho (see reference [4]) later proposed a greedy, steepest-descent optimization scheme that makes no assumptions about the probability distribution of the DCT coefficients. Starting with an initial quantization table of large step sizes, corresponding to low bit rate and high distortion, their algorithm decreases the step size in one entry of the quantization table at a time until a target bit rate is reached. In each iteration, the quantization table is updated in such a way that the ratio of decrease in distortion to increase in bit rate is maximized over all possible reduced step size values for one entry of the quantization table. Mathematically, their algorithm seeks the values of k and q that solve the following maximization problem
$$\max_{k} \max_{q} \frac{-\Delta D|_{Q_k \to q}}{\Delta R|_{Q_k \to q}} \tag{3}$$
where $\Delta D|_{Q_k \to q}$ and $\Delta R|_{Q_k \to q}$ are respectively the change in distortion and the change in overall bit rate when the kth entry of the quantization table, $Q_k$, is replaced by q. These increments can be calculated by
$$\Delta D|_{Q_k \to q} = \sum_{n=1}^{\text{Num\_Blk}} \left[ D_{n,k}(q) - D_{n,k}(Q_k) \right] \tag{4}$$
and
$$\Delta R|_{Q_k \to q} = \sum_{n=1}^{\text{Num\_Blk}} \left[ R_n(Q_0, \ldots, q, \ldots, Q_{63}) - R_n(Q_0, \ldots, Q_k, \ldots, Q_{63}) \right] \tag{5}$$
The iteration is repeated until $|R_{\text{budget}} - R(Q_0, \ldots, Q_{63})| \le \varepsilon$, where $\varepsilon$ is the convergence criterion specified by the user.
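The greedy iteration of (3) to (5) might be sketched in Python as follows. This is a toy illustration under stated assumptions, not the implementation of reference [4]: distortion is taken to be squared error, and `proxy_rate` is a hypothetical stand-in for the true JPEG bit count, which would require running the run-length and Huffman coder. The sketch also rejects any move that would exceed the budget and stops when no admissible move remains, which is an added safeguard rather than part of the description above.

```python
def quantize(c, q):
    sign = 1 if c >= 0 else -1
    return sign * int(abs(c) / q + 0.5)


def distortion(blocks, table):
    """Squared-error distortion (assumed measure), summed over all blocks."""
    return sum((c - q * quantize(c, q)) ** 2
               for block in blocks for c, q in zip(block, table))


def proxy_rate(blocks, table):
    """Hypothetical rate model, NOT the JPEG entropy coder:
    1 bit per zero index, 2 + bit length of |index| per nonzero index."""
    bits = 0
    for block in blocks:
        for c, q in zip(block, table):
            idx = abs(quantize(c, q))
            bits += 1 if idx == 0 else 2 + idx.bit_length()
    return bits


def greedy_table(blocks, budget, eps=0, q_max=64, rate_fn=proxy_rate):
    """Start from large step sizes and repeatedly shrink one entry Q_k -> q,
    choosing the (k, q) with the largest -dD/dR, as in equation (3)."""
    table = [q_max] * len(blocks[0])
    d_cur, r_cur = distortion(blocks, table), rate_fn(blocks, table)
    while abs(budget - r_cur) > eps:
        best = None  # (ratio, k, q, d_new, r_new)
        for k in range(len(table)):
            for q in range(1, table[k]):           # reduced step sizes only
                trial = table[:k] + [q] + table[k + 1:]
                d_new, r_new = distortion(blocks, trial), rate_fn(blocks, trial)
                d_delta, r_delta = d_new - d_cur, r_new - r_cur   # (4) and (5)
                if r_new > budget or d_delta >= 0:
                    continue                       # over budget or no gain
                ratio = -d_delta / r_delta if r_delta > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, k, q, d_new, r_new)
        if best is None:                           # no admissible move left
            break
        _, k, q, d_cur, r_cur = best
        table[k] = q
    return table, d_cur, r_cur
```

Each iteration re-evaluates distortion and rate for every candidate (k, q) over all blocks, which makes the computational cost of this style of search easy to see.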
Both of the aforementioned algorithms are computationally very expensive. Ratnakar and Livny (see references [5] and [6]) proposed a comparatively efficient algorithm that constructs the quantization table from the DCT coefficient distribution statistics without repeating the entire compression-decompression cycle. They employed a dynamic programming approach to optimize quantization tables over a wide range of rates and distortions, and achieved performance similar to that of the scheme in reference [4].
Optimal Thresholding
In JPEG, the same quantization table must be applied to every image block. This is true even when an image-adaptive quantization table is used. Thus, JPEG quantization lacks local adaptivity, which indicates that potential gain remains from exploiting discrepancies between a particular block's characteristics and the average block statistics. This is the motivation for the optimal fast thresholding algorithm of Ramchandran and Vetterli (see reference [7]), which drops the less significant coefficient indices in the R-D sense. Mathematically, for a fixed quantizer, it minimizes the distortion between the original image $X$ and the thresholded image $\tilde{X}$, given the quantized image $\hat{X}$, subject to a bit budget constraint, i.e.,
$$\min \left[ D(X, \tilde{X}) \mid \hat{X} \right] \quad \text{subject to} \quad R(\tilde{X}) \le R_{\text{budget}} \tag{6}$$
An equivalent unconstrained problem is to minimize
$$J(\lambda) = D(X, \tilde{X}) + \lambda R(\tilde{X}) \tag{7}$$
A dynamic programming algorithm is employed to solve the above optimization problem (7) recursively. It calculates $J^*_k$ for each $0 \le k \le 63$, and then finds the $k^*$ that minimizes $J^*_k$, i.e., it finds the best nonzero coefficient at which to end the scan, within each block independently. The reader is referred to reference [7] for details. Since only the less significant coefficient indices can be changed, the optimal fast thresholding algorithm of reference [7] does not address the full optimization of the coefficient indices with JPEG decoder compatibility.
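The recursion can be illustrated with a simplified Python sketch for a single block. It is not the algorithm of reference [7] in full. It assumes squared-error distortion, treats the input as a zigzag-ordered coefficient sequence, and uses a hypothetical bit-cost model `toy_pair_bits` in place of the JPEG Huffman tables. Here `best[k]` plays the role of $J^*_k$: the minimum Lagrangian cost of coefficients 0..k given that coefficient k is kept as nonzero.

```python
def toy_pair_bits(run, index):
    """Hypothetical bit cost of one (zero run, index) pair; not the JPEG tables."""
    size = abs(index).bit_length()            # magnitude category of the index
    return 4 + 2 * size + 11 * (run // 16)    # each full run of 16 zeros costs extra


def threshold_block(coeffs, steps, lam, pair_bits=toy_pair_bits, eob_bits=4):
    """Return (thresholded indices, Lagrangian cost D + lam * R) for one block."""
    n = len(coeffs)
    idx = [(1 if c >= 0 else -1) * int(abs(c) / q + 0.5)
           for c, q in zip(coeffs, steps)]
    keep_err = [(c - q * i) ** 2 for c, q, i in zip(coeffs, steps, idx)]
    # zeroed[k] = distortion of setting coefficients 0..k-1 to zero
    zeroed = [0.0]
    for c in coeffs:
        zeroed.append(zeroed[-1] + c * c)

    best = [None] * n     # best[k]: minimum cost of 0..k with k kept
    prev = [-1] * n       # predecessor kept coefficient (-1 means none)
    for k in range(n):
        if idx[k] == 0:
            continue      # already zero, nothing to keep
        for j in range(-1, k):
            if j >= 0 and best[j] is None:
                continue
            base = 0.0 if j < 0 else best[j]
            cost = (base + (zeroed[k] - zeroed[j + 1])    # drop j+1..k-1
                    + keep_err[k]
                    + lam * pair_bits(k - j - 1, idx[k]))
            if best[k] is None or cost < best[k]:
                best[k], prev[k] = cost, j

    # Choose the last kept coefficient k*; dropping everything is also allowed.
    j_min, last = zeroed[n] + lam * eob_bits, -1
    for k in range(n):
        if best[k] is None:
            continue
        eob = 0 if k == n - 1 else eob_bits   # no end-of-block after the final position
        total = best[k] + (zeroed[n] - zeroed[k + 1]) + lam * eob
        if total < j_min:
            j_min, last = total, k

    out = [0] * n
    while last >= 0:      # backtrack through the kept coefficients
        out[last] = idx[last]
        last = prev[last]
    return out, j_min
```

With `lam = 0` every nonzero index is kept, and as `lam` grows more indices are dropped, so the bit budget of (6) would be met by adjusting `lam`.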
Joint Thresholding and Quantizer Selection
Since an adaptive quantizer selection scheme exploits image-wide statistics, while the thresholding algorithm exploits block-level statistics, their operations are nearly “orthogonal”. This indicates that it is beneficial to combine them. The Huffman table is another free parameter left to a JPEG encoder. Therefore, Crouse and Ramchandran (see references [8] and [9]) proposed a joint optimization scheme over these three parameters, i.e.,
$$\min_{T,Q,H} D(T, Q) \quad \text{subject to} \quad R(T, Q, H) \le R_{\text{budget}} \tag{8}$$
where Q is the quantization table, H is the Huffman table incorporated, and T is a set of binary thresholding tags that signal whether to threshold a coefficient index. The constrained minimization problem of (8) is converted into an unconstrained problem by means of a Lagrange multiplier as
$$\min_{T,Q,H} \left[ J(\lambda) = D(T, Q) + \lambda R(T, Q, H) \right] \tag{9}$$
They then proposed an algorithm that iteratively chooses each of Q, T and H in turn to minimize the Lagrangian cost (9) while the other two parameters are held fixed.
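The alternating structure of such a scheme can be sketched generically in Python. The code below is only an illustration of the iteration pattern, not the algorithm of references [8] and [9]. The `cost` function and the per-parameter updaters are hypothetical placeholders, and the toy cost in the usage note has nothing to do with JPEG. The stopping rule (stop when a full pass no longer lowers the cost by more than `tol`) is also an assumption.

```python
def best_over(name, choices, cost):
    """Build an updater that picks the value of `name` minimizing `cost`
    while all other parameters are held fixed (exhaustive search)."""
    def update(params):
        return min(choices, key=lambda v: cost({**params, name: v}))
    return update


def alternating_minimization(cost, params, updaters, tol=1e-9, max_iter=100):
    """Minimize cost(params) by updating one parameter at a time.

    updaters: dict mapping parameter name -> function(params) -> new value.
    Each update cannot raise the cost as long as the current value is among
    the choices, so the cost is non-increasing from pass to pass.
    """
    j_prev = j_cur = cost(params)
    for _ in range(max_iter):
        for name, update in updaters.items():
            params[name] = update(params)
        j_cur = cost(params)
        if j_prev - j_cur <= tol:     # no further improvement: stop
            break
        j_prev = j_cur
    return params, j_cur


# Toy usage with a made-up cost over three integer parameters.
def toy_cost(p):
    return (p["T"] - 2) ** 2 + (p["Q"] - p["T"]) ** 2 + (p["H"] - p["Q"]) ** 2


toy_updaters = {name: best_over(name, range(6), toy_cost) for name in ("Q", "T", "H")}
toy_result, toy_j = alternating_minimization(
    toy_cost, {"Q": 5, "T": 0, "H": 0}, toy_updaters)
```

In this toy run the iteration stops at a point where no single-parameter change lowers the cost, even though a lower cost exists at Q = T = H = 2. An iteration of this kind is guaranteed only to be non-increasing in cost, not to reach the global minimum.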
JPEG Limitations
The foregoing discussion has focused on optimization within the confines of JPEG syntax. However, given the JPEG syntax, the improvement in R-D performance that a JPEG optimization method can achieve is limited. Part of the limitation comes from the poor context modeling used by a JPEG coder, which fails to take full advantage of the pixel correlation existing in both the space and frequency domains. Consequently, context-based arithmetic coding has been proposed in the literature to replace the Huffman coding used in JPEG for better R-D performance.