Compression is a useful method for reducing bandwidth consumption and download times of images sent over data networks. A variety of algorithms and techniques exist for compressing images. JPEG, a popular compression standard that is particularly good at compressing photo-realistic images, is in common use on the Internet. This standard, described in “JPEG Still Image Data Compression Standard”, by W. B. Pennebaker and J. L. Mitchell, Chapman & Hall, 1992, is based on a frequency domain transform of blocks of image coefficients. As seen in FIG. 1, JPEG calls for subdividing an image frame 12 into 8×8 pixel blocks 11 and, at box 16, transforming the array of pixel values in each block 11 with a discrete cosine transform (DCT) so as to generate 64 DCT coefficients corresponding to each pixel block 11. The coefficients for each block 11 are quantized in quantization block 20 using a 64 element quantization table 24. Each element of table 24 is an integer value from 1 to 255, which specifies the step size of the quantizer for the corresponding DCT coefficient. The quantized coefficients for each block are entropy encoded in entropy coding box 28, which performs a lossless compression. The entropy encoder 28 is coupled to the output of the quantizer 20, from which it receives quantized image data. Standard JPEG entropy coding uses either Huffman coding or arithmetic coding, with either predefined tables or tables that are computed for a specific image.
The JPEG compressed image data is decompressed by the bottom circuit of FIG. 1, where it is first passed through an entropy decoder 30. Next, inverse quantization is performed in block 32 using quantization table 34. Finally, the inverse DCT block 36 performs an inverse DCT operation to produce the image pixel intensity data.
More specifically, the discrete cosine transform block uses the forward discrete cosine transform (DCT) to transform the image pixel intensity X[x,y] to DCT coefficients Y[m,n] as follows:
$$Y[m,n] = \frac{1}{4}\, C(m)\, C(n) \left[ \sum_{x=0}^{7} \sum_{y=0}^{7} X[x,y] \, \cos\frac{(2x+1)\, m \pi}{16} \, \cos\frac{(2y+1)\, n \pi}{16} \right]$$
where C(m) and C(n) = 1/√2 for m, n = 0, and C(m) and C(n) = 1 otherwise.
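The forward DCT above can be illustrated with a minimal Python sketch. This is a direct, unoptimized evaluation of the formula for a single 8×8 block, not the implementation of any particular codec; the names `forward_dct` and `c` are hypothetical, and the level shift that JPEG encoders usually apply to pixel values before the transform is omitted because the text does not describe it.

```python
import math

N = 8  # JPEG block size


def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def forward_dct(block):
    """Return the 8x8 array Y[m][n] of DCT coefficients for a pixel block X[x][y]."""
    coeffs = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * m * math.pi / 16)
                              * math.cos((2 * y + 1) * n * math.pi / 16))
            coeffs[m][n] = 0.25 * c(m) * c(n) * total
    return coeffs


# Illustrative input: a flat block, whose energy falls entirely in Y[0][0].
flat = [[100] * N for _ in range(N)]
Y = forward_dct(flat)
```

For the flat block, the only coefficient that is not (numerically) zero is Y[0][0], which equals 8 times the pixel value. Practical encoders use fast factorizations of the DCT rather than this four-level loop.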
The next step is to quantize the DCT coefficients using a quantization matrix, which is an 8×8 matrix of step sizes with one element for each DCT coefficient. The quantization introduces a tradeoff between the level of image distortion and the amount of compression. A large quantization step produces large image distortion, but increases the amount of compression. A small quantization step produces lower image distortion, but decreases the amount of compression. JPEG typically uses a much larger step size for the coefficients corresponding to high spatial frequencies in the image, with little noticeable deterioration in image quality because of the human visual system's natural high frequency rolloff. The quantization is performed by dividing the DCT coefficient Y[m,n] by the corresponding quantization table entry Q[m,n] and rounding the result to the nearest integer:
T[m,n] = round(Y[m,n] / Q[m,n])
to give a quantized coefficient T[m,n]. This type of quantizer is sometimes referred to as a midtread quantizer. An approximate reconstruction of Y[m,n] is effected in the decoder by entrywise multiplication of T[m,n] by Q[m,n] to obtain a reconstructed Z[m,n]:
Z[m,n] = Q[m,n] T[m,n]
The difference between Y[m,n] and Z[m,n] represents lost image information, which introduces distortion. The amount of this lost information is bounded by the magnitude of Q[m,n].
In the case of an image with multiple color channels, the aforementioned steps are applied in a similar fashion to each channel independently. In general practice, the color channels are sub-sampled to achieve greater compression, without significantly altering the quality of the image reconstruction.
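The quantization and reconstruction steps described above can be sketched in a few lines of Python. The function names and the sample values are hypothetical, and the tie-breaking rule (halves rounded upward) is an assumption, since the text only says that the quotient is rounded to the nearest integer.

```python
import math


def quantize(Y, Q):
    """T[m][n] = round(Y[m][n] / Q[m][n]); ties are rounded upward (an assumption)."""
    return [[math.floor(y / q + 0.5) for y, q in zip(y_row, q_row)]
            for y_row, q_row in zip(Y, Q)]


def dequantize(T, Q):
    """Z[m][n] = Q[m][n] * T[m][n], the decoder's approximate reconstruction."""
    return [[t * q for t, q in zip(t_row, q_row)]
            for t_row, q_row in zip(T, Q)]


# Illustrative 2x2 corner of a coefficient block and of a quantization table.
Y = [[-415.0, 30.0], [7.0, -61.0]]
Q = [[16, 11], [12, 12]]

T = quantize(Y, Q)    # [[-26, 3], [1, -5]]
Z = dequantize(T, Q)  # [[-416, 33], [12, -60]]
```

With round-to-nearest, each reconstruction error |Y[m,n] − Z[m,n]| in this example is at most half of the corresponding step Q[m,n], which is consistent with the bound stated above.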
The quantization step is of particular interest since this is where information is discarded from the image. Ideally, one would like to discard as much information as possible, thereby reducing the stored image size, while at the same time maintaining or increasing the image fidelity. Within the standard there is no prescribed method of quantizing the image, but there is nonetheless a popular method used in the software of the Independent JPEG Group (ISO/IEC JTC1 SC29 Working Group 1), and employed extensively by the general community. This method involves scaling a predetermined quantization table (calculated from the statistical importance of basis vectors over a large set of images) by a factor dependent on a user-set quality, which lies in the range 1 to 100. This method yields good results on average, but because it is based on statistical averages over many images, it does not address global image characteristics, let alone local characteristics.
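The quality-based table scaling described above can be sketched as follows. The scaling rule shown is the one commonly found in the IJG library (a quality below 50 maps to a factor of 5000/quality percent, otherwise to 200 − 2·quality percent), and the base table is the example luminance table from Annex K of the JPEG standard. Neither is spelled out in the text above, so both should be treated as assumptions of this sketch, and the function names are hypothetical.

```python
# Example luminance quantization table (JPEG standard, Annex K), row-major order.
BASE_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quality_to_scale(quality):
    """Map a user quality in 1..100 to a percentage scale factor."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def scale_table(base, quality):
    """Scale every step size and clamp it to the baseline range 1..255."""
    scale = quality_to_scale(quality)
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in base]


table_q50 = scale_table(BASE_LUMA_TABLE, 50)  # scale 100%: table unchanged
table_q25 = scale_table(BASE_LUMA_TABLE, 25)  # scale 200%: steps doubled
```

Because one factor is applied to the whole table, every block of the image is quantized with the same step sizes, regardless of its local content.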
V. Ratnakar and M. Livny, “RD-OPT: An efficient algorithm for optimizing DCT quantization tables,” Proceedings DCC'95 (IEEE Data Compression Conference), pages 332-341, 1995 (and also U.S. Pat. No. 5,724,453) describe a rate-distortion dynamic programming optimization technique to reduce distortion for a given target bit-rate, or reduce bit-rate for a given target distortion. This reference uses “Mean Squared Error” as a measure of distortion and introduces some novel techniques for estimating bit-rate that improve the computational efficiency of the calculation. This algorithm is designed to calculate a single quantization table Q for each channel of the image, and it is based solely on global aggregate statistics. It does not take into account varying local image statistics, and the method is computationally expensive. There exists another technique, which simultaneously optimizes the quantization and entropy encoding steps, yielding a fully optimized JPEG file stream. This technique, however, is extremely slow and unrealistic for real-time JPEG optimization.
U.S. Pat. No. 5,426,512 entitled “Image data compression having minimum perceptual error” uses a rate-distortion dynamic programming optimization technique to reduce distortion for a target bit-rate, or reduce bit-rate for a target distortion. This technique is very similar in concept to that of V. Ratnakar et al., except that it uses a “perceptual error” measure which attempts to mimic the eye's sensitivity to error. This algorithm is designed to calculate a single quantization table Q for each plane of the image. It is based solely on global aggregate statistics and does not take into account varying local image characteristics.
U.S. Pat. No. 5,883,979 entitled “Method for selecting JPEG quantization tables for low bandwidth applications” is directed mainly at preserving text features in JPEG images at very low bit-rates. It uses image analysis based on global statistics to determine which DCT basis vectors are more visually important to the image, and weights them accordingly in the quantization table. Again, this algorithm is based on global statistics, and it is geared specifically toward preserving textual data in JPEG images.
Ideally, one would like to have an optimal quantization table for every significantly different region of the image (a technique adopted for example in MPEG), which would then allow one to increase image fidelity as a function of file size; this technique of using different quantization tables for different areas of an image is generally referred to as variable quantization. In variable quantization, the figures of merit in question are image quality (distortion) and output file size (rate). The problem is then to decrease image distortion for a target rate, or to decrease rate for a target distortion. Of particular interest is the latter, since it has direct application in minimizing bandwidth usage for images which are sent over computer networks. This also reduces the time to transmit the image, which is important when the network path includes slow speed links.
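The rate-for-a-target-distortion formulation above can be made concrete with a small sketch. It is an illustration only and not the method of any reference cited here: distortion is taken to be the mean squared error between original and reconstructed coefficients, the count of nonzero quantized coefficients stands in as a crude rate proxy, and all names, sample coefficients, and candidate scale factors are hypothetical.

```python
import math


def error_and_rate(coeffs, table, scale):
    """Return (mean squared error, nonzero count) for one block at a table scale."""
    sq_err = 0.0
    nonzero = 0
    for y, q in zip(coeffs, table):
        step = max(1.0, q * scale)        # scaled step size, never below 1
        t = math.floor(y / step + 0.5)    # quantize (round to nearest)
        z = t * step                      # reconstruct
        sq_err += (y - z) ** 2
        nonzero += 1 if t != 0 else 0
    return sq_err / len(coeffs), nonzero


def coarsest_scale(coeffs, table, target_mse, candidates):
    """Largest candidate scale whose MSE meets the target, or None if none does."""
    best = None
    for scale in sorted(candidates):
        mse, _ = error_and_rate(coeffs, table, scale)
        if mse <= target_mse:
            best = scale
    return best


# First four coefficients of two hypothetical blocks and of a base table.
table = [16, 11, 10, 16]
smooth_block = [768.0, 2.0, -1.0, 0.0]
busy_block = [768.0, 240.0, -150.0, 90.0]
candidates = [0.5, 1, 2, 4]

smooth_scale = coarsest_scale(smooth_block, table, 20.0, candidates)  # 4
busy_scale = coarsest_scale(busy_block, table, 20.0, candidates)      # 1
```

In this example the smooth block tolerates a step size four times coarser than the busy block at the same distortion target, which is the kind of local difference that a single global table cannot exploit.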
JPEG Part 3 (ISO/IEC 10918-3), approved in 1995, defines extensions to the JPEG standard that allow for variable quantization. Unfortunately, these extensions are not supported by most applications (including most web browsers, and the IJG reference implementation).
U.S. Pat. No. 6,314,208 entitled “System for variable quantization in JPEG for compound documents” describes a system for determining variable quantization local scaling factors using a block classifier that separates text and picture information. This algorithm effectively employs variable quantization but requires the use of extensions introduced only in JPEG Part 3.
Similarly, US Patent Application No. 2001/0043754 describes a method for determining local scaling factors based on perceptual classification performed in the spatial domain. This algorithm also presupposes use of JPEG Part 3 extensions.
It is preferred that any technique for quantizing an image also be computationally efficient, especially when the quantization is performed on images which are generated dynamically, or images which cannot be stored in a caching system. If the quantization is too slow, then any transmission time benefit realized from the reduction in rate is effectively annulled by the latency introduced in the quantization computation.
Accordingly, it is an object of the invention to provide a method for quantizing a JPEG image, which offers many of the benefits of variable quantization and is computationally efficient, while conforming to the widely used JPEG Part 1 standard.