This invention relates to data compression using the JPEG compression standard for continuous-tone still images, both grayscale and color.
A committee known as "JPEG," which stands for "Joint Photographic Experts Group," has established a standard for compressing continuous-tone still images, both grayscale and color. This standard represents a compromise between reproducible image quality and compression rate. To achieve acceptable compression rates, where the compression rate refers to the ratio of the size of the uncompressed image to that of the compressed image, the JPEG standard adopted a lossy compression technique. The lossy compression technique was required given the inordinate amount of data needed to represent a color image, on the order of 10 megabytes for a 200 dots per inch (DPI) 8.5×11 inch image. By carefully implementing the JPEG standard, however, the loss in the image can be confined to imperceptible areas of the image, which produces a perceptually lossless decompressed image. The achievable compression rates using this technique are in the range of 10:1 to 50:1.
FIG. 1 shows a block diagram of a typical implementation of the JPEG compression standard, referred to hereafter as a compression engine. The compression engine 10 operates on source image data, which represents a source image in a given color space such as CIELAB. The source image data has a certain resolution, which is determined by how the image was captured. Each individual datum of the source image data represents an image pixel. Each pixel in turn has a resolution, or bit depth, which is determined by the number of bits used to represent the image pixel.
The source image data is typically formatted as a raster stream of data. The compression technique, however, requires the data to be represented in blocks. These blocks represent a two-dimensional portion of the source image data. The JPEG standard uses 8×8 blocks of data. Therefore, a raster-to-block translation unit 12 translates the raster source image data into 8×8 blocks of source image data. The source image data is also level-shifted from unsigned integers to signed integers to put it into the proper format for the next stage in the compression process. These 8×8 blocks are then forwarded to a discrete cosine transformer 16 via a bus 14.
The discrete cosine transformer 16 converts the source image data into transformed image data using the discrete cosine transform (DCT). The DCT, as is known in the art of image processing, decomposes the 8×8 block of source image data into 64 DCT elements or coefficients, each of which corresponds to a respective DCT basis vector. These basis vectors are unique two-dimensional (2D) "spatial waveforms," which are the fundamental units in the DCT space. These basis vectors can be intuitively thought of as representing unique images, wherein any source image can be decomposed into a weighted sum of these unique images. The discrete cosine transformer uses the forward discrete cosine transform (DCT) shown in Equation 1 below, hence the name.

$$Y(k,l)=\frac{C(k)\,C(l)}{4}\sum_{i=0}^{7}\sum_{j=0}^{7}y(i,j)\cos\frac{(2i+1)k\pi}{16}\cos\frac{(2j+1)l\pi}{16}\qquad(1)$$

where C(k), C(l) = 1/√2 for k, l = 0; C(k), C(l) = 1 otherwise; y(i,j) is the (i,j)th element of the 8×8 block of source image data; and Y(k,l) is the (k,l)th DCT coefficient.
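The forward transform described above can be sketched directly in code (a minimal Python illustration using numpy; the function name is hypothetical):

```python
import numpy as np

def forward_dct_8x8(block):
    """Direct evaluation of the 8x8 forward DCT on level-shifted image data."""
    Y = np.zeros((8, 8))
    for k in range(8):
        for l in range(8):
            ck = 1 / np.sqrt(2) if k == 0 else 1.0
            cl = 1 / np.sqrt(2) if l == 0 else 1.0
            s = 0.0
            for i in range(8):
                for j in range(8):
                    s += (block[i, j]
                          * np.cos((2 * i + 1) * k * np.pi / 16)
                          * np.cos((2 * j + 1) * l * np.pi / 16))
            Y[k, l] = 0.25 * ck * cl * s
    return Y
```

In practice, fast factored DCT implementations are used rather than this direct evaluation; the sketch only makes the basis-vector decomposition concrete.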
The output of the transformer 16 is an 8×8 block of DCT elements or coefficients, corresponding to the DCT basis vectors. This block of transformed image data is then forwarded to a quantizer 20 over a bus 18. The quantizer 20 quantizes the 64 DCT elements using a 64-element quantization table 24, which must be specified as an input to the compression engine 10. Each element of the quantization table is an integer value from 1 to 255, which specifies the step size of the quantizer for the corresponding DCT coefficient. The purpose of quantization is to achieve the maximum amount of compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality. Quantization is a many-to-one mapping and, therefore, is fundamentally lossy. As mentioned above, quantization tables have been designed which limit the lossiness to imperceptible aspects of the image so that the reproduced image is not perceptually different from the source image.
The quantizer 20 performs a simple division operation between each DCT coefficient and the corresponding quantization table element. The lossiness occurs because the quantizer 20 disregards any fractional remainder. Thus, the quantization function can be represented as shown in Equation 2 below.

$$Y_Q(k,l)=\operatorname{trunc}\!\left(\frac{Y(k,l)}{Q(k,l)}\right)\qquad(2)$$

where Y(k,l) represents the (k,l)th DCT element, Q(k,l) represents the corresponding quantization table element, and trunc denotes truncation of the fractional remainder.
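A minimal sketch of this division-with-truncation step (Python/numpy; illustrative only, not a particular hardware implementation):

```python
import numpy as np

def quantize(Y, Q):
    # Divide each DCT coefficient by its quantization step size and
    # discard the fractional remainder (truncation toward zero).
    return np.trunc(Y / Q).astype(int)
```

For example, a coefficient of 25 with step size 10 quantizes to 2; the remainder is discarded.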
To reconstruct the source image, this step is reversed, with the quantization table element being multiplied by the corresponding quantized DCT coefficient, but in so doing the fractional part is not restored. Thus, this information is lost forever. Because of the potential impact on the image quality of the quantization step, considerable effort has gone into designing the quantization tables. These efforts are described further below following a discussion of the final step in the JPEG compression technique.
The final step of the JPEG standard is entropy encoding, which is performed by an entropy encoder 28. The entropy encoder 28 is coupled to the quantizer 20 via a bus 22 for receiving the quantized image data therefrom. The entropy encoder achieves additional lossless compression by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. The JPEG standard specifies two entropy coding methods: Huffman coding and arithmetic coding. The compression engine of FIG. 1 assumes Huffman coding is used. Huffman encoding, as is known in the art, uses one or more sets of Huffman code tables 30. These tables may be predefined or computed specifically for a given image. Huffman encoding is a well-known encoding technique that produces high levels of lossless compression. Accordingly, the operation of the entropy encoder 28 is not further described.
Referring now to FIG. 2, a typical JPEG compressed file is shown generally at 34. The compressed file includes a JPEG header 36, the quantization (Q) tables 38 and the Huffman (H) tables 40 used in the compression process, and the compressed image data 42 itself. From this compressed file 34 a perceptually indistinguishable version of the original source image can be extracted when an appropriate Q table is used. This extraction process is described below with reference to FIG. 3.
A JPEG decompression engine 43 is shown in FIG. 3. The decompression engine essentially operates in reverse of the compression engine 10. The decompression engine receives the compressed image data at a header extraction unit 44, which extracts the H tables, Q tables, and compressed image data according to the information contained in the header. The H tables are then stored in H tables 46, while the Q tables are stored in Q tables 48. The compressed image data is then sent to an entropy decoder 50 over a bus 52. The entropy decoder 50 decodes the Huffman-encoded compressed image data using the H tables 46. The output of the entropy decoder 50 is the quantized DCT elements.
The quantized DCT elements are then transmitted to an inverse quantizer 54 over a bus 56. The inverse quantizer multiplies the quantized DCT elements by the corresponding quantization table elements found in Q tables 48. As described above, this inverse quantization step does not yield the original source image data because the quantization step truncated or discarded the fractional remainder before transmission of the compressed image data.
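The inverse quantization step, and the information loss it cannot undo, can be illustrated with a small sketch (Python/numpy; hypothetical helper names):

```python
import numpy as np

def quantize(Y, Q):
    # Forward step: division with truncation discards the fractional remainder.
    return np.trunc(Y / Q).astype(int)

def dequantize(Yq, Q):
    # Inverse step: multiplication restores the scale but not the lost fraction.
    return Yq * Q
```

A coefficient of 25 with step size 10 quantizes to 2 and dequantizes to 20; the original 25 is unrecoverable.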
The inverse quantized DCT elements are then passed to an inverse discrete cosine transformer (IDCT) 57 via a bus 59, which transforms the data back into the spatial domain using the inverse discrete cosine transform. The inverse-transformed data is then transferred to a block-to-raster translator 58 over a bus 60, where the blocks of decompressed image data are translated into a raster stream of decompressed source image data.
From the decompressed source image data, a facsimile of the original source image can be reconstructed. The reconstructed source image, however, is not an exact replica of the original source image.
As described above, the quantization step produces some lossiness in the process of compressing the data. By carefully designing the quantization tables, however, the prior art methods have constrained the loss to visually imperceptible portions of the image. These methods, and their shortcomings, are described below.
The JPEG standard includes two examples of quantization tables, one for luminance channels and one for chrominance channels. See International Organization for Standardization: "Information technology--Digital compression and coding of continuous-tone still images--Part 1: Requirements and guidelines," ISO/IEC IS 10918-1, Oct. 20, 1992. These tables are known as the K.1 and K.2 tables, respectively. These tables were designed for perceptually lossless compression of color images represented in the YUV color space.
These tables result in visually pleasing images, but yield a rather low compression ratio for certain applications. The compression ratio can be varied by setting a so-called Q-factor or scaling factor, which is essentially a uniform multiplicative parameter applied to each of the elements in the quantization tables. The larger the Q-factor, the larger the achievable compression rate. Even if the original tables are carefully designed to be perceptually lossless, however, a large Q-factor will introduce artifacts in the reconstructed image, such as blockiness in areas of constant color or ringing around text characters. Some of these artifacts can be effectively cancelled by post-processing the reconstructed image, either by passing it through a tone reproduction curve correction stage or by segmenting the image and processing the text separately. Such methods, however, easily introduce new artifacts of their own. Therefore, these methods are not ideal.
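The Q-factor scaling described above can be sketched as follows (Python/numpy; the clamping to JPEG's valid 1-255 step-size range is an assumption of this illustration):

```python
import numpy as np

def scale_q_table(q_table, q_factor):
    # Multiply every quantization table element by a uniform Q-factor;
    # larger factors give coarser step sizes and higher compression.
    scaled = np.rint(q_table * q_factor)
    return np.clip(scaled, 1, 255).astype(int)
```

Note that for large Q-factors many entries saturate at 255, which is one reason uniform scaling eventually produces blockiness and ringing.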
As a result of the inadequacy of the Q-factor approach, additional design methods for JPEG quantization tables have been proposed. These methods can be categorized as either perceptual, meaning based on the human visual system (HVS), or statistical, meaning based on information theory criteria. These methods are also described as removing subjective or statistical redundancy, respectively.
These methods form the Q table from two separate terms: a threshold term, chosen depending on the particular method selected, and a bit-rate term, chosen based on the desired compression ratio. This is shown graphically in FIG. 4.
The flowchart in FIG. 4 shows a basic three-step process of forming a Q-table according to prior art methods. First, a so-called threshold term is selected in step 64. The term is referred to as "threshold" because it relates to the HVS thresholds below which the quantization is perceptually lossless. Second, a bit-rate term is selected in step 66. This is essentially the Q-factor described above. The bit-rate term can be thought of as a normalization factor that allows control of the bit rate of the compressed image. This factor is important for applications having a limited channel bandwidth (e.g., FAX) or limited storage space (e.g., digital cameras). Finally, the Q-table is formed in step 68 using both of these terms. Typically, this step simply involves dividing each threshold term by the corresponding bit-rate term. Each of these steps is described further below.
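Step 68 can be sketched as an element-wise combination of the two terms (Python/numpy; an illustrative reading of the division described above, with clamping to JPEG's valid step-size range added as an assumption):

```python
import numpy as np

def form_q_table(threshold_term, bitrate_term):
    # Divide each threshold term by the corresponding bit-rate term,
    # keeping entries within JPEG's valid 1..255 quantizer step sizes.
    q = np.rint(threshold_term / bitrate_term)
    return np.clip(q, 1, 255).astype(int)
```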
Referring now to FIG. 5, a flowchart of the two methods, statistical and subjective, for determining the threshold term is shown. Both methods begin by selecting a set of typical images in step 70. This set is chosen to represent the desired image characteristics so that the Q-table is optimized for these images. Next, the particular method is chosen. If a statistical or information theoretical method is chosen, steps 74 and 76 are executed; if a perceptual or subjective method is desired, steps 78 and 80 are instead executed.
In the statistical or information theoretical method, the threshold term is chosen based on the energy within an 8×8 block of DCT-transformed image data. The energy is typically represented by the variance of the DCT elements across the entire image. Quantization levels are then allocated in abundance to the DCT basis vectors having larger variances; conversely, basis vectors having smaller variances are allocated fewer quantization levels. The problem with these methods is that they do not take the HVS into account and, as a result, the generated solutions may produce reconstructed images of poor visual quality. Some have tried to attenuate this result by combining the statistical method with multi-level error diffusion. In this refined method, the error made in the quantization is diffused to adjacent DCT elements. These refined methods, however, are more computationally intensive and do not fully restore the visual quality of the image.
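The variance computation underlying this allocation can be sketched as (Python/numpy; the function name is hypothetical):

```python
import numpy as np

def dct_coefficient_variances(dct_blocks):
    """Variance of each of the 64 DCT coefficients across all blocks.

    `dct_blocks` has shape (n_blocks, 8, 8); coefficients with larger
    variances are the ones allocated more quantization levels.
    """
    return np.var(dct_blocks, axis=0)
```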
In the subjective or perceptual methods, an error or difference between the compressed and uncompressed image is adjusted to the threshold of detectability of the human visual system (HVS). This is accomplished in the two steps 78 and 80 shown in FIG. 5. In the first step 78, a difference is computed between the original and compressed images. This difference is used in step 80 to allocate the bandwidth so that the difference image is below the detection threshold. The detection threshold is established using psychophysical experiments. The process of establishing the threshold using the psychophysical experiments further comprises two steps. In the first step, the perceptual color space is used to develop a detection model for a specific class of features by performing the psychophysical experiments. In the second step, this model is applied to predict the visibility thresholds for the quantization errors. The threshold terms are then chosen so that the errors are just below the threshold level.
This subjective or perceptual method, as well as others such as the "contrast masking" method proposed by Robert J. Safranek, "JPEG compliant encoder utilizing perceptually based quantization," Human Vision, Visual Processing, and Digital Display IV, Proc. SPIE 1913, 117-126, 1993, addresses the task of perceptually lossless compression. For many applications, e.g., color facsimile, the generated solutions are too conservative in that the achievable compression rate is too low. For color facsimile applications, this low compression ratio produces unacceptable transmission times. Furthermore, the images used in the psychophysical experiments to characterize the HVS are sine wave gratings. While sine wave gratings adequately represent pictorial images, such targets, used indiscriminately, do not predict the perception of text well. Consequently, these perceptual methods do not effectively control artifacts in text, such as ringing around characters.
Hybrid forms of the statistical and subjective methods described above have been proposed which try to take the best of both. Although a frequency component might have a high variance, it might not contribute significant information to the HVS. Therefore, for a given image, the image quality can be improved by reallocating the transmission bandwidth to perceptually important information while attenuating imperceptible information. Alternatively, for a given perceptual quality, the number of bits required to encode a pixel can be reduced.
An example of such a hybrid method can be found in David L. McLaren and D. Thong Nguyen, "Removal of Subjective Redundancy from DCT-coded Images," IEE Proceedings-I, 138, 5, 345-350, 1991. The McLaren and Nguyen method, instead of using a uniform Q-factor, separates the coefficient thresholding from the quantization stages. By considering the coefficient energy distribution and the contrast sensitivity curve of the HVS, the method harshly thresholds the DCT coefficients, obtaining substantially better bit rates without perceptually degrading the images. This method, however, was designed using video images. One characteristic of video images is the substantial cross-talk in the transducers, which "smooths" the images. Text in video images, for instance, is typically large, which reduces this problem. In color facsimile communication, there is much less cross-talk because hard edges are desirable. Moreover, color scanners typically have problems of sensor misalignment, and images are often half-toned, which introduces interference patterns. These artifacts, typical of color facsimile, can adversely bias the statistical analysis of coefficient energy as taught in McLaren and Nguyen.
In another hybrid method proposed by K. R. Rao and his co-workers (see, e.g., Bowonkoon Chitprasert and K. R. Rao, "Human Visual Weighted Progressive Image Transmission," IEEE Transactions on Communications, COM-38, 7, 1040-1044, 1990), a perceptual model is used to classify the data and establish a transmission hierarchy. In a first step, a Q table is generated based on estimates of energy in each 8×8 block. In a second step, a modulation transfer function (MTF) is empirically determined for the HVS and multiplied by the DCT coefficients at the corresponding frequencies. For progressive transmission, the resulting weighted coefficient sub-blocks are sorted into classes according to their AC energy. As with the McLaren and Nguyen method, however, the work of Rao et al. focuses on pictorial images at low resolution for either television or low-resolution computer video displays. Color facsimile communication takes place at higher resolutions and most of the time involves text. Accordingly, these methods are not particularly well suited for color facsimile data.
A third hybrid method, which bears some resemblance to Safranek's method, has been proposed by Siegel. See U.S. Pat. No. 5,063,608, entitled "Adaptive Zonal Coder," issued to Shepard L. Siegel. In Siegel's method, just before the entropy encoder, each block is scanned and a running block activity measure, such as the running sum of squares or of absolute values, is computed. If there are sufficient low frequency terms, according to Siegel, the higher frequency terms are of less importance and, due to the nature of human visual response, may be omitted without perceived image degradation. This method, however, is computationally intensive and thus may make a hardware implementation impractical. Moreover, Siegel's method may not be well suited for typical color facsimile data, which typically includes a combination of text and graphics.
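A rough sketch in the spirit of such an adaptive zonal scheme (Python/numpy; the zigzag scan is the standard JPEG order, but the activity threshold and the drop-everything-after policy are illustrative assumptions, not Siegel's patented method):

```python
import numpy as np

def zigzag_order():
    # Standard JPEG zigzag scan: diagonals of increasing i+j, alternating direction.
    return sorted(((i, j) for i in range(8) for j in range(8)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def adaptive_zonal_filter(block, activity_threshold):
    # Scan the block in zigzag order, accumulating a running sum of squares
    # as the activity measure; once low-frequency activity is deemed
    # sufficient, the remaining higher-frequency terms are omitted.
    out = np.zeros_like(block)
    activity = 0.0
    for i, j in zigzag_order():
        out[i, j] = block[i, j]
        activity += float(block[i, j]) ** 2
        if activity > activity_threshold:
            break
    return out
```

The per-block scan and running accumulation suggest why a hardware implementation of such a scheme can be costly.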
Accordingly, a need remains for a method for selecting JPEG quantization tables which is particularly adapted to color facsimile applications.