1. Field of Invention
This invention relates to digital systems for scanning, representing, and reproducing document images. More specifically, the present invention is directed to adaptive quantization within the JPEG sequential mode data syntax. In particular, it relates to a method of segmenting an image into blocks of different image types so that, based on the visual properties of the human eye, the image can be compressed more efficiently without loss of significant information.
2. Description of Related Art
The JPEG (Joint Photographic Experts Group) architecture can be viewed as a compression method from which various applications can define a compression system suitable for their particular needs. JPEG is concerned only with the encoding and decoding of image data; the interpretation of the data is beyond the scope of JPEG and is left to the applications that use it.
The JPEG specification consists of several parts, including protocols for both lossless and lossy compression encoding. The lossless compression algorithm uses a predictive/adaptive model with a Huffman code output stage, and loses no information. The JPEG lossy compression algorithms, including the standard sequential mode with which this invention is most concerned, operate in several successive stages, as shown in FIG. 1. These stages combine to form a compressor capable of compressing predominantly continuous tone images while losing little of their original fidelity. In this application, for simplicity, the term "JPEG" used as an adjective will usually refer to the JPEG sequential mode data syntax. For example, "JPEG compliant" means "compliant with the JPEG sequential mode data syntax."
Central to the compression process is the Discrete Cosine Transform (DCT) performed on each image plane (e.g., color or luminosity values) of an image. As will be appreciated, there are mono-plane images (e.g., gray images), as well as multi-layer or multi-plane images (e.g., RGB or CMYK images). Therefore, it is to be understood that "image" sometimes is used herein to refer to a single plane of a multi-layer image because essentially the same compression process is performed for each image plane. For example, when a DCT is performed on the 64 values of an 8x8 pixel block within any plane of an image, the result is a set of 64 coefficients, representing amplitudes of 64 respective orthogonal waveform components, that together define the values for all 64 pixels in the 8x8 pixel block. An inverse DCT performed on the 64 coefficients will reproduce the original 64 values of the 8x8 pixel block.
The advantage of using these 64 coefficients instead of the 64 original values is that each coefficient represents the magnitude of an orthogonal waveform representing a different spatial frequency. Smoothly textured blocks have low pixel-to-pixel variation, so many zero-value "high-frequency" DCT coefficients are likely. For example, performing a DCT on a block of 64 pixels having identical values will result in one nonzero coefficient and 63 zero-value coefficients. Further, if the coefficients are ordered by spatial frequency, longer strings of zero-value coefficients will result.
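The two DCT properties described above (a constant block yields a single nonzero coefficient, and the inverse DCT reproduces the original pixels) can be sketched directly from the standard 2-D DCT-II formulas. This is an illustrative reference implementation, not the optimized transform a real codec would use:

```python
import math

def _c(k):
    """Normalization term: 1/sqrt(2) for the zero-frequency index."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct_8x8(block):
    """Forward 2-D DCT-II on an 8x8 block (JPEG-style normalization)."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * _c(u) * _c(v) * acc
    return out

def idct_8x8(coeffs):
    """Inverse 2-D DCT; reconstructs the original 8x8 pixel values."""
    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            acc = 0.0
            for u in range(8):
                for v in range(8):
                    acc += (_c(u) * _c(v) * coeffs[u][v]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = 0.25 * acc
    return out

# A perfectly smooth (constant-valued) block: only the DC coefficient survives.
flat = [[100.0] * 8 for _ in range(8)]
S = dct_8x8(flat)
nonzero = sum(1 for row in S for val in row if abs(val) > 1e-6)

# Round trip: the inverse DCT reproduces the original pixel values.
restored = idct_8x8(S)
```

Running the transform on the constant block leaves exactly one nonzero coefficient (the DC term), and the inverse transform recovers the 64 original values to within floating-point error.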
As one skilled in the art will understand, data with long zero-value strings will enable greater data compression, for example when using Huffman-type entropy encoding. For this reason, when a DCT is computed for a (usually 8x8) pixel block, it is desirable to represent the coefficients for high spatial frequencies with less precision. This is done by a process called quantization, illustrated in FIG. 2. Quantization is basically a process for reducing the precision of the DCT coefficients. Precision reduction is extremely important, since lower precision almost always yields greater compression of the data stream. One reason the JPEG algorithm compresses so effectively is that a large number of coefficients in the DCT block are rounded or truncated to zero value during the quantization stage.
A DCT coefficient is quantized by dividing it by a nonzero positive integer called a quantization value, and truncating or rounding the quotient (the quantized DCT coefficient) to the nearest integer. In order to reconstruct (dequantize) the DCT coefficient, the decoder must multiply it by the quantization value. Since some precision is lost in quantizing, the reconstructed DCT coefficients are approximations of the values before quantization.
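The quantize/dequantize round trip described above amounts to a divide-round-multiply. A minimal sketch (the rounding convention here, ties away from zero, is one common choice; implementations may differ):

```python
def quantize(s, q):
    """Quantize a DCT coefficient: divide by the quantization value q
    and round to the nearest integer (ties away from zero)."""
    return int(s / q + (0.5 if s >= 0 else -0.5))

def dequantize(sq, q):
    """Reconstruct: multiply the quantized coefficient by q."""
    return sq * q

# Example: coefficient 123 quantized with value 16.
sq = quantize(123, 16)        # 123/16 = 7.6875, rounds to 8
approx = dequantize(sq, 16)   # 8 * 16 = 128, an approximation of 123
```

The reconstructed value 128 differs from the original 123: the precision lost in quantization is exactly what makes the scheme lossy.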
Before quantizing, the DCT coefficients are ordered into a one-dimensional vector using the well-known zigzag scan sequence as shown in Table 1 below. The lowest frequency component, represented by the coefficient labeled zero, is the DC component. The remaining coefficients are the AC coefficients, and are ordered horizontally and vertically from left to right and top to bottom, respectively, representing increasingly high frequencies. The DC coefficient is coded using a one-dimensional DPCM (Differential Pulse Code Modulation) technique, which converts the current DC coefficient to a difference from the DC coefficient of the previous block, followed by entropy coding. The AC coefficients in the zigzag scan are divided into runs of zero coefficients terminated by nonzero coefficients. Huffman codes are then assigned to each possible combination of zero-coefficient run length and magnitude of the next nonzero AC coefficient.
TABLE 1
Zigzag scan index sequence for DCT coefficients

     0,  1,  5,  6, 14, 15, 27, 28,
     2,  4,  7, 13, 16, 26, 29, 42,
     3,  8, 12, 17, 25, 30, 41, 43,
     9, 11, 18, 24, 31, 40, 44, 53,
    10, 19, 23, 32, 39, 45, 52, 54,
    20, 22, 33, 38, 46, 51, 55, 60,
    21, 34, 37, 47, 50, 56, 59, 61,
    35, 36, 48, 49, 57, 58, 62, 63.
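The zigzag ordering and the run-length grouping of AC coefficients described above can be sketched as follows. `ZIGZAG_INDEX` transcribes Table 1 (each entry is the position of that coefficient in the scan); `ac_runs` is a simplified illustration of the run/value pairing, omitting the entropy coding itself:

```python
# Table 1, row-major: entry [r][c] is the zigzag position of coefficient (r, c).
ZIGZAG_INDEX = [
    [ 0,  1,  5,  6, 14, 15, 27, 28],
    [ 2,  4,  7, 13, 16, 26, 29, 42],
    [ 3,  8, 12, 17, 25, 30, 41, 43],
    [ 9, 11, 18, 24, 31, 40, 44, 53],
    [10, 19, 23, 32, 39, 45, 52, 54],
    [20, 22, 33, 38, 46, 51, 55, 60],
    [21, 34, 37, 47, 50, 56, 59, 61],
    [35, 36, 48, 49, 57, 58, 62, 63],
]

def zigzag_flatten(block):
    """Order an 8x8 coefficient matrix into a 64-element vector
    following the zigzag scan of Table 1."""
    vec = [0] * 64
    for r in range(8):
        for c in range(8):
            vec[ZIGZAG_INDEX[r][c]] = block[r][c]
    return vec

def ac_runs(vec):
    """Split the AC coefficients (vec[1:]) into (zero_run_length, value)
    pairs, as used for Huffman code assignment. Simplified sketch:
    trailing zeros (the end-of-block case) are ignored here."""
    runs, zeros = [], 0
    for v in vec[1:]:
        if v == 0:
            zeros += 1
        else:
            runs.append((zeros, v))
            zeros = 0
    return runs

# Flattening a block whose value at (r, c) is 8*r + c shows the scan order:
block = [[8 * r + c for c in range(8)] for r in range(8)]
vec = zigzag_flatten(block)   # starts 0, 1, 8, 16, 9, 2, ...
```

After the DC coefficient, the scan visits (0,1) then (1,0), so the vector begins with values 0, 1, 8, matching the ordering implied by Table 1.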
For compressing an image plane of an image, JPEG protocol allows the encoder to embed an 8x8 quantization table (Q-table) in the data that will be passed to the decoder. This Q-table can contain different values for quantizing the respective DCT coefficients, chosen so as to minimize perceived distortion in reconstructed images, using principles based on the human visual system. The lowest level of capability for the JPEG sequential mode is the "baseline system." In this system, which is intended to allow a very simple implementation in hardware, no more than one table for each image plane (up to a maximum total of four, regardless of the total number of image planes) can be embedded in the data to be passed to the decoder.
In a typical JPEG baseline sequential technique, illustrated in FIGS. 1-3, source image pixel values of an 8x8 pixel block (p00, p01, . . . , pxy, . . . , p77) 102 are subjected to a discrete cosine transform (DCT) 104F. The resulting DCT coefficients are ordered into a DCT coefficient matrix (S00, S01, . . . , Sxy, . . . , S77) 104 as shown in Table 1 above. Quantization 108F is performed on the DCT coefficients 104, using a Q-table (Q00, Q01, . . . , Qxy, . . . , Q77) 106 to obtain quantized DCT coefficients (Sq00, Sq01, . . . , Sqxy, . . . , Sq77) 108, by dividing each Sxy by its corresponding Qxy and rounding the result to the nearest integer. The quantized DCT coefficients 108 are then encoded by an entropy encoder 110 using Huffman tables 112, and the resulting encoded (compressed) data 114 are transmitted or stored until needed, at which time they are decoded, dequantized, and subjected to an inverse DCT to reconstruct the 8x8 pixel block 102 (or an approximation thereof).
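The element-wise quantization step of the pipeline (each Sxy divided by its corresponding Qxy) can be sketched as below. The flat Q-table of 16s is a hypothetical stand-in; a real Q-table varies its values by spatial frequency:

```python
def quantize_block(S, Q):
    """Element-wise quantization of an 8x8 DCT coefficient matrix:
    Sq[x][y] = round(S[x][y] / Q[x][y]).
    Note: Python's round() uses banker's rounding; a real encoder
    may use a different tie-breaking rule."""
    return [[round(S[x][y] / Q[x][y]) for y in range(8)] for x in range(8)]

# Hypothetical flat Q-table; real tables use larger values at high frequencies.
Q = [[16] * 8 for _ in range(8)]

# A coefficient matrix with a large DC term and small AC terms:
S = [[100 if (x, y) == (0, 0) else 4 for y in range(8)] for x in range(8)]
Sq = quantize_block(S, Q)
```

All the small AC coefficients (4/16 = 0.25) round to zero, illustrating how quantization produces the long zero runs that make the subsequent entropy coding effective.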
Steps for performing JPEG compliant compression are summarized in FIG. 3. In step S302, an image is scanned and pixels are organized into 8.times.8 pixel blocks. At step S304, a discrete cosine transform (DCT) is performed on a block. At step S306, the DCT coefficients are quantized and at step S308, encoding of the pixel block is performed. This process is repeated for all blocks in the image, until JPEG encoding has been performed for the entire image.
JPEG was originally adopted for encoding photographs that typically contain smooth changes from one pixel to the next, but it also can be used for other image types, such as text, which are characterized by sharp pixel-to-pixel variations. However, coarser quantization (i.e., larger quantization values) can be used to improve compression of images characterized by smooth pixel variations, without unduly degrading perceptual image quality, while finer quantization is required for text. Accordingly, the optimum Q-table for quantization, affording an acceptable balance between image quality and compression, is different for different types of images.
The optimum Q-table varies with image type because an image with very sharp pixel value transitions (e.g., a text image) is much less perceptually forgiving of any reduction in precision. For example, if a coarse quantization Q-table optimal for pictorial image types is used to compress a text image, when decompressed the image is much more likely to include artifacts noticeable to the human eye. Other image types having smoother pixel value transitions, or very detailed images (e.g., a photo of a field of grass), can undergo greater compression (with a corresponding greater loss of precision) without producing artifacts noticeable to the human eye.
Because an optimum Q-table is different for different types of images (text, half-tone, pictorial, etc.), it is possible to choose different Q-tables to be passed to the decoder depending on the type of image being compressed, although in many applications (such as with copiers or printers) this option is undesirable because of the added expense required to implement it. As a result, for example, most copiers are equipped to always use a text-optimized Q-table, to minimize undesirable artifacts discernible to the human eye in the resulting copy, regardless of the image type of the document being copied. However, it is possible to equip a copier with an "image type" selection feature by which the user can manually select the type of image being copied. This, of course, assumes that the user will always be correct in judging the actual image type of the document. Alternatively, a copier or other image compressing apparatus may include means to automatically determine the image type of each document being copied, and choose an optimal Q-table accordingly.
However, a practical complication arises when a document is composed of different image types. Typical documents may contain a mixture of textual (i.e., sharp edge) and pictorial regions on the same page. For example, a document may contain a photograph with a section of explanatory text beneath it. When a document comprises a number of different image types, and a single Q-table must be chosen for all of these image types, a text-optimized Q-table should be chosen so that high perceptual quality is achieved for the entire image.
Accordingly, it would be advantageous to be able to use image-type optimized Q-tables to quantize the DCT coefficients for image-type characterized blocks. One way to achieve this would be to quantize each block using a different Q-table based on its image type and pass the table to the decoder, so that each block can be reconstructed with minimum perceptual error. Such a system must also include a nonstandard decoder that can receive information from the encoder about the quantization table used for each block. Unfortunately, current JPEG compliant decoders cannot do this because, as explained above, baseline JPEG protocol allows only one Q-table per image plane (up to a maximum of four per image) to be passed to the decoder. Thus, using the current sequential JPEG algorithm on a mixed image type document represents a poor compromise between the size of the compressed image and the quality of the image that can be reproduced from it.
Adaptive quantization, if successfully implemented, could significantly improve the image quality achieved at a given rate. With adaptive quantization, a value is passed to the decoder that will cause the decoder to modify the Q-table it is using to dequantize the decoded data. Recently, the JPEG committee passed recommendation T.84 that allows a single scaling factor to be passed to the decoder, by which the decoder will linearly scale all the values in the Q-table. There has not been much effort by the industry to implement this method, because it is generally understood that not much improvement in compression can be achieved using a single scaling factor without unduly degrading image quality. This is because linear scaling equally affects both the high frequency and the low frequency coefficients. However, since perceptual image quality is less affected by changes to high frequency coefficients than to low frequency coefficients, significant improvement in compression without unduly degrading perceptual quality can only be achieved by increasing quantization factors for high frequency coefficients more than for low frequency coefficients.
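The contrast drawn above, between the single linear scaling factor of the T.84 extension and a frequency-dependent adjustment, can be illustrated as follows. The weighted variant is purely hypothetical, using the sum of the row and column indices as a crude proxy for spatial frequency:

```python
def scale_qtable_linear(Q, f):
    """Single scaling factor (as in the T.84 extension): every entry
    is scaled equally, regardless of spatial frequency."""
    return [[max(1, round(q * f)) for q in row] for row in Q]

def scale_qtable_weighted(Q, f):
    """Hypothetical frequency-weighted scaling: entries for higher
    spatial frequencies (larger x + y) are scaled progressively more,
    leaving the perceptually important low frequencies nearly untouched."""
    out = []
    for x in range(8):
        row = []
        for y in range(8):
            w = 1.0 + (f - 1.0) * (x + y) / 14.0  # crude frequency proxy
            row.append(max(1, round(Q[x][y] * w)))
        out.append(row)
    return out

# Hypothetical flat Q-table of 16s, scaled by a factor of 2:
Q = [[16] * 8 for _ in range(8)]
linear = scale_qtable_linear(Q, 2.0)      # every entry doubles
weighted = scale_qtable_weighted(Q, 2.0)  # DC unchanged, highest AC doubles
```

Linear scaling coarsens the DC and low-frequency entries just as much as the high-frequency ones, which is why it degrades perceptual quality quickly; the weighted variant concentrates the coarsening where the eye is least sensitive.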