Image Coding
The demand for electronic services related to pictorial images or other two-dimensional data has grown so rapidly that even the accelerating advance of electronic transmission and storage technologies will not be able to keep pace, unless the electronic data derived from the images can be compressed in a way that does not impair perception of the reconstructed image or other two-dimensional data.
Different compression methods have evolved in the art as understanding of pictorial data has increased and theoretical advances have been made. Differential Pulse Code Modulation (DPCM) and bit-plane coding were among the early methods used, and they achieved compression factors of up to 4-6 by trading image quality for lower bit rate. Pictures with higher quality than obtainable with DPCM, coded with only one bit per pixel, can now be obtained with a number of methods, such as the Adaptive Discrete Cosine Transform (ADCT) described by W. H. Chen and C. H. Smith, in Adaptive Coding of Monochrome and Color Images, Vol. COM-25, IEEE Trans. Comm., 1285-92 (November 1977). In an ADCT coding system, the image is decomposed into blocks, generally eight by eight pixels, and a Discrete Cosine Transform (DCT) is carried out for each block. The compression is obtained by quantization of the DCT coefficients with variable thresholds, partially optimized for human visual acuity, followed by variable word length encoding.
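The block-transform step described above can be illustrated with a minimal sketch. It is not the scheme of Chen and Smith: the step-size table STEPS and all function names are hypothetical, the adaptive selection of thresholds is omitted, and the variable word length encoding stage is left out entirely. It only shows an orthonormal 8 by 8 DCT followed by uniform quantization with frequency-dependent step sizes.

```python
import math

N = 8  # block size; eight by eight is the size named in the text


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """Forward 2-D DCT of one block: C * block * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C (C is orthonormal)."""
    return matmul(matmul(CT, coeffs), C)


# Hypothetical step sizes (an assumption, not taken from any cited paper):
# coarser quantization for higher spatial frequencies.
STEPS = [[8.0 + 4.0 * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, steps=STEPS):
    """Uniform quantization with a separate step size per coefficient."""
    return [[int(round(coeffs[u][v] / steps[u][v])) for v in range(N)]
            for u in range(N)]


def dequantize(levels, steps=STEPS):
    """Map integer levels back to coefficient values."""
    return [[levels[u][v] * steps[u][v] for v in range(N)] for u in range(N)]
```

For a block of constant gray value, quantize(dct2(block)) contains a single nonzero entry, the DC term; this is the kind of sparsity that a subsequent variable word length coder would exploit.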
Sub-band coding has also been applied to pictures. One arrangement was proposed by J. W. Woods and S. D. O'Neil, in Sub-Band Coding of Images, Vol. 34 No. 5, IEEE ASSP, 1278-88 (October 1986). The arrangement proposed by Woods et al. includes a filter bank that divides the image signal into bands of different frequency contents. The signal of each filter output is compressed via DPCM. The compressed signals are then transmitted to a receiver where the process is reversed. Specifically, each signal is DPCM decoded and then up-sampled, filtered, and combined with the other filtered signals to recover the original image.
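The analysis and synthesis stages of such a filter bank can be sketched in one dimension. The sketch below assumes the two-tap Haar filter pair, chosen only because it is the shortest pair that gives perfect reconstruction; it is not the filter set used by Woods et al., and the DPCM coding of each band is omitted. The function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # normalization that makes the Haar pair orthonormal


def analyze(signal):
    """Split an even-length signal into a low band and a high band.

    Filtering and down-sampling by two are combined: each output sample
    is computed from one non-overlapping pair of input samples.
    """
    low = [S * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [S * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high


def synthesize(low, high):
    """Up-sample, filter, and add the two bands to recover the signal."""
    out = []
    for l, h in zip(low, high):
        out.append(S * (l + h))
        out.append(S * (l - h))
    return out
```

Applying analyze to the rows and then to the columns of an image would yield the four bands usually labeled low-low, low-high, high-low, and high-high.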
H. Gharavi and A. Tabatabai in Sub-Band Coding of Images Using Two-Dimensional Quadrature Mirror Filtering, Vol. 707, Proc. SPIE, 51-61 (September 1986) use long complex quadrature mirror filters to obtain a number of frequency band signals. The "low-low" band is DPCM coded using a two-dimensional DPCM codec. A dead-zone quantizer is used for the other bands, followed by PCM coding.
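A dead-zone quantizer of the general kind mentioned above can be sketched as follows. The parameter names, the uniform step outside the dead zone, and the midpoint reconstruction rule are assumptions made for illustration; the cited paper's exact quantizer is not reproduced here.

```python
def dead_zone_quantize(value, step, dead_zone):
    """Return 0 for |value| < dead_zone; otherwise a signed integer level.

    Level 1 covers [dead_zone, dead_zone + step), level 2 the next
    interval of width step, and so on, mirrored for negative values.
    """
    magnitude = abs(value)
    if magnitude < dead_zone:
        return 0
    level = int((magnitude - dead_zone) // step) + 1
    return level if value > 0 else -level


def dead_zone_reconstruct(level, step, dead_zone):
    """Reconstruct a level at the midpoint of its quantization interval."""
    if level == 0:
        return 0.0
    magnitude = dead_zone + (abs(level) - 0.5) * step
    return magnitude if level > 0 else -magnitude
```

Because the higher-frequency bands consist mostly of small values, the dead zone maps most of their samples to zero, which keeps the coded bit rate for those bands low.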
Other sub-band coding schemes, such as that proposed by P. H. Westerink, J. W. Woods and D. E. Boekee in Proc. of Seventh Benelux Information Theory Symposium 143-50 (1986), apply vector-quantization techniques to code the filter bank outputs.
In the co-pending patent application of H. Bheda and A. Ligtenberg, Ser. No. 222,987, filed Jul. 22, 1988, and assigned to the assignee hereof, the data redundancies in the different sub-band signals are employed to achieve additional data compression. In fact, that technique provides an excellent "front end" for image processing based on sub-band analysis techniques.
There remains the problem of quantizing the analyzed information more effectively in terms of bits per pixel and perceived quality of a reconstructed image. We have determined that the existing versions of the Discrete Cosine Transform do not take full advantage of all facets of the known properties of human visual perception.
Some recent work has addressed this problem. See the article by King N. Ngan et al., Cosine Transform Coding Incorporating Human Visual System Model, SPIE Vol. 707, Visual Communications and Image Processing, 165-71 (1986), particularly addressing contrast sensitivity. The contrast sensitivity is applied to the quantization process in a very restricted fashion, and other relevant parameters are not applied. Indeed, a kind of pre-emphasis is applied before quantization, apparently in preference to a more precise degree of control over the quantization process.
Image Halftoning
Digital halftoning, sometimes referred to as "spatial dithering", is the process of creating a binary approximation to a sampled gray-scale image. See, for example, R. Ulichney, Digital Halftoning, MIT Press, 1987. Sampled gray-scale values are typically quantized to have one of a discrete number of values, e.g., 256 or 1024 values. The basic idea in digital halftoning is to replace these quantized picture elements (pixels) from a region of the gray-scale image having an average value of x (where 0=white and 1=black) with a binary pattern of 1s and 0s. In accordance with one halftoning technique, the fraction of resulting 1s is approximately x. The binary pattern is then conveniently used with a display device such as a CRT display or a printer to produce the values for the pixels in the gray-scale halftone image. If the 1s and 0s are supplied to a printer where the 1s are printed as black spots and the 0s are left as white spaces, and if the spots and spaces are sufficiently close together, the eye averages the black spots and white spaces to perceive, approximately, gray level x. In so perceiving the image the eye exhibits a low-pass filtering characteristic. The number of gray-scale samples (pixels) is advantageously equal to the number of bits in the binary pattern.
Recent years have witnessed increasing demand for digital storage and transmission of gray-scale images. In part, this is due to the increasing use of laser printers having a resolution of, e.g., 300 spots (dots) per inch, to produce halftone approximations to gray-scale images such as photographs, art work, design renderings, magazine layouts, etc. The conventional approach to achieving high quality halftone images is to use a high resolution printer. However, it can be shown that the printer resolution required for transparent halftoning with prior art techniques is of the order of 1400 dots/inch. Such printers are often slow and expensive.
Many prior art halftoning techniques assume that the black area of a printed binary pattern is proportional to the fraction of ones in the pattern. This means that the area occupied by each black dot is roughly the same as the area occupied by each white dot. Thus, the "ideal" shape for the black spots produced by a printer (in response to 1s) would be T × T squares, where T is the spacing between the centers of possible printer spots. However, most practical printers produce approximately circular spots. It is clear, therefore, that the radius of the dots must be at least T/√2 so that an all-ones binary pattern is capable of blackening a page entirely. This has the unfortunate consequence of making black spots cover portions of adjacent spaces, causing the perceived gray level to be darker than the fraction of ones. Moreover, most printers produce black spots that are larger than the minimal size (this is sometimes called "ink spreading"), which further distorts the perceived gray level. The most commonly used digital halftoning techniques (for printing) protect against such ink spreading by clustering black spots so the percentage effect on perceived gray level is reduced. Unfortunately, such clustering constrains the spatial resolution (sharpness of edges) of the perceived images and increases the low-frequency artifacts. There is a tradeoff between the number of perceived gray levels and the visibility of low-frequency artifacts.
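The size of the overlap follows from simple geometry and can be checked numerically. The short calculation below assumes perfectly circular dots of the minimal radius on a square grid of pitch T; the function name is hypothetical.

```python
import math


def minimal_dot_area_ratio(T=1.0):
    """Area of a circular dot of radius T / sqrt(2), divided by the T x T cell area."""
    radius = T / math.sqrt(2.0)    # smallest radius that reaches the cell corners
    dot_area = math.pi * radius ** 2
    return dot_area / (T * T)      # equals pi / 2 for any pitch T
```

The ratio is pi/2, about 1.57, so even without ink spreading an isolated black dot covers roughly 57 percent more area than the square cell it is meant to represent.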
Other distortions that can occur in commonly used laser printers, such as the Hewlett-Packard line of laser printers, include the peculiar characteristic that a white line surrounded by several black lines appears brighter than when surrounded by two single lines. Such effects cause further distortions to the perceived gray levels.
Block replacement is one commonly used halftoning technique for improving perceived gray-scale images. Using this technique, the image is subdivided into blocks (e.g., 6 × 6 pixels) and each block is "replaced" by one of a predetermined set of binary patterns (having the same dimensions as the image blocks). Binary patterns corresponding to the entire image are then supplied to a printer or other display device. Typically, the binary patterns in the set have differing numbers of ones, and the pattern whose fraction of ones best matches the gray level of the image block is selected. This block replacement technique is also referred to as pulse-surface-area modulation. See the Ulichney reference, supra, at pg. 77.
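Block replacement can be sketched in a few lines. The pattern set built by make_patterns is a hypothetical one chosen for brevity (pattern k simply has its first k cells set in row-major order); practical pattern sets are designed with more care. The sketch also assumes gray values in the range 0 (white) to 1 (black) and image dimensions that are multiples of the block size.

```python
def make_patterns(size):
    """Hypothetical pattern set: pattern k has its first k cells (row-major) set to 1."""
    n = size * size
    patterns = []
    for k in range(n + 1):
        flat = [1 if i < k else 0 for i in range(n)]
        patterns.append([flat[r * size:(r + 1) * size] for r in range(size)])
    return patterns


def block_replace(image, size, patterns):
    """Replace each size x size block by the pattern whose fraction of
    ones is closest to the mean gray level of the block."""
    height, width = len(image), len(image[0])
    n = size * size
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, size):
        for left in range(0, width, size):
            mean = sum(image[top + r][left + c]
                       for r in range(size) for c in range(size)) / n
            best = min(patterns,
                       key=lambda p: abs(sum(map(sum, p)) / n - mean))
            for r in range(size):
                for c in range(size):
                    out[top + r][left + c] = best[r][c]
    return out
```

With 6 by 6 blocks, this approach can render at most 37 distinct gray levels per block, and all detail inside a block is discarded, since only the block mean influences the choice of pattern.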
In another halftoning technique known as screening, the gray-scale array is compared, pixel by pixel, to an array of thresholds. A black dot is placed wherever the image gray level is greater than the corresponding threshold. In the so-called random dither variation of this technique, the thresholds are randomly generated. In another variation, ordered dither, the thresholds are periodic. More specifically, the threshold array is generated by periodically replicating a matrix (e.g., 6 × 6) of threshold values.
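Ordered dither reduces to a single comparison per pixel. The sketch below uses the well-known 4 by 4 Bayer index matrix as the threshold source, which is an assumption made for compactness; the text's example of a 6 by 6 matrix would work the same way. Gray values are assumed to lie between 0 (white) and 1 (black), and the function names are hypothetical.

```python
# 4 x 4 Bayer index matrix (dispersed-dot ordering).
BAYER_4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def threshold_matrix(index_matrix=BAYER_4):
    """Convert integer indices to thresholds spread evenly over (0, 1)."""
    levels = len(index_matrix) * len(index_matrix[0])
    return [[(v + 0.5) / levels for v in row] for row in index_matrix]


def ordered_dither(image, thresholds):
    """Place a black dot (1) wherever the gray level exceeds the threshold.

    The threshold matrix is replicated periodically by indexing it
    modulo its own dimensions.
    """
    rows, cols = len(thresholds), len(thresholds[0])
    return [[1 if image[y][x] > thresholds[y % rows][x % cols] else 0
             for x in range(len(image[0]))]
            for y in range(len(image))]
```

Replacing the periodic lookup with a fresh random number for each pixel would give the random dither variation described above.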
A technique known as error diffusion is used in non-printer halftone display contexts to provide halftoning when ink spreading and other distortions common to printers are not present. See, for example, R. W. Floyd and L. Steinberg, "An Adaptive Algorithm for Spatial Grey Scale," Proc. SID, Vol. 17/2, pp. 75-77, 1976.
Like most of the known halftoning schemes, error diffusion makes implicit use of the eye model. It shapes the noise, i.e., the difference between the gray-scale image and the halftone image, so that it is not visible to the eye. The error diffusion technique produces noise with most of the noise energy concentrated in the high frequencies, i.e., so-called blue noise. Thus, it minimizes the low-frequency artifacts. However, since error diffusion does not make explicit use of the eye model, it is not easy to adjust when the eye filter changes, for example with printer resolution or viewer distance. Error diffusion accomplishes good resolution by spreading the dots. It is thus very sensitive to ink spreading, in contrast to the clustered dot schemes like "classical" screening. In the presence of ink spreading, error diffusion often produces very dark images, therefore limiting its application to cases with no ink spreading.
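A minimal sketch of error diffusion with the Floyd and Steinberg weights (7/16 to the right, 3/16 below-left, 5/16 below, 1/16 below-right) is given here. It assumes gray values between 0 (white) and 1 (black), a fixed threshold of 0.5, a left-to-right raster scan, and that error falling outside the image is simply discarded; it models no printer distortion. The function name is hypothetical.

```python
def error_diffuse(image):
    """Binarize a gray-scale image, pushing each pixel's quantization
    error onto its not-yet-processed neighbors."""
    height, width = len(image), len(image[0])
    work = [list(row) for row in image]   # working copy that accumulates error
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

Because the weights sum to one, the fraction of ones in the output tracks the average gray level of the input except for the error lost at the image borders. The resulting dots are isolated rather than clustered, which is why dot overlap and ink spreading darken the printed result so strongly.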
Model-based halftoning approaches have been described generally in the context of printed images. For example, Anastassiou, in the paper "Error Diffusion Coding for A/D Conversion," IEEE Trans. Cir. Sys., Vol. CAS-36, No. 9, pp. 1175-1186, September 1989, proposes a "frequency weighted squared error criterion" which minimizes the squared error between the eye-filtered binary image and the eye-filtered original gray-scale image. He considers the problem intractable and suggests an approximate approach based on neural networks. Moreover, the disclosed techniques assume perfect printing, i.e., printing without distortion. Allebach, in the paper "Visual Model-Based Algorithms for Halftoning Images," Proc. SPIE, Vol. 310, Image Quality, pp. 151-158, 1981, proposes a visual model to obtain a distortion measure that can be minimized, but provides no complete approach to achieve halftoning.
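The idea of an eye-filtered squared error can be made concrete with a small sketch. The 3 by 3 low-pass kernel EYE_KERNEL is a hypothetical stand-in for an eye model, not the filter of any cited paper, and circular (wrap-around) boundaries are assumed for simplicity. The sketch only evaluates the criterion for a given binary image; it does not attempt the minimization, which the text notes is considered intractable.

```python
# Hypothetical stand-in for an eye filter: a 3 x 3 binomial low-pass
# kernel whose weights sum to one.
EYE_KERNEL = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]


def eye_filter(image, kernel=EYE_KERNEL):
    """Low-pass filter an image, wrapping around at the borders."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += (kernel[dy + 1][dx + 1]
                            * image[(y + dy) % height][(x + dx) % width])
            out[y][x] = acc
    return out


def filtered_squared_error(gray, binary):
    """Sum of squared differences between the eye-filtered images."""
    fg, fb = eye_filter(gray), eye_filter(binary)
    return sum((fg[y][x] - fb[y][x]) ** 2
               for y in range(len(gray)) for x in range(len(gray[0])))
```

For a uniform gray level of 0.5, a checkerboard of ones and zeros gives zero error under this kernel, while a pattern with all the ones gathered in one half of the image gives a large error, even though both patterns contain the same fraction of ones.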
Roetling and Holladay, in the paper "Tone Reproduction and Screen Design for Pictorial Electrographic Printing," Journal of Appl. Phot. Eng., Vol. 15, No. 4, pp. 179-182, 1979, propose an ink-spreading printer model, of the same general type used in the present invention, but use it only to modify ordered dither so that it results in a uniform gray scale. Since ordered dither produces a fixed number of apparent gray levels, this technique cannot exploit ink spreading to generate more gray levels.