A. Technical Field of the Invention:
The present invention relates to an apparatus and method for coding images, and more particularly, to an apparatus and method for compressing images to a reduced number of bits by employing a Discrete Cosine Transform (DCT) in combination with visual masking including luminance and contrast techniques as well as error pooling techniques all to yield a quantization matrix optimizer that provides an image having a minimum perceptual error for a given bit rate, or a minimum bit rate for a given perceptual error.
B. Description of the Prior Art:
Considerable research has been conducted in the field of data compression, especially the compression of the digital information of digital images. Digital images comprise a rapidly growing segment of the digital information stored and communicated by science, commerce, industry and government. Digital image transmission has gained significant importance in highly advanced television systems, such as high definition television using digital information. Because a relatively large number of digital bits are required to represent digital images, a difficult burden is placed on the infrastructure of the computer communication networks involved with the creation, transmission and re-creation of digital images. For this reason, there is a need to compress digital images to a smaller number of bits by reducing redundancy and removing invisible image components of the images themselves.
A system that performs image compression is disclosed in U.S. Pat. No. 5,121,216 of C. E. Chen et al., issued Jun. 9, 1992, and herein incorporated by reference. The '216 patent discloses a transform coding algorithm for a still image, wherein the image is divided into small blocks of pixels. For example, each block of pixels may be either an 8×8 or a 16×16 block. Each block of pixels then undergoes a two dimensional transform to produce a two dimensional array of transform coefficients. For still image coding applications, a Discrete Cosine Transform (DCT) is utilized to provide the orthogonal transform.
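The block transform described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the orthonormal DCT-II convention and an image whose dimensions are multiples of the block size; the function names `dct_2d` and `split_into_blocks` are hypothetical and are not taken from the '216 patent.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (list of N rows of N numbers)."""
    n = len(block)

    def alpha(k):
        # Normalization that makes the transform orthogonal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # 1-D basis: basis[k][x] = alpha(k) * cos((2x + 1) * k * pi / (2N)).
    basis = [[alpha(k) * math.cos((2 * x + 1) * k * math.pi / (2 * n))
              for x in range(n)] for k in range(n)]

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            coeffs[u][v] = sum(basis[u][y] * basis[v][x] * block[y][x]
                               for y in range(n) for x in range(n))
    return coeffs


def split_into_blocks(image, n=8):
    """Split an image (list of rows) into n x n blocks, in row-major block order.

    Assumes the image height and width are multiples of n.
    """
    height, width = len(image), len(image[0])
    return [[row[x:x + n] for row in image[y:y + n]]
            for y in range(0, height, n) for x in range(0, width, n)]
```

For a uniform 8×8 block, only the coefficient at index (0, 0) is nonzero, which is why that coefficient is commonly called the DC term.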
In addition to the '216 patent, the Discrete Cosine Transform is also employed in a number of current and future international standards concerned with digital image compression, commonly referred to as JPEG and MPEG, which are acronyms for Joint Photographic Experts Group and Moving Pictures Experts Group, respectively. After a block of pixels of the '216 patent undergoes a Discrete Cosine Transform (DCT), the resulting transform coefficients are subject to compression by thresholding and quantization operations. Thresholding involves setting all coefficients whose magnitude is smaller than a threshold value equal to zero, whereas quantization involves scaling a coefficient by a step size and rounding off to the nearest integer.
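The two operations just described can be illustrated on a single coefficient. The sketch below is an assumption-laden illustration, not the procedure of the '216 patent: the function names are hypothetical, "scaling by a step size" is taken to mean division by the step, and ties in rounding are resolved upward.

```python
import math


def threshold(coeff, limit):
    """Set a coefficient to zero when its magnitude is smaller than the limit."""
    return 0.0 if abs(coeff) < limit else coeff


def quantize(coeff, step):
    """Divide by the step size and round to the nearest integer (ties round up)."""
    return math.floor(coeff / step + 0.5)


def dequantize(level, step):
    """Reconstruct an approximate coefficient from its integer level."""
    return level * step
```

With a step size of 10, a coefficient of 37.0 is stored as the integer 4 and reconstructed as 40, so the quantization error is 3, which is never more than half the step size.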
Commonly, the quantization of each DCT coefficient is determined by an entry in a quantization matrix. It is this matrix that is primarily responsible for the perceived image quality and the bit rate of the transmission of the image. The perceived image quality is important because the human visual system can tolerate a certain amount of degradation of an image without being alerted to a noticeable error. Therefore, certain images can be transmitted at a low bit rate, whereas other images cannot tolerate any degradation and should be transmitted at a higher bit rate in order to preserve their informational content.
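The trade-off between bit rate and quality that the quantization matrix controls can be seen in a toy example. The sketch below uses a hypothetical 2×2 block and matrix purely for brevity (practical blocks are 8×8 or 16×16), and all names and values are illustrative assumptions.

```python
import math


def quantize_block(coeffs, qmatrix):
    """Quantize each coefficient with the matching entry of the quantization matrix."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, qmatrix)]


def scale_matrix(qmatrix, factor):
    """Multiply every entry of the quantization matrix by the same factor."""
    return [[q * factor for q in row] for row in qmatrix]


def count_nonzero(levels):
    """Count quantized levels that remain nonzero, a rough proxy for the bits needed."""
    return sum(1 for row in levels for level in row if level != 0)


coeffs = [[100.0, 13.0], [-7.0, 2.0]]      # toy DCT coefficients
qmatrix = [[8.0, 10.0], [10.0, 16.0]]      # toy quantization matrix

fine = quantize_block(coeffs, qmatrix)
coarse = quantize_block(coeffs, scale_matrix(qmatrix, 4.0))
```

Here `fine` keeps three nonzero levels while `coarse` keeps only one. Larger matrix entries leave fewer values to encode, at the cost of larger reconstruction errors.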
The '216 patent discloses a method for the compression of image information based on human visual sensitivity to quantization errors. In the method of the '216 patent, there is a quantization characteristic associated with block to block components of an image. This quantization characteristic is based on a busyness measurement of the image. The method of the '216 patent does not compute a complete quantization matrix, but rather only a single scalar quantizer.
Two other methods are available for computing DCT quantization matrices based on human sensitivity. One is based on a mathematical formula for the human contrast sensitivity function, scaled for viewing distance and display resolution, and is disclosed in U.S. Pat. No. 4,780,761 of S. J. Daly et al. The second is based on a formula for the visibility of individual DCT basis functions, as a function of viewing distance, display resolution, and display luminance. The second formula is disclosed in a first technical article entitled "Luminance-Model-Based DCT Quantization For Color Image Compression" of A. J. Ahumada and H. A. Peterson, published in 1992 in Human Vision, Visual Processing, and Digital Display III, Proc. SPIE 1666, Paper 32; a second technical article entitled "An Improved Detection Model for DCT Coefficient Quantization" of H. A. Peterson et al., published in 1993 in Human Vision, Visual Processing and Digital Display VI, Proc. SPIE Vol. 1913, pages 191-201; and a third technical article entitled "A visual detection model for DCT coefficient quantization" of A. J. Ahumada, Jr. and H. A. Peterson, published in 1993 in Computing in Aerospace 9, American Institute of Aeronautics and Astronautics, pages 314-318.
The methods described in the '761 patent and the three technical articles do not adapt the quantization matrix to the image being compressed, and therefore do not take advantage of masking techniques for quantization errors that utilize the image itself. Each of these techniques has features and benefits described below.
First, visual thresholds increase with background luminance and this feature should be advantageously utilized. However, the formula given in the referenced technical articles describes the threshold for DCT basis functions as a function of mean luminance. This would normally be taken as the mean luminance of the display. However, variations in local mean luminance within the image will in fact produce substantial variations in the DCT threshold quantities. These variations are referred to herein as "luminance masking" and should be fully taken into account.
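One simple way to model luminance masking is a power law that scales a display-wide threshold by the ratio of a block's DC coefficient to the mean DC coefficient of the image. The sketch below assumes that model. The function name, the power-law form, and the default exponent are illustrative assumptions and are not stated in this section.

```python
def luminance_masked_threshold(base_threshold, block_dc, mean_dc, exponent=0.649):
    """Scale a base threshold by (block_dc / mean_dc) ** exponent.

    base_threshold: threshold computed for the mean luminance of the display.
    block_dc:       DC coefficient of the block (proportional to its local mean luminance).
    mean_dc:        DC coefficient corresponding to the mean luminance of the display.
    exponent:       illustrative value (an assumption); 0 disables luminance masking.
    """
    if block_dc <= 0 or mean_dc <= 0:
        raise ValueError("DC terms must be positive")
    return base_threshold * (block_dc / mean_dc) ** exponent
```

Under this assumed model, a block brighter than the display mean receives a higher threshold and can tolerate a coarser quantizer, while a darker block receives a lower one.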
Second, the visibility of a visual pattern is typically reduced in the presence of other patterns, particularly those of similar spatial frequency and orientation. This reduction phenomenon is usually called "contrast masking." This means that the threshold error in a particular DCT coefficient in a particular block of the image will be a function of the value of that coefficient in the original image. The knowledge of this function should be taken advantage of in order to compress the image while not reducing the quality of the compressed image.
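The dependence of the threshold error on the coefficient value can be sketched with a simple masking rule. The sketch below assumes a power-law form in which the masked threshold never falls below the unmasked threshold. The function name, the form of the rule, and the default exponent are illustrative assumptions and are not given in this section.

```python
def contrast_masked_threshold(threshold, coeff, exponent=0.7):
    """Return the threshold for an error in a coefficient, given the coefficient's value.

    threshold: unmasked threshold for this DCT frequency (must be positive).
    coeff:     value of the same DCT coefficient in the original image block.
    exponent:  illustrative masking exponent (an assumption); 0 disables masking.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    masked = abs(coeff) ** exponent * threshold ** (1.0 - exponent)
    # Small coefficients leave the threshold unchanged; large ones raise it.
    return max(threshold, masked)
```

With an unmasked threshold of 1 and an exponent of 0.5, a coefficient of magnitude 16 gives a masked threshold of 4, so a larger quantization error would go unnoticed in that coefficient.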
Third, the method disclosed in the two referenced technical articles ensures that a single error is below a predetermined threshold. However, in a typical image there are many errors of varying magnitudes that are not properly handled by a single threshold quantity. The visibility of this ensemble of errors of varying magnitudes is not generally equal to the visibility of the largest error, but rather reflects a pooling of errors over both frequencies and blocks of the image. This pooling is herein termed "error pooling" and is beneficial in compressing the digital information of the image while not degrading the quality of the image.
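Error pooling can be illustrated with a Minkowski sum, one common way to combine many errors into a single visibility figure. The sketch below assumes each error has already been divided by its own threshold, so that a value of 1 means "just visible". The function name and the default exponent of 4 are illustrative assumptions and are not given in this section.

```python
def pool_errors(errors, beta=4.0):
    """Combine threshold-normalized errors into one pooled perceptual error.

    errors: iterable of errors, each expressed in units of its own threshold.
    beta:   pooling exponent (an assumption); very large values approach
            the magnitude of the largest single error.
    """
    return sum(abs(e) ** beta for e in errors) ** (1.0 / beta)


# Sixteen errors, each only half of its threshold, pool to a just-visible total.
pooled = pool_errors([0.5] * 16)
```

In this example every individual error is well below threshold, yet the pooled value is approximately 1.0, which shows why the visibility of the ensemble differs from the visibility of the largest error.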
Fourth, when all errors are kept below a perceptual threshold, a certain bit rate will result, but at times it may be desired to have an even lower bit rate. The two referenced technical articles do not disclose any method that would yield a minimum perceptual error for a given bit rate, or a minimum bit rate for a given perceptual error. It is desired that such a method be provided to accommodate this need.
Fifth, since color images comprise a great proportion of images in common use, it is desirable that the above advantages be applied to both grayscale and color images. The referenced technical articles provide a method for computing three quantization matrices for the three color channels of a color image, but do not disclose any method for optimizing the matrix for a particular color image.
Finally, it is desired that all of the above prior art limitations and drawbacks be eliminated so that a digital image may be represented by a reduced number of digital bits while at the same time providing an image having a low perceptual error.
Accordingly, an object of the present invention is to provide a method to compress digital information yet provide a visually optimized image.
Another object of the present invention is to provide a method of compressing a visual image based on luminance masking, contrast masking, and error pooling techniques.
A further object of the present invention is to provide a quantization matrix that is adapted to the individual image being compressed so that either the grayscale or the color image that is reproduced has a minimal perceptual error for a given bit rate, or a minimum bit rate for a given perceptual error.