The present invention relates to the compression and decompression of digital data corresponding to an image, and in particular to a method and a system for lossy compression and decompression of digital data corresponding to a still image, which gains an improved compression ratio through exploitation of the differential spatial sensitivity of the human eye.
Digital image data can be stored on electronic storage devices and displayed by computer display devices. Such data is conveniently transmitted through networks, such as the Internet. For example, Web pages frequently include one or more graphic images, which are transmitted and displayed as digital image data. Unfortunately, each such graphic image forms a large digital file, which requires a large amount of bandwidth to transmit. A large amount of digital image data is required in order to represent the graphic image. A digital representation of a single color image, at the resolution level of a television picture, contains on the order of one million bytes. Thus, the image data must be compressed as much as possible, for more convenient and efficient storage and transport of the data.
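By way of illustration only, the order-of-magnitude figure above can be checked with simple arithmetic. The 720 by 480 pixel resolution and the 24 bits per pixel used below are assumptions chosen for the example; neither value is taken from the text above.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size in bytes of an uncompressed image stored as a dense pixel matrix."""
    return width * height * bits_per_pixel // 8


# Assumed television-like resolution (hypothetical, for illustration only).
tv_bytes = raw_image_bytes(720, 480)
print(tv_bytes)  # 1036800, i.e. on the order of one million bytes
```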
The transformation from an image to a computer digital file basically involves the following steps. First, the image is digitized to produce a numeric matrix of a predetermined, known number of pixels, usually in a 24 bit format. In this format, each group of 8 bits represents a color component. The matrix is then compressed by an encoder, using one of several known compression methods, in which a mathematical transformation compresses the data into a much smaller file than the original matrix. In reconstructing the image, the compressed file is processed by a decoder, with an inverse transformation to retrieve the original matrix and reconstruct the image on a graphic image display device. If the reversed process yields a matrix identical to the original one, then the compression method employed is considered to be “lossless”. However, if the reconstructed matrix is not identical, due to a loss of data during the process, then the compression method employed is considered to be “lossy”.
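The distinction between lossless and lossy methods can be sketched as follows. This is a minimal illustration under stated assumptions: DEFLATE (via Python's `zlib` module) stands in for a lossless method, and the function `round_trip_lossy` is a hypothetical toy that merely discards the low-order bits of each color component. Neither is the method of any particular image standard.

```python
import zlib


def round_trip_lossless(pixels: bytes) -> bytes:
    """Encode and decode with DEFLATE, which reproduces the input exactly."""
    return zlib.decompress(zlib.compress(pixels))


def round_trip_lossy(pixels: bytes, dropped_bits: int = 4) -> bytes:
    """Toy lossy codec (hypothetical): zero the low bits of each 8-bit component."""
    mask = (0xFF << dropped_bits) & 0xFF
    return bytes(p & mask for p in pixels)


def is_lossless(original: bytes, reconstructed: bytes) -> bool:
    """A method is lossless if the reconstructed matrix equals the original."""
    return original == reconstructed


# A tiny 2x2 image in 24 bit format: three 8-bit components per pixel.
image = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 13, 77, 201])
print(is_lossless(image, round_trip_lossless(image)))  # True
print(is_lossless(image, round_trip_lossy(image)))     # False
```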
An internationally recognized standard compression method (and subsequently format standard) is the JPEG (Joint Photographic Experts Group) compression method. The JPEG method is a widely recognized standard for continuous-tone, multi-level still images. This standard was intended to support a large variety of applications for continuous-tone images. JPEG itself actually introduced two basic compression methods, in order to meet the differing needs of many applications: a DCT-based lossy compression method, and a predictive lossless compression method.
The JPEG lossy compression method (see Wallace G.K., “The JPEG Still Picture Compression Standard”, IEEE Transactions on Consumer Electronics, Dec. 1991) is performed as follows, and is shown in background art FIGS. 1 and 2. At the input to the encoder, the source image samples are grouped into 8×8 blocks, shifted from unsigned integers with range [0, 2^P − 1] to signed integers with range [−2^(P−1), 2^(P−1) − 1], where P is the sample precision in bits, and input to the Forward Discrete Cosine Transform (FDCT). The DCT is related to the Discrete Fourier Transform (DFT).
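The level shift described above can be sketched as follows. The function name `level_shift` and the default precision of 8 bits are assumptions made for this example.

```python
def level_shift(samples, precision=8):
    """Shift unsigned samples in [0, 2**P - 1] to signed values centered on zero."""
    offset = 1 << (precision - 1)  # 2**(P-1), which is 128 when P is 8
    return [s - offset for s in samples]


print(level_shift([0, 128, 255]))  # [-128, 0, 127]
```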
Each of the 8×8 blocks of source image is effectively a 64-point discrete signal which is a function of the two spatial dimensions x, y. The FDCT takes the signal as its input and decomposes the signal into 64 orthogonal base signals. Each contains one of the 64 unique two dimensional (2D) “spatial frequencies”, which comprise the “spectrum” of the input signal. The output of the FDCT is the set of 64 base-signal amplitudes or “DCT coefficients” whose values are uniquely determined by the particular 64-point input signal. The DCT coefficient values can be regarded as the relative amount of the 2D spatial frequencies contained in the 64-point input signal. The coefficient with zero frequency in both dimensions is called the “DC coefficient” and the remaining 63 coefficients are called “AC coefficients”.
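As a minimal sketch, the 8×8 FDCT can be written directly from its textbook definition. The function name `fdct_8x8` is hypothetical, and treating the first list index as x is an assumed convention. The direct quadruple loop is chosen for clarity rather than speed; practical encoders use fast factorizations.

```python
import math

N = 8


def fdct_8x8(block):
    """Forward 2D DCT of an 8x8 block given as a list of 8 rows of 8 numbers."""
    def c(k):
        # Normalization factor: 1/sqrt(2) for the zero-frequency term, else 1.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


# A perfectly flat block has only a DC coefficient; all 63 AC terms vanish
# up to floating-point rounding.
flat = [[10] * N for _ in range(N)]
coeffs = fdct_8x8(flat)
dc = coeffs[0][0]  # approximately 80, i.e. 8 times the sample value
```

With this normalization the DC coefficient of a block equals eight times the mean of its 64 samples.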
Based on the assumption that sample values typically vary slowly from point to point across an image, the FDCT processing step lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies. For a typical 8×8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero values and need not be encoded.
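The concentration of signal energy in the low frequencies can be illustrated with a one-dimensional simplification of the 8×8 case. The sketch below applies an orthonormal 8-point DCT to a slowly varying ramp of samples and measures the share of energy held by the two lowest-frequency coefficients. The sample values and the function names are assumptions made for this example.

```python
import math


def dct_1d(signal):
    """Orthonormal 1D DCT-II of a sequence of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                    for i, s in enumerate(signal))
        out.append(scale * total)
    return out


def low_frequency_energy_fraction(signal, keep):
    """Fraction of total energy carried by the first `keep` DCT coefficients."""
    energy = [c * c for c in dct_1d(signal)]
    return sum(energy[:keep]) / sum(energy)


# Level-shifted samples that vary slowly from point to point (illustrative).
ramp = [-28, -26, -24, -22, -20, -18, -16, -14]
fraction = low_frequency_energy_fraction(ramp, keep=2)
print(fraction > 0.99)  # True
```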
After output from the FDCT, each of the 64 DCT coefficients is uniformly quantized in conjunction with a 64-element Quantization Table, which must be specified by the software application as an input to the encoder. Each element may be any integer value ranging from 1 to 255, which specifies the step size of the quantizer for its corresponding DCT coefficient. The purpose of quantization is to achieve further compression by representing DCT coefficients with the minimal precision which is necessary to achieve the desired image quality. Therefore, information which is not visually significant is discarded. Quantization is thus fundamentally a lossy process, and in fact is the principal source of data loss in DCT-based encoders.
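Uniform quantization and the corresponding dequantization at the decoder can be sketched as below. For brevity the example works on a short flat list rather than all 64 coefficients. The coefficient values and step sizes are illustrative assumptions, not a table prescribed by the text.

```python
import math


def quantize(coeffs, table):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [math.floor(c / q + 0.5) for c, q in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply each quantized level by its step size."""
    return [level * q for level, q in zip(levels, table)]


coeffs = [80.0, -13.2, 4.9, 0.7]  # illustrative DCT coefficients
table = [16, 11, 10, 16]          # illustrative step sizes

levels = quantize(coeffs, table)
print(levels)                     # [5, -1, 0, 0]
print(dequantize(levels, table))  # [80, -11, 0, 0]
```

The reconstructed values differ from the original coefficients, which shows where the data loss of the method arises: small coefficients collapse to zero and the others are recovered only to the nearest multiple of the step size.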
After quantization, the DC coefficient is treated separately from the other 63 AC coefficients. The DC coefficient is a measure of the average value of the 64 image samples. Because there is usually a strong correlation between the DC coefficients of adjacent 8×8 blocks, the quantized DC coefficient is encoded as the difference from the DC term of the previous block in the encoding sequence.
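The differential treatment of the DC coefficient can be sketched as follows. The function names are hypothetical, and the initial predictor value of zero for the first block is an assumption of this example.

```python
def dc_differences(dc_values):
    """Encode each quantized DC term as its difference from the previous block's."""
    previous = 0  # assumed predictor for the first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_reconstruct(diffs):
    """Decoder side: a running sum restores the original DC terms."""
    previous = 0
    values = []
    for d in diffs:
        previous += d
        values.append(previous)
    return values


dcs = [52, 55, 54, 54]  # illustrative DC terms of four adjacent blocks
diffs = dc_differences(dcs)
print(diffs)                  # [52, 3, -1, 0]
print(dc_reconstruct(diffs))  # [52, 55, 54, 54]
```

Because adjacent blocks tend to have similar averages, the differences are mostly small numbers, which a later entropy coding step can represent compactly.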
Finally, all the quantized coefficients are ordered into a zig-zag sequence, which helps to facilitate entropy coding by placing low-frequency coefficients, with a higher probability of being non-zero, before high-frequency coefficients.
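One way to generate the zig-zag ordering is to walk the anti-diagonals of the block, alternating direction on each diagonal. The sketch below is an illustration only: the function names are hypothetical, and indexing the block as (row, column) is an assumed convention.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run downward (row increasing), even ones upward.
        return (diagonal, row if diagonal % 2 else -row)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


order = zigzag_order()
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

The scan starts at the DC position (0, 0) and ends at the highest-frequency position (7, 7), so zero-valued high-frequency coefficients tend to cluster at the end of the sequence.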
The final DCT-based encoder processing step is entropy coding. This step achieves additional but lossless compression by encoding the quantized DCT coefficients more compactly, according to their statistical characteristics.
Two entropy coding methods are specified in JPEG: Huffman coding and arithmetic coding. The baseline encoder uses Huffman coding, but encoders with both methods are specified for all modes of operation. Essentially, entropy coding first converts the zig-zag sequence of quantized coefficients into an intermediate sequence of symbols. The symbols are then converted into a data stream in which they no longer have externally identifiable boundaries, and this data stream forms the compressed image data.
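The conversion to an intermediate symbol sequence can be sketched for the 63 AC coefficients. The sketch is simplified and is not taken from the description above: each nonzero coefficient is paired with the count of zeros preceding it, the pair (15, 0) stands for a run of sixteen zeros, and the pair (0, 0) marks the end of the block. Baseline JPEG pairs the run with a size category and codes the amplitude separately, which is omitted here. The function name is hypothetical.

```python
def ac_run_length_symbols(ac):
    """Convert zig-zag ordered AC coefficients into (zero_run, value) symbols."""
    symbols = []
    run = 0
    last_nonzero = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    for value in ac[:last_nonzero + 1]:
        if value == 0:
            run += 1
            if run == 16:
                symbols.append((15, 0))  # sixteen consecutive zeros
                run = 0
        else:
            symbols.append((run, value))
            run = 0
    if last_nonzero < len(ac) - 1:
        symbols.append((0, 0))  # end of block: only zeros remain
    return symbols


ac = [-3, 0, 0, 2, 0, 1] + [0] * 57  # illustrative quantized AC coefficients
print(ac_run_length_symbols(ac))  # [(0, -3), (2, 2), (1, 1), (0, 0)]
```

In this example, 63 coefficients reduce to four symbols, which a Huffman or arithmetic coder would then map to a variable-length bit stream.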
All of these lossy compression methods and improvements attempt to exploit various properties of the human eye and visual perceptual system in order to achieve further compression without any visible error. In fact, after compression with a lossy method, the compressed image is clearly different from the original image when analyzed mathematically. Preferably, these differences are at least not immediately visible to the naked eye. Thus, the compression method is able to achieve even greater compression ratios without visibly altering the quality of the image.
An even more efficient lossy compression method would exploit several physiological idiosyncrasies of the human visual system. First, the eye and the brain are more sensitive to detail found in the darker areas of an image, and less responsive to changes in areas of high light intensity. Second, the human perceptual system is quickly able to detect errors in adjacent pixels which should be the same color. However, when the color significantly changes from one pixel to another, such as at the edge between two objects, the compression algorithm can represent the color of each pixel less accurately without detection of the error by the human visual system, which “corrects” the edge colors by using information from surrounding pixels. Third, the human eye is less sensitive to differences in gray-scale levels than the degree of resolution supported by the JPEG compression method.