1. Field of the Invention
The invention relates to still and moving image compression techniques.
2. Description of Related Art
Digitized images require notoriously large amounts of storage space and transmission bandwidth. A single, relatively modest-sized image of 480 by 640 pixels with a full-color resolution of 24 bits per pixel (three 8-bit bytes per pixel) occupies nearly a megabyte of data. At a resolution of 1024 by 768 pixels, a 24-bit color screen requires about 2.3 megabytes of memory. A 24-bit color picture of an 8.5 inch by 11 inch page at 300 dots per inch requires about 25 megabytes.
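Each of these figures is simply width times height times bytes per pixel. A minimal Python check, assuming "megabyte" means 10^6 bytes:

```python
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes; 24-bit color = 3 bytes per pixel."""
    return width * height * bytes_per_pixel

DPI = 300  # dots per inch for the scanned-page example

sizes = {
    "640 x 480 image": raw_image_bytes(640, 480),
    "1024 x 768 screen": raw_image_bytes(1024, 768),
    # 8.5 in x 11 in at 300 dpi = 2550 x 3300 pixels
    "8.5 x 11 in page": raw_image_bytes(int(8.5 * DPI), 11 * DPI),
}

for name, n in sizes.items():
    # "MB" here means 10**6 bytes (decimal megabytes)
    print(f"{name}: {n:,} bytes = {n / 1e6:.2f} MB")
```

The three sizes come to 921,600, 2,359,296 and 25,245,000 bytes, consistent with the approximate figures above.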
Video images are even more data intensive, since it is generally accepted that high-quality consumer applications require at least 30 frames per second. Current proposals for high-definition television (HDTV) call for 1920 by 1035 or more pixels per frame, which translates to a data transmission rate of about 1.5 billion bits per second. This bandwidth requirement can be reduced if one uses 2:1 interleaving and 4:1 decimation for the U and V chrominance components, but 0.373 billion bits per second are still required.
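The rates above can be reproduced with a short calculation. The sketch below is hedged: it assumes 2:1 interleaving halves the rate and 4:1 decimation keeps one quarter of the U and V samples (so a pixel costs 8 + 2 + 2 = 12 bits instead of 24). The function names are illustrative only.

```python
FPS = 30             # frames per second
BITS_PER_PIXEL = 24  # 8 bits each for Y, U and V before decimation

def raw_video_bps(width: int, height: int) -> int:
    """Uncompressed bit rate for full-resolution 24-bit frames at FPS."""
    return width * height * BITS_PER_PIXEL * FPS

def reduced_video_bps(width: int, height: int) -> int:
    """Bit rate after the two reductions described in the text.

    Assumption: 2:1 interleaving halves the rate, and 4:1 decimation of
    U and V makes each pixel cost 8 + 8/4 + 8/4 = 12 bits instead of 24.
    """
    after_interleave = raw_video_bps(width, height) // 2
    return after_interleave * 12 // 24

for w, h in [(1920, 1035), (1920, 1080)]:
    print(f"{w} x {h}: raw {raw_video_bps(w, h) / 1e9:.3f} Gbit/s, "
          f"reduced {reduced_video_bps(w, h) / 1e9:.3f} Gbit/s")
```

Under this reading, a 1920 by 1035 frame gives about 1.43 billion bits per second raw and about 0.358 billion after reduction. The 0.373 figure corresponds exactly to a 1920 by 1080 frame.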
Traditional lossless techniques for compressing digital image and video information, such as Huffman encoding, run-length encoding and the Lempel-Ziv-Welch algorithm, are far from adequate to meet this demand. For this reason, lossy compression techniques, which discard some information, have been devised, including discrete cosine transform (DCT) techniques, adaptive DCT (ADCT) techniques, and wavelet transform techniques. Wavelet techniques are discussed in DeVore, Jawerth and Lucier, "Image Compression Through Wavelet Transform Coding", IEEE Transactions on Information Theory, Vol. 38, No. 2, pp. 719-746 (1992); and in Antonini, Barlaud, Mathieu and Daubechies, "Image Coding Using Wavelet Transform", IEEE Transactions on Image Processing, Vol. 1, No. 2, pp. 205-220 (1992), both incorporated by reference herein.
The Joint Photographic Experts Group (JPEG) has promulgated a standard for still image compression, known as the JPEG standard, which involves a DCT-based algorithm. The JPEG standard is described in a number of publications, including the following incorporated by reference herein: Wallace, "The JPEG Still Picture Compression Standard", IEEE Transactions on Consumer Electronics, Vol. 38, No. 1, pp. xviii-xxxiv (1992); Purcell, "The C-Cube CL550 JPEG Image Compression Processor", C-Cube Microsystems, Inc. (1992); and C-Cube Microsystems, "JPEG Algorithm Overview" (1992).
An encoder using the JPEG algorithm has four steps: linear transformation, quantization, run-length encoding (RLE), and Huffman coding. The decoder reverses these steps to reconstruct the image. For the linear transformation step, the image is divided into 8 by 8 pixel blocks, and a discrete cosine transform is applied in both spatial dimensions of each block. The image is divided into blocks to overcome a deficiency of the DCT, namely that it is seriously nonlocal: every transform coefficient depends on every pixel in the transformed region. Confining each transform to a small block limits this nonlocality. However, this compromise produces a tiled appearance (blockiness) at high compression.
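A minimal pure-Python sketch of the linear transformation step: the image is tiled into 8 by 8 blocks and each block receives a separable two-dimensional DCT-II. The orthonormal scaling used here matches the normalization of the JPEG forward DCT; the function names and the dictionary-of-blocks layout are illustrative assumptions, and a real codec would use a fast factored DCT.

```python
import math

N = 8  # JPEG block size

def dct_1d(v):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                               for n in range(N)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][k] holds vertical frequency k for horizontal frequency c;
    # transpose so the result is indexed [vertical][horizontal].
    return [[cols[c][k] for c in range(N)] for k in range(N)]

def blockwise_dct(image):
    """Split an image (list of rows, sides multiples of 8) into 8x8 blocks
    and transform each block independently of its neighbors."""
    result = {}
    for top in range(0, len(image), N):
        for left in range(0, len(image[0]), N):
            # JPEG level-shifts 8-bit samples by -128 before the DCT.
            block = [[p - 128 for p in row[left:left + N]]
                     for row in image[top:top + N]]
            result[(top, left)] = dct_2d(block)
    return result
```

Because each block is transformed in isolation, every coefficient depends only on the 64 pixels of its own block. Coarse, independent quantization of neighboring blocks is what later makes the block edges visible.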
The quantization step is essential for reducing the amount of information to be transmitted, though it does cause loss of image information. Each transform coefficient is divided by a quantization step size selected according to its position in the 8 by 8 block, and the result is rounded to an integer. This step has the convenient side effect of reducing the abundant small values to zero or other small numbers, which require much less information to specify.
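The sketch below illustrates position-dependent quantization using the example luminance table given in the JPEG standard (Annex K). Step sizes grow toward the high-frequency corner, so small high-frequency coefficients round to zero. The helper names are hypothetical, and Python's `round()` rounds exact halves to even, which is adequate for illustration.

```python
# Example luminance quantization table from the JPEG standard (Annex K).
# Entry [u][v] is the step size for the coefficient at row u, column v.
LUMA_QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, table=LUMA_QTABLE):
    """Divide each DCT coefficient by its step size and round to an integer.
    This is the lossy step: the fractional part is discarded."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)] for u in range(8)]

def dequantize(levels, table=LUMA_QTABLE):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[levels[u][v] * table[u][v] for v in range(8)] for u in range(8)]
```

For instance, a DC coefficient of 100 quantizes to 6 and reconstructs as 96, while a coefficient of 30 in the highest-frequency position (step 99) becomes 0.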
The run-length encoding step codes runs of identical values, such as zeros, as items identifying the value to repeat and the number of times to repeat it. For example, a single item such as "8 zeros" requires less space than a string of 8 zeros. This step is justified by the abundance of zeros that usually results from the quantization step.
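A minimal sketch of run-length encoding as described above, producing (value, count) items. The JPEG standard itself uses a variant that first orders coefficients in a zigzag scan and pairs each nonzero coefficient with the count of zeros preceding it; this sketch follows the simpler generic description, and the function names are illustrative.

```python
def run_length_encode(values):
    """Encode a sequence as a list of (value, count) items."""
    items = []
    for v in values:
        if items and items[-1][0] == v:
            # Extend the current run by one.
            items[-1] = (v, items[-1][1] + 1)
        else:
            # Start a new run.
            items.append((v, 1))
    return items

def run_length_decode(items):
    """Expand (value, count) items back into the original sequence."""
    out = []
    for v, count in items:
        out.extend([v] * count)
    return out

data = [5, 0, 0, 0, 0, 0, 0, 0, 0, 3]
print(run_length_encode(data))  # [(5, 1), (0, 8), (3, 1)]
```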
Huffman coding translates each symbol from the run-length encoding step into a variable-length bit string chosen according to how frequently the symbol occurs. That is, frequent symbols are coded with shorter codes than infrequent symbols. The code can be taken either from a preset table or from a table built specifically for the image to minimize the total number of bits needed.
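The following is a sketch of building a Huffman code tailored to one image's symbol frequencies, using Python's `heapq`. It implements plain Huffman coding. JPEG additionally limits code lengths to 16 bits and transmits tables as counts of codes per length, which this sketch omits. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the bit strings for a sequence of symbols."""
    return "".join(code[s] for s in symbols)

msg = "aaaaabbc"
code = huffman_code(msg)
print(code, encode(msg, code))
```

Symbols may be any hashable values, such as the (value, count) items from a run-length encoder.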
Like JPEG, the Moving Picture Experts Group (MPEG) has promulgated two standards for coding image sequences, known as MPEG I and MPEG II. The MPEG algorithms exploit the fact that successive frames usually differ only slightly. In the MPEG standards, a full image is typically compressed and transmitted only once for every 12 frames. These "reference" or "intra" frames are compressed with a DCT-based technique much like JPEG. For the intermediate frames, a predicted frame is calculated, and only the difference between the actual frame and the predicted frame is compressed and transmitted. Any of several algorithms can be used to calculate a predicted frame, and the algorithm is chosen on a block-by-block basis depending on which predictor works best for the particular block. Some of the predictor algorithms use motion detection. MPEG I is described in detail in International Organization for Standardization (ISO) CD 11172, incorporated by reference herein in its entirety.
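One predictor of the kind described above is motion-compensated prediction from the previous frame. The sketch below finds it by exhaustive block matching. The search range, the sum-of-absolute-differences (SAD) cost and the function names are illustrative assumptions, since MPEG defines how a decoder applies motion vectors but leaves the search method to the encoder. The sketch chooses only among displacements; a full encoder would also compare against other predictors, such as intra coding, for each block.

```python
BLOCK = 16  # MPEG-1 motion-compensates 16 x 16 macroblocks

def get_block(frame, top, left):
    """Extract a BLOCK x BLOCK region from a frame (list of rows)."""
    return [row[left:left + BLOCK] for row in frame[top:top + BLOCK]]

def sad(a, b):
    """Sum of absolute differences: an assumed block-match cost."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def predict_block(prev, cur, top, left, search=2):
    """Try every displacement within +/- search pixels in the previous frame,
    keep the best match, and return (motion vector, residual block)."""
    target = get_block(cur, top, left)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if 0 <= t <= height - BLOCK and 0 <= l <= width - BLOCK:
                candidate = get_block(prev, t, l)
                cost = sad(target, candidate)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx), candidate)
    _, vector, prediction = best
    # Only the motion vector and this difference need to be coded.
    residual = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(target, prediction)]
    return vector, residual
```

When the scene content simply shifts between frames, the residual is all zeros and compresses to almost nothing.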
Thus, for compression of video sequences, the MPEG technique treats the compression of reference frames substantially independently of the compression of the intermediate frames between them. The present invention relates primarily to the compression of still images and of reference frames for video information, although aspects of the invention can be used to accomplish video compression even without treating reference frames and intermediate frames independently.
The JPEG standard achieves still image compression ratios of about 10:1 to 20:1 or more, depending on the image and the user's standard for acceptable quality. While this is better than what standard lossless techniques achieve, it is still inadequate given the huge numbers of still and moving images likely to require storage and transmission in the near future. Wavelet-based compression techniques generally achieve better compression ratios than DCT-based techniques such as JPEG, but they too remain inadequate.
Other techniques for compressing still images separate the original image into different types of information and code each type separately, which allows each coding technique to be optimized for its type of information. In Ran and Farvardin, "Adaptive DCT Image Coding Based on a Three-Component Image Model", 1992 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 3, pp. 201-204 (1992), incorporated herein by reference, there is described a three-component technique in which what is referred to as "strong edge" information from a still image is encoded separately from the sum of a smooth portion of the image and a textured portion of the image.
The above techniques for compressing digitized images represent only a few of those that have been devised. However, none of the known techniques yet achieves compression ratios sufficient to support the huge still and video data storage and transmission requirements expected in the near future. The techniques also raise problems apart from compression ratio. In particular, for real-time, high-quality video decompression, the decompression algorithm must be simple enough to produce 30 decompressed frames per second. The speed requirement for compression is often less extreme than for decompression, since for many purposes images can be compressed in advance. Even then, however, compression time must be reasonable to achieve commercial objectives. In addition, many applications, such as real-time transmission of live events, require real-time compression as well as decompression. Known image compression and decompression techniques that achieve high compression ratios often do so only at the expense of extensive computation during compression, decompression, or both.
Accordingly, there is an urgent need for a new image compression/decompression technique that achieves high compression ratios without sacrificing quality, while requiring less extensive computation.