This invention relates to the compression of digitized color images with applications to transmission, reception and storage. Uncompressed video images comprise large amounts of information. Compression of digital images improves efficiency of storage and transmission.
In a digital image, a representation of an image is stored and transmitted as an array of numerical values. The image is divided up into a grid. Each small square in the grid is referred to as a pixel. The intensity of the image at each pixel is translated into a numerical value which is stored in an array. The array of numerical values representing an image is referred to as an image plane.
Black and white (gray scale) images are commonly represented as a two-dimensional array where the locations of pixel values in the array correspond to the location of the pixel in the image. Each location in the array for gray scale images can commonly store a number, for example, an integer value of between 0 and 255 (an 8-bit binary number). This means that there can be 1 of 256 different gray levels displayed at each pixel in the image.
Color images are commonly represented by three two-dimensional arrays. Each array (or plane) represents one of the primary colors, e.g., red, green, or blue. The planes overlap so that each pixel in the displayed image displays a composite of a red, green, and blue value at that pixel. In a common 24-bit color system, each pixel in each of the three planes can store a value of between 0 and 255. This means that there can be 1 of 256^3, or about 16 million, different colors displayed at each pixel. Typical digital color images can range in size from 10^7 bits/image (a TV frame) to 10^10 bits/image (a satellite image), thus posing problems for efficient storage and transmission.
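The storage figures above follow from simple arithmetic, which can be sketched as below. The 640 by 480 frame size is an assumption chosen only for illustration (the text gives an order of magnitude, not a resolution), and the function name `raw_image_bits` is hypothetical.

```python
def raw_image_bits(width, height, planes=3, bits_per_sample=8):
    """Bits needed to store an uncompressed image as `planes` full-resolution arrays."""
    return width * height * planes * bits_per_sample


# Assumed frame size, for illustration only; not taken from the text.
WIDTH, HEIGHT = 640, 480

color_bits = raw_image_bits(WIDTH, HEIGHT)           # three 8-bit planes (24-bit color)
gray_bits = raw_image_bits(WIDTH, HEIGHT, planes=1)  # one 8-bit gray scale plane
colors_per_pixel = 256 ** 3                          # distinct 24-bit colors

print(color_bits, gray_bits, colors_per_pixel)
```

Under this assumed frame size the color image needs a little over 7 million bits, which is consistent with the order of 10^7 bits quoted for a TV frame.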
In practice the number of bits required to represent the information in realistic digital images may be greatly reduced without significant loss in perceived quality by taking advantage of the fact that in ordinary images the pixel values tend to be strongly redundant in three domains: spectral (because pixel values from different spectral bands, e.g., RGB, are generally highly correlated); spatial (because neighboring pixels also tend to be highly correlated); and, for dynamic images, temporal (because consecutive frames tend to be very similar). Image compression techniques can reduce the number of bits required to represent images by removing these redundancies. There is a wide variety of such techniques, but they can be divided into two classes: lossless and lossy.
In lossless compression the image reconstructed after compression is numerically identical, pixel by pixel, to the original image. The criteria for comparison of lossless techniques are based on objective measures such as compression ratio, compression speed, and computational complexity. In lossy compression, the reconstructed image is degraded with respect to the original in order to attain higher compression ratios than those of lossless procedures. The degradation may or may not be apparent to a human viewer, but even when noticeable it may be acceptable in some applications although not in others. The criteria for visual quality of compressed images are diverse and subjective; thus, caution must be exercised in comparing lossy compression schemes. This invention is directed to lossy compression of full-color images, although it will be apparent to one of ordinary skill in the art that the techniques utilized in this invention can be used in general for multi-spectral data sets that are highly correlated and are intended for human viewing.
A widely used current standard for still-image compression is a baseline system specified by the published and generally available works of the Joint Photographic Experts Group (JPEG). The JPEG system is essentially single-plane, or monochrome. Thus, for color (or, more generally, for multispectral) images, the JPEG standard encodes each component independently. This is seldom optimal, however, because in most cases there is significant correlation in the information contained in different planes of the same image. In the case of color images intended for human viewing, JPEG suggests that the original RGB image components be decorrelated by linear transformation into YIQ planes where Y represents luminance or the black and white component of the image, and I and Q represent the chrominance or color components which are typically subsampled. After this transformation the JPEG standard performs a spatial frequency compression of the Y and of the sub-sampled I and Q components independently to compress the data. This compression occurs as follows. First, the three planes are divided into blocks of 8 x 8 pixels and a discrete cosine transform (DCT) is computed independently for each block. Second, the coefficients of the transformed blocks are weighted in accordance with the number of bits allocated by a Quantization Matrix for each spatial frequency; independent Quantization Matrices are used for the luminance (Y) and chrominance (I and Q) planes, but the same matrix is used throughout each plane. Third, code-length (Huffman) encoding is applied to the quantized coefficients. Decompression follows an inverse procedure to obtain an RGB color image according to standard techniques known to the art.
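The first two steps of the block compression described above (the 8 x 8 DCT and the quantization) can be sketched as follows. This is a minimal illustration, not the JPEG reference procedure: the quantization matrix `QMATRIX` is a hypothetical one invented for the example, quantization is modeled as division by the matrix entry followed by rounding, and the level shift and Huffman encoding steps are omitted.

```python
import math

N = 8  # block size of the baseline scheme described above


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct evaluation, for clarity)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, qmatrix):
    """Divide each coefficient by its quantization step and round to an integer."""
    return [[round(coeffs[u][v] / qmatrix[u][v]) for v in range(N)]
            for u in range(N)]


# Hypothetical quantization matrix: coarser steps at higher spatial frequencies.
QMATRIX = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

# A perfectly flat block has energy only in the lowest (DC) frequency.
flat_block = [[100] * N for _ in range(N)]
quantized = quantize(dct_2d(flat_block), QMATRIX)
print(quantized[0])
```

For a flat block only the top-left (DC) coefficient survives quantization; the long runs of zeros in the remaining positions are what the subsequent entropy coding step exploits.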
A major disadvantage of this approach for digital images is the required translation from an RGB representation to a YIQ representation. The YIQ system was designed to reduce the amount of analog information required to be broadcast for color TV by translating the RGB color signals into the three signals Y, I, and Q. Because human vision has much more spatial resolution sensitivity to the Y (or luminance) component than to the I and Q (or chrominance) components, a very acceptable color picture can be broadcast by assigning most of the bandwidth available in a TV channel to the luminance signal.
While such a translation has worked well for analog color TV broadcasting, it poses at least one major disadvantage for digital color systems. In analog signal processing, multiplication is a very simple and fast process. However, in digital processing, multiplication tends to be complex, slow, and expensive. The conversion processes from RGB to YIQ and back are based on transformations of the form [Y, I, Q]^T = M [R, G, B]^T and [R, G, B]^T = M^-1 [Y, I, Q]^T, where the 3 by 3 matrices M and M^-1 are inverses of each other. Determining the Y, I, and Q components corresponding to the R, G, and B components at any pixel location requires nine multiplications per pixel. The same is true with regard to the reverse transformation, i.e., a requirement in general of nine multiplications per pixel. This represents a significant computational burden for practical systems.
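The per-pixel cost described above can be made concrete with a short sketch. The coefficients below are the commonly quoted NTSC values and are an assumption here, since the text does not give the matrix entries; the helper names `invert_3x3` and `transform` are hypothetical. The reverse matrix is computed numerically so that the two are exact inverses of each other.

```python
# Commonly quoted NTSC RGB-to-YIQ coefficients (an assumption, not from the text).
RGB_TO_YIQ = [
    [0.299, 0.587, 0.114],
    [0.596, -0.274, -0.322],
    [0.211, -0.523, 0.312],
]


def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


YIQ_TO_RGB = invert_3x3(RGB_TO_YIQ)


def transform(matrix, pixel):
    """Apply a 3x3 matrix to one pixel; return (result, multiplication count)."""
    multiplications = 0
    result = []
    for row in matrix:
        acc = 0.0
        for coeff, value in zip(row, pixel):
            acc += coeff * value
            multiplications += 1
        result.append(acc)
    return tuple(result), multiplications


yiq, forward_cost = transform(RGB_TO_YIQ, (200, 120, 40))
rgb, reverse_cost = transform(YIQ_TO_RGB, yiq)
print(forward_cost, reverse_cost)
```

Each direction performs nine multiplications for every pixel, so a full conversion and reconstruction costs eighteen multiplications per pixel before any compression work is done.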
Some prior art digital color image systems have incorporated CCD-type cameras with a mosaic color filter covering the CCD arrays. These cameras, by their inherent nature, produce a representation of an image that contains just one color component at every pixel. The arrangement of the components is determined by the mosaic pattern in the filter. Thus far, these prior art systems have not directly compressed or transmitted the multiplexed image produced by the mosaic filter pattern but have instead converted the multiplexed RGB image produced by the filter pattern to a YIQ or CMY type of system before processing. The present invention instead operates on the RGB signals directly, preserving color resolution as well as spatial resolution while reducing the number of bits required for storage and/or transmission of a bit-mapped image.
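A multiplexed image of the kind produced by a mosaic filter can be sketched as follows. The 2 x 2 repeating tile used here is a hypothetical, Bayer-like layout chosen for illustration; the text does not specify the filter pattern, and the names `TILE` and `multiplex` are not from the source.

```python
# Hypothetical 2x2 repeating filter tile; the actual mosaic pattern is not given.
TILE = (("G", "R"),
        ("B", "G"))
CHANNEL_INDEX = {"R": 0, "G": 1, "B": 2}


def multiplex(rgb_image):
    """Keep one color component per pixel, selected by the filter tile.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples.
    The result is a single plane with one value per pixel.
    """
    plane = []
    for y, row in enumerate(rgb_image):
        out_row = []
        for x, pixel in enumerate(row):
            color = TILE[y % 2][x % 2]
            out_row.append(pixel[CHANNEL_INDEX[color]])
        plane.append(out_row)
    return plane


# Small synthetic test image: R depends on x, G on y, B on both.
image = [[(10 * x, 20 * y, 5 * (x + y)) for x in range(4)] for y in range(4)]
mosaic = multiplex(image)

full_samples = sum(len(row) for row in image) * 3
mosaic_samples = sum(len(row) for row in mosaic)
print(full_samples, mosaic_samples)
```

The multiplexed plane carries one sample per pixel rather than three, so it starts with a third of the samples of the full three-plane representation.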