Conventional digital images represent a visual scene using a relatively large amount of data. Visual scenes are usually digitized in a pixel grid of rows and columns, with each pixel allocated a fixed number of bits to represent gray shade or color. For example, a typical personal computer screen can display an image 1024 pixels wide and 768 pixels high, with 16 bits allocated to each pixel for color; a single such image requires over 12.5 million bits of storage. If this same screen were used to display digital video at 60 frames per second, the video would require a data rate of about 755 million bits per second, roughly the combined data rate of 12,000 conventional telephone conversations. Digital image technology now extends, and will continue to be extended, into applications where data volumes such as those exemplified above are undesirable and, in many instances, unworkable.
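The storage and data-rate figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the 64,000 bits-per-second telephone channel rate is an assumption (the text does not state which rate it used).

```python
def frame_bits(width, height, bits_per_pixel):
    """Bits needed to store one uncompressed frame."""
    return width * height * bits_per_pixel


def video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to stream uncompressed frames."""
    return frame_bits(width, height, bits_per_pixel) * frames_per_second


# Assumption: one conventional telephone conversation occupies a
# 64,000 bit/s channel. The source text does not name this figure.
TELEPHONE_CHANNEL_BPS = 64_000

bits = frame_bits(1024, 768, 16)           # 12,582,912 bits per frame
rate = video_bit_rate(1024, 768, 16, 60)   # 754,974,720 bits per second
calls = rate / TELEPHONE_CHANNEL_BPS       # about 11,800 channels

print(bits, rate, round(calls))
```

Under that assumed channel rate the result is close to 11,800 conversations, which agrees with the rounded figure of 12,000 given above.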
Most digital images must be compressed in order to meet transmission bandwidth and/or storage requirements. Lossless image coders generally seek out redundancies in image data (e.g., spatial, intensity, or temporal correlation) that can be coded more efficiently without loss of information content. Compression gains with lossless coders are generally modest. Lossy coders throw away part of the full precision image data during compression. Although many lossy image coders can produce images and videos compressed to only a fraction of a bit per pixel, the quality of a reconstructed lossy-compressed image at a given compression rate may vary greatly from coder to coder.
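As a simple illustration of how a lossless coder can exploit spatial redundancy, the following sketch run-length encodes one row of pixels. Run-length coding is chosen here only as a familiar example; it is not identified in the text as the method of any particular coder, and the function names are hypothetical.

```python
def rle_encode(pixels):
    """Collapse each run of identical pixel values into a (value, length) pair."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, length) pairs back into the original pixel row."""
    pixels = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


row = [255] * 6 + [0] * 2 + [255] * 4   # 12 pixels, only 3 runs
encoded = rle_encode(row)
decoded = rle_decode(encoded)
```

The decoded row is identical to the input, so no information is lost; the gain depends entirely on how much redundancy the image happens to contain, which is consistent with the modest gains noted above.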
Some lossy coders transform an image before compressing it. The transform step is intended to let the coder better rank the significance of image information content. The transform coder then keeps only the transformed image information that it determines to be more significant, and discards the remainder. An inverse transform later reconstructs the image from the partial transform data.
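The transform, rank, discard, and inverse-transform sequence described above can be sketched on a single row of pixels. The example below assumes an orthonormal one-dimensional DCT and ranks coefficients by magnitude; both choices, the sample pixel values, and all function names are assumptions made for illustration, not a description of any specific coder.

```python
import math


def dct(signal):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal


def keep_largest(coeffs, count):
    """Zero every coefficient except the `count` largest in magnitude."""
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(ranked[:count])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


row = [52, 55, 61, 66, 70, 61, 64, 73]        # hypothetical pixel row
coeffs = dct(row)
approx_3 = idct(keep_largest(coeffs, 3))      # reconstruct from 3 of 8 coefficients
approx_1 = idct(keep_largest(coeffs, 1))      # reconstruct from 1 of 8 coefficients
```

Because this transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, so keeping more coefficients never increases the error.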
Different transforms parse image information in different ways. A discrete cosine transform (DCT) represents an image in terms of its sinusoidal spatial frequency components. A discrete wavelet transform (DWT) represents an image using coefficients that each combine spatial location and spatial frequency information. Furthermore, how well a DWT parses location and frequency information for a given image depends on the particular wavelet function employed by the DWT. For instance, the Haar wavelet function efficiently codes text and graphics regions, while the 9-7 tap Daubechies wavelet function performs well for coding natural images.
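To make the location-and-frequency idea concrete, the sketch below applies a single level of an orthonormal one-dimensional Haar transform to a row containing one sharp edge, the kind of content found in text and graphics. It is a minimal illustration under those assumptions; the function names and sample row are hypothetical, and the 9-7 tap filter is not implemented here.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal 1-D Haar transform.

    Returns (approx, detail): scaled pairwise sums (low frequency) and
    scaled pairwise differences (high frequency). Coefficient j describes
    samples 2*j and 2*j + 1, so it also carries spatial location.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward()."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal


edge = [0, 0, 0, 0, 255, 255, 255, 255]   # flat regions with one sharp edge
approx, detail = haar_forward(edge)
restored = haar_inverse(approx, detail)
```

For this row every detail coefficient is zero, because the edge falls on a pair boundary and each pair is constant, so only the four approximation coefficients need to be coded. An edge that falls inside a pair would instead produce a single nonzero detail coefficient at that location.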