Digital images are often too large to conveniently handle and transmit from one party to another. For example, a moderate resolution color image acquired at a 1024×768 pixel resolution with 16 bits per color channel occupies 4.5 MB. Over a 56 kbit/s modem connection, such an image would take about 11 minutes to download. Therefore, images are commonly compressed in accordance with, and stored in the form specified by, standard compression protocols. Some protocols in common use, for example, those resulting in the GIF and some types of TIFF file formats, are lossless. In those formats, the compressed image does not lose any information compared to the original image. However, for many applications the degree of compression offered by lossless compression is not enough. In such cases, a higher degree of compression can be obtained by using lossy compression, which discards some information in the original picture in order to achieve greater compression. Thus, the most commonly used format (JPEG) is lossy. Other examples of lossy formats include JFIF and MPEG (for video).
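By way of illustration only, the size and download time quoted above can be checked with a few lines of arithmetic. The sketch below assumes three color channels of 16 bits each and takes 1 MB to mean 2^20 bytes; both are assumptions made here to reproduce the quoted figures, and the variable names are illustrative.

```python
# Assumptions (not stated in the text): 3 color channels, 16 bits per channel,
# and 1 MB = 2**20 bytes.
WIDTH, HEIGHT = 1024, 768
CHANNELS = 3
BITS_PER_CHANNEL = 16
MODEM_BITS_PER_SECOND = 56_000

size_bits = WIDTH * HEIGHT * CHANNELS * BITS_PER_CHANNEL
size_mb = size_bits / 8 / 2**20
download_minutes = size_bits / MODEM_BITS_PER_SECOND / 60

print(f"{size_mb:.1f} MB, {download_minutes:.1f} minutes")  # 4.5 MB, 11.2 minutes
```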
We will illustrate the occurrence of lossiness using JPEG, which is perhaps the best known and most commonly used image compression standard. JPEG involves five basic steps: (1) parsing an image into macroblocks; (2) transforming the image into the frequency domain using a discrete cosine transform (DCT); (3) quantizing the DCT coefficients; (4) run-length encoding the quantized DCT coefficients; and (5) variable-length encoding the result.
In step (1), the image (a matrix of pixel values) is decomposed into 8×8 blocks known as macroblocks.
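A minimal sketch of step (1) follows, assuming the image is held as a list of rows of pixel values and that its dimensions are exact multiples of the block size. The function name split_into_blocks is hypothetical, and the handling of images whose dimensions are not multiples of 8 is deliberately left out.

```python
def split_into_blocks(image, size=8):
    """Split a 2-D list of pixel values into size x size blocks (row-major order)."""
    height, width = len(image), len(image[0])
    if height % size or width % size:
        # Assumption of this sketch: no padding of partial edge blocks.
        raise ValueError("dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks

# A 16x16 test image yields four 8x8 blocks.
image = [[x + 16 * y for x in range(16)] for y in range(16)]
blocks = split_into_blocks(image)
```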
In step (2), each macroblock is operated on by a DCT module to yield an equivalent 8×8 block of frequency domain coefficients. The DCT is simply the counterpart, in the digital domain, to the Fourier transform in the analog domain.
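For illustration, a direct (unoptimized) two-dimensional DCT of a single 8×8 block might be sketched as follows. The orthonormal scaling used here is an assumption of this sketch, as is the function name dct_2d; practical encoders use much faster factorizations.

```python
import math

N = 8  # block size

def dct_2d(block):
    """Direct 2-D type-II DCT of an N x N block, orthonormal scaling (assumed)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = (2 / N) * c(u) * c(v) * total
    return coeffs

# A perfectly flat block has all of its energy in the zero-frequency (DC) term.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

With this scaling, the flat block of value 100 produces a DC coefficient of 800 and all other coefficients are zero to within floating-point error.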
In step (3), the coefficients of each macroblock are quantized in a process that typically involves division by an integer and rounding off. For example, if the divisor is 10, values 1049, 1000 and 951 would divide to 104.9, 100.0 and 95.1, which might (depending on the desired degree of rounding) all round to 100. It is this quantizing step that leads to the lossy nature of JPEG.
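The numerical example above can be reproduced with the following sketch. The function name quantize and its rounding_step parameter are hypothetical; rounding_step models the "desired degree of rounding" mentioned in the text.

```python
def quantize(coefficient, divisor, rounding_step=1):
    """Divide by `divisor`, then round to the nearest multiple of `rounding_step`."""
    scaled = coefficient / divisor
    return rounding_step * round(scaled / rounding_step)

values = (1049, 1000, 951)

# Rounding to the nearest integer keeps the three values distinct.
fine = [quantize(v, 10) for v in values]        # [105, 100, 95]

# Rounding to the nearest ten collapses all three to the same value.
coarse = [quantize(v, 10, 10) for v in values]  # [100, 100, 100]

# Decoding multiplies by the divisor again; the original differences are gone.
decoded = [q * 10 for q in coarse]              # [1000, 1000, 1000]
```

Because all three inputs decode to 1000, a decoder has no way to tell which of them was originally present, which is the sense in which this step is lossy.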
In step (4), the quantized values are further (but losslessly) compressed using run-length encoding (RLE). RLE is a technique for reducing redundancy in a string of information. For example, in a conceptual illustration, a string such as 77777777222222 might be represented as the shorter string 7(8)2(6), where each value in parentheses represents the number of repetitions of the preceding value.
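The conceptual illustration above can be sketched as follows. This follows only the simplified example in the text; JPEG's actual run-length scheme encodes runs of zero-valued coefficients and differs in detail. The function names are hypothetical.

```python
from itertools import groupby

def rle_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]

def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)

def rle_format(pairs):
    """Render pairs in the value(count) notation used in the text."""
    return "".join(f"{symbol}({count})" for symbol, count in pairs)

pairs = rle_encode("77777777222222")
print(rle_format(pairs))  # 7(8)2(6)
```

Decoding the pairs returns the original string exactly, which is why this step is lossless.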
In step (5), the RLE string is further (losslessly) compressed by a process called variable-length encoding (VLE). In VLE, the relative frequencies of occurrence of each element in the string to be encoded are determined, and more frequently occurring elements are encoded using shorter codes. For example, in a conceptual illustration, in a string where 7(8) occurred most frequently and 2(6) occurred least frequently, 7(8) might be represented using a single digit code, while 2(6) might be represented using multiple digits.
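One well-known way of building such a variable-length code is Huffman coding, sketched below as an illustration. The text does not say which VLE scheme is intended, so the choice of Huffman coding, the function name huffman_code, and the sample symbol counts are all assumptions of this sketch.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a {symbol: bit_string} prefix code; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, unique_tie_breaker, {symbol: code_so_far}).
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical RLE output in which 7(8) is most frequent and 2(6) least frequent.
rle_symbols = ["7(8)"] * 5 + ["3(4)"] * 2 + ["2(6)"]
codes = huffman_code(rle_symbols)
```

In this example 7(8) receives a one-bit code while 2(6) receives a two-bit code, and no code is a prefix of another, so the encoded string can be decoded unambiguously.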
Finally, the VLE string is written to storage.
The foregoing assumes that the image consists of one value per pixel. This may be true for monochromatic or grayscale images but, in general, images will comprise three color values per pixel location. For example, the image might be in a red-green-blue (R,G,B) color space, or in a luminance plus two chrominance (Y,Cb,Cr) color space. For color images, steps (1)–(5) are typically performed on each color component separately. As a matter of convenience, we will refer to images without differentiating whether such images are grayscale or color, with the associated processing being understood to apply to each color component thereof in the case of color images.
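As a simple illustration of per-component processing, a color image might first be separated into three single-valued planes, each of which is then handled like a grayscale image. The sketch assumes pixels are stored as three-element tuples; the function name split_components is hypothetical.

```python
def split_components(image):
    """Split rows of 3-tuples into three planes with one value per pixel."""
    planes = []
    for index in range(3):
        planes.append([[pixel[index] for pixel in row] for row in image])
    return planes

# A 2x2 (R, G, B) image: red, green / blue, white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
red, green, blue = split_components(image)
```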
In order to use a compressed image, the encoded image is first retrieved, then decoded using the mathematical inverse of the encoding protocol. For example, in JPEG, the DCT coefficients are recovered, then converted back to the spatial domain using an inverse discrete cosine transform (IDCT). Once converted back to the spatial domain, the image can be displayed and/or manipulated using well-known and widely commercially available image editing software such as Adobe Photoshop, as well as many others.
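The final decoding stages for one block might be sketched as follows, assuming the quantized coefficients have already been recovered from the variable-length and run-length codes. The scaling convention is the orthonormal one commonly paired with 8×8 blocks, which is an assumption here, as are the function names.

```python
import math

N = 8  # block size

def dequantize(quantized, divisor):
    """Undo the division of the quantizing step (the rounding cannot be undone)."""
    return [[value * divisor for value in row] for row in quantized]

def idct_2d(coeffs):
    """Direct 2-D inverse DCT of an N x N coefficient block (orthonormal scaling)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    block = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            block[x][y] = (2 / N) * total
    return block

# A block whose only nonzero quantized coefficient is the DC term.
quantized = [[0] * N for _ in range(N)]
quantized[0][0] = 80
pixels = idct_2d(dequantize(quantized, 10))
```

Here the DC coefficient dequantizes to 800, and the inverse transform yields a flat block in which every pixel is 100 to within floating-point error.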
As illustrated above, each time an image is encoded and stored using a lossy compression protocol, some information is irreversibly lost. Thus, when a lossy encoded image is decoded, the image becomes visually degraded to some degree. Depending on the degree of loss, this degradation may or may not be apparent to the viewer.
However, the effect of several such individual degradations can become cumulative, depending on how the image is used. For example, suppose that a user acquires an image using a digital camera which encodes it using JPEG on a memory card. At this stage, the degradation might only be minimal. Later, the user downloads the image from the memory card onto his computer, and loads (decodes) it for viewing. At that point, the user decides to edit the image (for example, by cropping), then prints it and stores (reencodes) it onto his hard disk. The reencoding process introduces a further degradation relative to the original image seen at the digital camera's sensor. Upon printing, the user sees that one of the people in the picture suffers from red-eye (or green-eye in case of animals), so he loads (decodes) the image, uses his software to fix the problem, and stores (reencodes) it. The reencoding will again introduce further degradation.
As shown by this simple example, even a relatively common sequence of simple edits can result in substantial degradation if the image is reencoded between successive edits.
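The accumulation can be illustrated with a deliberately simplified numerical model, in which a single value stands in for the image, the lossy codec merely rounds to a multiple of 10, and each edit adds 3. All of these are assumptions chosen for illustration and do not model JPEG itself.

```python
def lossy_store(value, step=10):
    """Toy lossy codec: keep only the nearest multiple of `step`."""
    return step * round(value / step)

def edit(value):
    """Hypothetical edit, such as a small brightness increase."""
    return value + 3

original = 104
ideal = original                # edits applied with no intermediate lossy saves
stored = lossy_store(original)  # first lossy copy, as written by the camera
errors = [abs(ideal - stored)]

for _ in range(3):              # three edit-and-reencode cycles
    ideal = edit(ideal)
    stored = lossy_store(edit(stored))
    errors.append(abs(ideal - stored))

print(errors)  # [4, 7, 10, 13]
```

In this toy model each reencoding discards the preceding edit entirely, so the difference from the ideal result grows with every cycle.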
To use a crude analogy, the original image can be viewed as a sheet of paper, which is compressed (encoded) by crumpling the sheet into a ball. Each time the paper is smoothed out (decoded, say, for editing) and then recrumpled (reencoded), the degradation becomes worse.
Of course, users can take steps to reduce such cumulative degradation. For example, it is often recommended to avoid saving (encoding) an image until all edits are finished, or to save intermediate versions using lossless formats such as TIFF or GIF. However, these techniques are often inconvenient, for example, when all editing cannot be performed in a single session, or cannot be anticipated. Also, sometimes images are to be edited by multiple persons at different times and different locations, where bandwidth constraints on transmission discourage the use of lossless file formats for intermediate versions.