Transform coding is the name given to a wide family of techniques for data coding in which each block of data to be coded is transformed by some mathematical function before further processing. A block of data may be part of a data object being coded, or may be the entire object. The data generally represent some phenomenon, for example a spectrum analysis, an image, an audio clip or a video clip. The transform function is usually chosen to reflect some quality of the phenomenon being coded; for example, in coding of audio, still images and motion pictures, the Fourier transform or the Discrete Cosine Transform (DCT) can be used to analyze the data into frequency terms or coefficients. For such phenomena, the information is generally concentrated in a few of the frequency coefficients. Therefore, the transformed data can often be encoded or compressed more economically than the original data. This means that transform coding can be used to compress certain types of data to minimize storage space or transmission time over a communication link.
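The claim that the information concentrates in a few frequency coefficients can be checked numerically. The following is a minimal sketch, assuming an orthonormal 1-D DCT-II and a hypothetical eight-sample ramp as the "smooth" input; the function names `dct_ii` and `energy_fraction` are illustrative, not taken from any standard.

```python
import math

def dct_ii(x):
    """Orthonormal 1-D DCT-II of a list of real samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out

def energy_fraction(coeffs, m):
    """Share of the total energy (sum of squares) held by the first m coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:m]) / total

ramp = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical smooth, slowly varying signal
coeffs = dct_ii(ramp)
print([round(c, 2) for c in coeffs])
print(f"energy in first 2 of 8 coefficients: {energy_fraction(coeffs, 2):.4f}")
```

Because this DCT is orthonormal, the total energy is preserved (140 for this ramp), yet roughly 99.6% of it lands in the first two coefficients; the remaining six are small and are natural candidates for coarse quantization.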
An example of transform coding in use is found in the Joint Photographic Experts Group (JPEG) international standard for still image compression, as defined by ITU-T Rec. T.81 (1992) | ISO/IEC 10918-1:1994, Information technology—Digital compression and coding of continuous-tone still images, Part 1: Requirements and Guidelines. Another example is the Moving Picture Experts Group (MPEG) international standard for motion picture compression, defined by ISO/IEC 11172:1993, Information Technology—Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbits/s. This MPEG-1 standard defines systems for both video compression (Part 2 of the standard) and audio compression (Part 3). A more recent MPEG video standard (MPEG-2) is defined by ITU-T Rec. H.262 | ISO/IEC 13818-2: 1996 Information Technology—Generic Coding of moving pictures and associated audio—Part 2: video. A newer audio standard is ISO/IEC 13818-3: 1996 Information Technology—Generic Coding of moving pictures and associated audio—Part 3: audio. All three international image data compression standards use the DCT on 8×8 blocks of samples to achieve image compression. DCT compression of images is used herein to illustrate the general concepts put forward below; a complete explanation can be found in Chapter 4 “The Discrete Cosine Transform (DCT)” in W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold: New York, (1993).
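The 8×8 DCT mentioned above is separable: a 1-D DCT is applied to every row and then to every column. The sketch below assumes orthonormal scaling; for an 8×8 block this scaling agrees with the 1/4·C(u)C(v) factor of the JPEG FDCT, but the sketch omits JPEG's level shift of the input samples and is not a conformant implementation. The names `dct_1d` and `dct_8x8` are hypothetical.

```python
import math

N = 8  # block size used by JPEG, MPEG-1 and MPEG-2 video

def dct_1d(v):
    """Orthonormal N-point DCT-II."""
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(c * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * N)) for i in range(N)))
    return out

def dct_8x8(block):
    """2-D DCT of an 8x8 block: 1-D DCT on every row, then on every column."""
    rows = [dct_1d(r) for r in block]                            # rows[r][v]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]  # cols[v][u]
    # Transpose back so that result[u][v] is vertical frequency u, horizontal frequency v.
    return [[cols[v][u] for v in range(N)] for u in range(N)]

flat = dct_8x8([[128] * N for _ in range(N)])
print(round(flat[0][0], 6))  # DC term: 8 * 128 = 1024 under this scaling; all others ~0
```

A uniform block produces a single non-zero (DC) coefficient, and a block that varies only horizontally produces non-zero coefficients only in the first row (u = 0).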
Wavelet coding is another form of transform coding. Its localized basis functions allow wavelet coding to preserve edges and small details. For compression, the transformed data are usually quantized. Wavelet coding is used by the FBI to compress fingerprint images. Wavelet coding is a subset of the more general technique of subband coding, which uses filter banks to decompose the data into particular frequency bands. Compression is achieved by quantizing the lower-frequency bands more finely than the higher-frequency bands, while sampling the lower-frequency bands more coarsely than the higher-frequency bands. A summary of wavelet, DCT, and other transform coding is given in Chapter 5 “Compression Algorithms for Diffuse Data” in Roy Hoffman, Data Compression in Digital Systems, Chapman and Hall: New York, (1997).
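A one-level Haar decomposition, the simplest wavelet, shows the subband idea in miniature: the signal is split into a low band and a high band, each at half the original sample rate, and the high band is then quantized more coarsely. This is an illustrative sketch only; it does not use the filters of the FBI scheme or of any other standard, and the step sizes 1.0 and 4.0 are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(x):
    """Split an even-length signal into a low band (scaled pairwise sums)
    and a high band (scaled pairwise differences), each half as long."""
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high

def haar_synthesis(low, high):
    """Exact inverse of haar_analysis."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / SQRT2)
        x.append((l - h) / SQRT2)
    return x

def quantize_band(band, q):
    """Round each value to the nearest multiple of q (quantize, then dequantize)."""
    return [q * math.floor(v / q + 0.5) for v in band]

signal = [10, 12, 14, 15, 40, 41, 41, 39]
low, high = haar_analysis(signal)
# Hypothetical step sizes: fine for the low band, coarse for the high band.
approx = haar_synthesis(quantize_band(low, 1.0), quantize_band(high, 4.0))
```

Without quantization, `haar_synthesis(low, high)` returns the input exactly. With the coarse step, every high-band value in this example rounds to zero, yet the reconstruction stays within about 1.3 of each original sample.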
In any technology and for any phenomenon represented by digital data, the data before a transformation is performed are referred to as being “in the real domain”. After a transformation is performed, the new data are often called “transform data” or “transform coefficients”, and referred to as being “in the transform domain”. The function used to take data from the real domain to the transform domain is called the “forward transform”. The mathematical inverse of the forward transform, which takes data from the transform domain to the real domain, is called the respective “inverse transform”.
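To make the terminology concrete, here is a sketch of a forward transform and its inverse, assuming the orthonormal DCT-II as the forward transform and the orthonormal DCT-III as its inverse. The sample values and the names `forward_dct` and `inverse_dct` are hypothetical.

```python
import math

def _scale(k, n):
    """Orthonormal DCT scale factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def forward_dct(x):
    """Forward transform: real domain -> transform domain (DCT-II)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def inverse_dct(coeffs):
    """Inverse transform: transform domain -> real domain (DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

samples = [52, 55, 61, 66, 70, 61, 64, 73]  # real-domain integer samples
coeffs = forward_dct(samples)               # transform coefficients, generally non-integer
restored = inverse_dct(coeffs)              # back in the real domain
```

Without quantization the round trip recovers the samples up to floating-point error, but both `coeffs` and `restored` are real-valued rather than integers, which is why the later quantization and rounding steps are needed.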
In general, the forward transform will produce real-valued data, not necessarily integers. To achieve data compression, the transform coefficients are converted to integers by the process of quantization. Suppose that $(\lambda_i)$ is the set of real-valued transform coefficients resulting from the forward transform of one unit of data. Note that one unit of data may be a one-dimensional or two-dimensional block of data samples, or even the entire data object. The “quantization values” $(q_i)$ are parameters to the encoding process. The “quantized transform coefficients” or “transform-coded data” are the sequence of values $(a_i)$ defined by the quantization function $Q$:
$$a_i = Q(\lambda_i) = \left\lfloor \frac{\lambda_i}{q_i} + 0.5 \right\rfloor \qquad (1)$$

where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$. The resulting integers are then passed on for possible further encoding or compression before being stored or transmitted. To decode the data, the quantized coefficients are multiplied by the quantization values to give new “dequantized coefficients” $(\lambda_i')$ given by

$$\lambda_i' = q_i a_i. \qquad (2)$$
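Equations (1) and (2) translate directly into code. This is a minimal sketch; the function names `quantize` and `dequantize` are illustrative.

```python
import math

def quantize(lam, q):
    """Equation (1): a = floor(lam / q + 0.5)."""
    return math.floor(lam / q + 0.5)

def dequantize(a, q):
    """Equation (2): lam' = q * a."""
    return q * a

print(quantize(7.3, 2), dequantize(quantize(7.3, 2), 2))  # 4 8
print(quantize(-7.3, 2))                                  # -4
print(quantize(3, 2), quantize(-3, 2))                    # 2 -1
```

Because of the floor in equation (1), a value lying exactly halfway between two multiples of $q_i$ always moves toward $+\infty$: with $q = 2$, $Q(3) = 2$ but $Q(-3) = -1$.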
The process of quantization followed by dequantization (also called inverse quantization) can thus be described as “rounding to the nearest multiple of $q_i$”. The quantization values are chosen so that the loss of information in the quantization step is within some specified bound. For example, for audio or image data, one quantization level is usually the smallest change in the data that can be perceived. It is quantization that allows transform coding to achieve good compression ratios. A good choice of transform allows quantization values to be chosen that significantly reduce the amount of data to be encoded. For example, the DCT is chosen for image compression because the resulting frequency components produce almost independent responses from the human visual system. This means that the coefficients relating to components to which the visual system is less sensitive, namely the high-frequency components, may be quantized using large quantization values without perceptible loss of image quality. Coefficients relating to components to which the visual system is more sensitive, namely the low-frequency components, are quantized using smaller quantization values.
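The effect of frequency-dependent quantization values can be seen on a short run of coefficients. In this sketch the coefficients and the quantization table are hypothetical numbers chosen only to illustrate the idea; they are not taken from the JPEG or MPEG tables.

```python
import math

def quantize_block(coeffs, qtable):
    """Equation (1), applied coefficient by coefficient."""
    return [math.floor(lam / q + 0.5) for lam, q in zip(coeffs, qtable)]

def dequantize_block(levels, qtable):
    """Equation (2): each quantized level times its quantization value."""
    return [a * q for a, q in zip(levels, qtable)]

# Hypothetical coefficients ordered from low to high frequency, with
# quantization values that grow with frequency.
coeffs = [100.4, -31.7, 12.2, -5.9, 2.6, -1.3, 0.8, -0.4]
qtable = [2, 4, 8, 16, 16, 32, 32, 64]

levels = quantize_block(coeffs, qtable)        # [50, -8, 2, 0, 0, 0, 0, 0]
restored = dequantize_block(levels, qtable)    # [100, -32, 16, 0, 0, 0, 0, 0]
errors = [abs(c - r) for c, r in zip(coeffs, restored)]
```

Each restored value is the nearest multiple of its $q_i$, so every error is at most $q_i/2$. Five of the eight levels become zero, and runs of zeros like this are what the subsequent entropy-coding stage compresses well.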
The inverse transform also generally produces non-integer data. Usually the decoded data are required to be in integer form. For example, systems for the playback of audio data or the display of image data generally accept input in the form of integers. For this reason, a transform decoder generally includes a step that converts the non-integer data from the inverse transform to integer data, either by truncation or by rounding to the nearest integer. There is also often a limit on the range of the integer data output from the decoding process in order that the data may be stored in a given number of bits. For this reason the decoder also often includes a “clipping” stage that ensures that the output data are in an acceptable range. If the acceptable range is [a,b], then all values less than a are changed to a, and all values greater than b are changed to b.
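The conversion and clipping stage described above can be sketched as follows. The `mode` parameter and the function name `to_output` are hypothetical; `lo` and `hi` play the role of the range [a, b] in the text, and the rounding convention matches equation (1), so halves round up.

```python
import math

def to_output(values, lo, hi, mode="round"):
    """Convert inverse-transform output to integers in [lo, hi].

    mode="round" rounds to the nearest integer (halves go up);
    mode="truncate" drops the fractional part, as Python's int() does.
    """
    out = []
    for v in values:
        n = math.floor(v + 0.5) if mode == "round" else int(v)
        out.append(min(max(n, lo), hi))  # clipping stage
    return out

idct_output = [-3.2, 12.6, 127.5, 255.7, 300.0]
print(to_output(idct_output, 0, 255))              # [0, 13, 128, 255, 255]
print(to_output(idct_output, 0, 255, "truncate"))  # [0, 12, 127, 255, 255]
```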
These rounding and clipping processes are often considered an integral part of the decoder, and they are the cause of inaccuracies in decoded data, particularly when decoded data are re-encoded. For example, the JPEG standard (Part 1) specifies that a source image sample is defined as an integer with a precision of P bits, taking any value in the range 0 to $2^P - 1$. The decoder is expected to reconstruct the output from the inverse discrete cosine transform (IDCT) to the specified precision. For baseline JPEG coding, P is defined to be 8; for other DCT-based coding, P can be 8 or 12. The MPEG-2 video standard states in Annex A (Discrete cosine transform): “The input to the forward transform and the output from the inverse transform is represented with 9 bits.”
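The ranges implied by these precisions are easy to compute. This sketch assumes that JPEG samples are unsigned P-bit integers (as stated above) and, as a labeled assumption, that the 9-bit MPEG-2 values are signed two's-complement integers, which gives the range [-256, 255]. The helper names are hypothetical.

```python
def sample_range(p):
    """Unsigned P-bit sample range, 0 .. 2**P - 1 (JPEG source samples)."""
    return 0, (1 << p) - 1

def signed_range(bits):
    """Two's-complement range for a signed value of the given width.
    Assumption: the 9-bit MPEG-2 transform values are interpreted this way."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def clip_to(values, rng):
    """Clip integer decoder output into the given (lo, hi) range."""
    lo, hi = rng
    return [min(max(v, lo), hi) for v in values]

print(sample_range(8), sample_range(12))  # (0, 255) (0, 4095)
print(signed_range(9))                    # (-256, 255)
```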
For JPEG, the compliance test data (both the encoder source image test data and the decoder reference test data) are 8-bit/sample integers. Even though rounding to integers is typical, some programming languages convert floating-point values to integers by truncation. Software implementations that rely on this truncating conversion introduce larger errors into the real-domain integer output from the inverse transform.
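The size of the penalty for truncation can be estimated on evenly spaced values. In this sketch the inputs are hypothetical IDCT outputs between 10.00 and 10.99; Python's built-in `int()` truncates toward zero, as a cast from floating point to integer does in C.

```python
import math

def truncate(x):
    return int(x)  # truncates toward zero

def round_half_up(x):
    return math.floor(x + 0.5)

# Hypothetical inverse-transform outputs, evenly spaced over one unit.
values = [10 + k / 100 for k in range(100)]

def mean_abs_error(convert):
    return sum(abs(v - convert(v)) for v in values) / len(values)

print(mean_abs_error(truncate))       # about 0.495
print(mean_abs_error(round_half_up))  # about 0.25
```

Truncation roughly doubles the average error, and it is also biased: every positive value moves down and every negative value moves up (for example, `int(-2.7)` is `-2`), whereas rounding errors fall on both sides.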
The term “high-precision” is used herein to refer to numerical values which are stored to a precision more accurate than the precision used when storing the values as integers. Examples of high-precision numbers are floating-point or fixed-point representations of numbers.
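As an illustration of a fixed-point high-precision representation, the sketch below stores a value as an integer count of $2^{-8}$ units. The choice of 8 fractional bits and the names `to_fixed` and `from_fixed` are hypothetical.

```python
import math

FRAC_BITS = 8  # hypothetical: 8 fractional bits, i.e. steps of 1/256

def to_fixed(x, frac_bits=FRAC_BITS):
    """Store x as an integer number of 2**-frac_bits units (rounded)."""
    return math.floor(x * (1 << frac_bits) + 0.5)

def from_fixed(n, frac_bits=FRAC_BITS):
    """Recover the real value represented by a fixed-point integer."""
    return n / (1 << frac_bits)

x = 3.14159
as_int = math.floor(x + 0.5)        # integer precision: 3
as_fixed = from_fixed(to_fixed(x))  # fixed point: 3.140625
```

The integer form is off by about 0.14, while the fixed-point form is off by less than $2^{-9}$, even though both are stored as integers.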