Transform coding is widely used in practical image and video compression systems. The basic idea is that an image is easier to compress after transformation than by direct coding in the spatial domain. The Discrete Cosine Transform (DCT) is the transformation used in most coding standards, such as JPEG, H.261/H.263 and MPEG.
In recent years much research activity has shifted from the DCT to the wavelet transform, especially after Shapiro published his work on embedded zerotree wavelet (EZW) image coding; see J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients”, IEEE Trans. on Signal Processing, Vol. 41, No. 12, pp. 3445–3462, December 1993.
The book W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993, describes the state of the art in DCT-based coding.
In many applications it is desirable to obtain an embedded bit-stream. An embedded bit-stream contains all lower rates embedded at its beginning, so the bits are ordered from the most important to the least important. With an embedded code, the encoder simply stops when a target parameter, such as the bit count, is met. In a similar manner, the decoder can cease decoding the embedded bit-stream at any point and still produce the reconstruction corresponding to that lower-rate encoding.
To make the embedded bit-stream optimal, the bits that are most significant for the visual perception of the image should be transmitted first. This gives the bit-stream a good trade-off between compression and quality at low bit rates.
The DCT is orthonormal, which means that it preserves energy. In other words, with respect to the root mean squared error (RMSE), and hence the peak signal-to-noise ratio (PSNR), an error of a certain magnitude in the transformed image produces an error of the same magnitude in the original image.
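The energy-preservation property can be checked numerically. The following is a minimal sketch, assuming a one-dimensional orthonormal DCT-II written directly from its definition; the function names and the sample values are illustrative only and do not come from any of the cited coders.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(N^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def idct_ortho(c):
    """Inverse of dct_ortho (the transpose of the same orthonormal matrix)."""
    n = len(c)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(acc)
    return out

def energy(v):
    """Sum of squared values."""
    return sum(a * a for a in v)

# Illustrative sample row (made-up pixel values).
pixels = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_ortho(pixels)

# Introduce an error of known size in the transform domain.
noisy = list(coeffs)
noisy[3] += 4.0
recon = idct_ortho(noisy)

coeff_err = energy([a - b for a, b in zip(noisy, coeffs)])
pixel_err = energy([a - b for a, b in zip(recon, pixels)])

print(energy(pixels), energy(coeffs))  # equal up to rounding
print(coeff_err, pixel_err)            # equal up to rounding
```

Because the squared error is the same in both domains, a coder can rank its decisions by coefficient error without transforming back to pixels.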
This means that the coefficients with the largest magnitudes should be transmitted first because they carry the most information. It also means that the information can be ranked according to its binary representation, so that the most significant bits are transmitted first.
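The ordering by binary significance can be sketched as a plain bit-plane scan. This is a simplified illustration, not the scheme of any cited coder: it assumes integer coefficients, sends the signs as separate side information, and applies no entropy coding. All names and values are hypothetical.

```python
def encode_bitplanes(coeffs, num_planes):
    """Emit magnitude bits plane by plane, most significant plane first."""
    bits = []
    for plane in range(num_planes - 1, -1, -1):
        for c in coeffs:
            bits.append((abs(c) >> plane) & 1)
    return bits

def decode_prefix(bits, count, num_planes, signs):
    """Rebuild coefficients from any prefix of the stream.

    Bits that were never received are treated as zero.
    """
    mags = [0] * count
    for pos, bit in enumerate(bits):
        plane = num_planes - 1 - pos // count
        mags[pos % count] |= bit << plane
    return [s * m for s, m in zip(signs, mags)]

# Illustrative quantized coefficients (made-up values, all below 2**6).
coeffs = [45, -20, 7, -3, 1, 0, 0, 0]
num_planes = 6
signs = [1 if c >= 0 else -1 for c in coeffs]  # assumed side information

stream = encode_bitplanes(coeffs, num_planes)

# Truncate the stream after each whole plane and measure the squared error.
errors = []
for planes_sent in range(num_planes + 1):
    prefix = stream[:planes_sent * len(coeffs)]
    approx = decode_prefix(prefix, len(coeffs), num_planes, signs)
    errors.append(sum((a - b) ** 2 for a, b in zip(approx, coeffs)))

print(errors)  # never increases as more planes are received
```

Each additional bit can only reduce a coefficient's residual, so the reconstruction error never grows as the prefix gets longer, which is the behaviour expected of an embedded code.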
After the DCT, most of the energy of the image is concentrated in the low-frequency coefficients, and the remaining coefficients have very low values. This means that the most significant bit planes (MSB) of the coefficients contain a large number of zeroes. Until the first significant bit (FSB) of a given coefficient is found, the probability of a zero is very high. The task of efficient encoding therefore becomes the task of encoding these zeroes efficiently.
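The predominance of zeroes in the upper bit planes can be illustrated by counting them. The sketch below assumes integer coefficients; the sample values are made up to mimic a block whose energy is concentrated in the first few coefficients, and the function names are hypothetical.

```python
def zero_fraction_per_plane(coeffs, num_planes):
    """Fraction of zero bits in each magnitude bit plane, keyed by plane index."""
    fractions = {}
    for plane in range(num_planes - 1, -1, -1):
        zeros = sum(1 for c in coeffs if ((abs(c) >> plane) & 1) == 0)
        fractions[plane] = zeros / len(coeffs)
    return fractions

def first_significant_plane(c):
    """Index of the highest set bit of |c|, or -1 if c is zero."""
    return abs(c).bit_length() - 1

# Illustrative coefficients (made-up values, all below 2**7).
coeffs = [90, -35, 12, -6, 3, -2, 1, 1, 0, 0, -1, 0, 0, 0, 0, 0]
num_planes = 7

fractions = zero_fraction_per_plane(coeffs, num_planes)
for plane in sorted(fractions, reverse=True):
    print(plane, fractions[plane])

print([first_significant_plane(c) for c in coeffs])
```

In this example the two highest planes are zero for all but one coefficient each, which is why the efficiency of the coder depends mainly on how compactly those runs of zeroes are represented.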
In the papers Z. Xiong, O. Guleryuz, M. T. Orchard, “A DCT-based embedded image coder”, IEEE Signal Processing Letters, Vol. 3, No. 11, pp. 289–290, November 1996; N. K. Laurance, D. M. Monro, “Embedded DCT coding with significance masking”, Proc. IEEE ICASSP 97, Vol. IV, pp. 2717–2720, 1997; and J. Li, J. Li, C.-C. Jay Kuo, “Layered DCT still image compression”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 2, pp. 440–442, April 1997, the DCT is the transform used, but the coefficients are not coded in the way that JPEG codes them. Instead, an embedded bit-stream can be produced.