As multimedia applications proliferate, video data transmission techniques have been gradually shifting from one-to-one (unicast) communication to one-to-many (multicast) communication.
Because channel capacity varies and different receivers have disparate requirements, it is necessary to develop video coding and transmission techniques that are efficient and scalable across the heterogeneity of the Internet. Although representing a video by multiple redundant streams at different bit rates is a simple solution used to realize multicast in most commercial systems, this approach is inefficient and cannot cope smoothly with variations in channel capacity. In contrast, video scalability is a better solution: a single bit stream is generated for all intended recipients, and the decoder of each recipient can reconstruct the video at a different quality within a specific bit rate range. Depending on the specifications of the receivers, a scalable system can support scalability in frame rate (“temporal scalability”), in frame resolution (“spatial scalability”), in frame quality (“SNR scalability”), or in a hybrid of these (“hybrid scalability”). Although many scalable coding methods have been developed in recent years, they are still considered less efficient, especially in low bit rate applications. Most existing systems use hybrid motion-compensated DCT video coding. Although the hybrid motion-compensation algorithm may not be the best solution for video scalability, the hybrid scheme is simple and incurs low delay in performing frame prediction.
Vetterli and Kalker translated the motion compensation and DCT hybrid video coding into matching pursuits. See M. Vetterli and T. Kalker, “Matching pursuit for compression and application to motion compensated video coding”, Proc. ICIP, November 1994, pp. 725–729. They encode frames by the matching pursuit algorithm with a dictionary composed of motion blocks and DCT bases. Neff and Zakhor used matching pursuits to represent the motion residual image. See R. Neff and A. Zakhor, “Very low bit-rate video coding based on matching pursuits”, IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 158–171, February 1997.
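The greedy selection at the heart of matching pursuit can be illustrated with a minimal sketch. It is not the codec of either cited work: it assumes a one-dimensional signal and a toy orthonormal DCT dictionary, whereas the cited schemes operate on image data with their own dictionaries. The function names `matching_pursuit` and `dct_dictionary` are hypothetical and chosen only for illustration.

```python
import math


def inner(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def dct_dictionary(n):
    """Toy dictionary (an assumption): the orthonormal DCT-II basis of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                      for i in range(n)])
    return basis


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedily approximate `signal` with `n_atoms` unit-norm dictionary atoms.

    Returns a list of (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    atoms = []
    for _ in range(n_atoms):
        # Project the current residual onto every atom in the dictionary.
        coeffs = [inner(residual, g) for g in dictionary]
        # Keep the atom with the largest absolute projection.
        k = max(range(len(dictionary)), key=lambda i: abs(coeffs[i]))
        c = coeffs[k]
        atoms.append((k, c))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - c * g for r, g in zip(residual, dictionary[k])]
    return atoms, residual
```

In a video coder the input would be a frame or a motion residual, and each selected pair of index and coefficient would be quantized and entropy coded; those stages are omitted here.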
In their design, coding the residuals with matching pursuits performs better than DCT in terms of PSNR and perceptual quality at low bit rates. R. Neff, T. Nomura and A. Zakhor, “Decoder Complexity and Performance Comparison of Matching Pursuit and DCT-Based MPEG-4 Video Codecs”, Proc. IEEE Int. Conf. Image Processing, pp. 783–787, 1998, indicates that when DCT is used to encode motion residuals, post-processing to remove blocky and ringing artifacts is required at the decoder to achieve reasonable quality at low bit rates. Such post-processing is not required at the decoder for the same quality if the residuals are encoded by matching pursuits. Thus, coding motion residuals with matching pursuits yields lower decoder complexity.
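For reference, PSNR is conventionally computed from the mean squared error between the original and decoded frames. The sketch below assumes frames are given as flat sequences of 8-bit samples with a peak value of 255; the function name `psnr` and its parameters are illustrative assumptions and are not taken from the cited works.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    Assumes 8-bit samples by default (peak = 255).
    """
    if not reference or len(reference) != len(decoded):
        raise ValueError("frames must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```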
Certain SNR-scalable schemes based on matching pursuits have been proposed. Al-Shaykh et al. disclosed a fine-grained scalability (FGS) coding algorithm that can produce a continuous bit stream with increasing SNR. See O. Al-Shaykh, E. Miloslavsky, T. Nomura, R. Neff, and A. Zakhor, “Video compression using matching pursuits”, IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 123–143, February 1999. However, this codec must encode at least 5 atoms as a unit at a time in order to code atom positions more efficiently. Thus, the better coding efficiency of atom positions is obtained at the expense of the FGS property. The scheme proposed by Vleeschouwer and Macq used two residuals to provide coarse scalability. See C. D. Vleeschouwer and B. Macq, “SNR scalability based on matching pursuits”, IEEE Trans. on Multimedia, vol. 2, no. 4, 2000, pp. 198–208.