In digital video applications, the availability of bandwidth is the overriding factor in the visual quality of the video. The less bandwidth that is available, the lower the quality of the video, while higher bandwidths allow for more spatial clarity and increased temporal resolution. To provide for an efficient means of transmitting and/or storing digital video data at varying quality levels, or equivalently at different encoded rates, video scalability is utilized.
Video scalability is a technique for encoding enrichment data in the form of enhancement layers that when combined with a lower rate base layer result in progressively higher quality video. It is a mechanism for providing varying levels of video quality using a single bitstream without having to re-encode the source for a particular bitrate. In doing so, it eliminates the need to store compressed video sequences at different quality levels. By using scalability, an efficient, single, encoded bitstream is capable of providing varying levels of quality as warranted by the user or connection speed.
Scalability works by adding enhancement layers to a lower rate base layer. As more enhancement layers are combined, the video quality improves. Furthermore, because there is no need to re-encode the source for different rates and to store multiple versions of the same sequence, both computational resources and storage space are conserved. The enhancement in quality can be in the form of increased signal-to-noise ratio (SNR), temporal continuity, and/or spatial resolution. Scalability used to enhance the SNR quality of a frame is referred to as SNR scalability. Temporal scalability refers to scalability designed to increase the temporal resolution by increasing the encoded frame rate. Finally, spatial scalability is used to enhance the spatial resolution, or dimensions, of a frame.
International video coding standards such as MPEG-2 [ISO/IEC 13818-2 MPEG-2 Information Technology—Generic Coding of Moving Pictures and Associated Audio—Part 2: Video, 1995], MPEG-4 [ISO/IEC 14496-2 MPEG-4 Information Technology—Coding of Audio-Visual Objects: Visual (Draft International Standard) October 1997], and H.263 [ITU-T Recommendation H.263 Video Coding for Low Bitrate Communication, January 1998] all support one or more of the above forms of scalability. The two most recent standards, H.263 and MPEG-4, support all three forms of scalability and define the syntax such that combinations of the three can be used. For example, in a three-layer scalable bitstream, the two enhancement layers can be of different types of scalability, or two types of scalability can be merged into a single enhancement layer.
The general concept of SNR scalability is shown in FIG. 1, where enhancement layers added to the base layer provide a resulting frame with fewer distortions and artifacts. Techniques for SNR scalability can be based on the video coding standards [ITU-T Recommendation H.263 Video Coding for Low Bitrate Communication, January 1998], [D. Wilson and M. Ghanbari. Optimization of two-layer SNR Scalability for MPEG-2 Video. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, Pages 2637–2640. IEEE 1997], or may be outside of the standards [L. P. Kondi. Low Bitrate SNR Scalable Video Coding and Transmission. Ph.D. Thesis, Northwestern University, December 1999], [J. DeLameillieure. Rate-distortion Optimal Thresholding in SNR Scalability Based on 2D Dynamic Programming. Proc. SPIE Conf. On Visual Communications and Image Processing, Vol. 2952, pages 689–698. SPIE 1996]. Within the standards, SNR scalability is achieved by re-encoding the difference (error) image between the source and transmitted frames. This error is re-quantized and re-encoded in an enhancement layer. In MPEG-4, a second method referred to as Fine Granularity Scalability (FGS) can be used to generate SNR enhancement layers. A technique for SNR scalability that is beyond the scope of the standards has been presented in [L. P. Kondi. Low Bitrate SNR Scalable Video Coding and Transmission. Ph.D. Thesis, Northwestern University, December 1999] and is based on a hybrid form of both spectral selection and successive approximation introduced in progressive JPEG. Here, SNR scalability is accomplished by partitioning the quantized data into three layers. SNR scalability in MPEG-2 has been considered in [D. Wilson and M. Ghanbari. Optimization of two-layer SNR Scalability for MPEG-2 Video. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, Pages 2637–2640. IEEE 1997], where a technique for optimization of SNR scalability at bitrates of 2 Mbps is presented. Within H.263, Lee et al. in [B. R. Lee, K. K. Park, and J. J. Hwang. H.263-based SNR Scalable Video Codec. IEEE Trans. Consumer Electronics, Vol. 43, pages 614–622, September 1997] formulate a two-layer SNR scalable video codec with the enhancement layer being quantized based on the human visual system (HVS). Finally, optimization techniques have also been presented for use with SNR scalability. In [J. DeLameillieure. Rate-distortion Optimal Thresholding in SNR Scalability Based on 2D Dynamic Programming. Proc. SPIE Conf. On Visual Communications and Image Processing, Vol. 2952, pages 689–698. SPIE 1996], DeLameillieure formulates SNR scalability based on an optimal thresholding of the DCT coefficients using 2-dimensional dynamic programming. While many techniques exist for achieving SNR scalability, they are limited in that they only consider SNR scalability.
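By way of illustration only, the standards-based approach described above (re-quantizing the error between the source and the coarsely coded frame and carrying it in an enhancement layer) might be sketched as follows. The sketch operates on a short list of sample values rather than on DCT coefficients, and the function names, step sizes, and sample data are assumptions made for the example; it is not the method of any particular standard or of the cited works.

```python
def quantize(values, step):
    """Uniform quantizer: snap each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]

def encode_snr_layers(source, base_step, enh_step):
    """Return (base, enhancement) layers for a list of sample values.

    The base layer is a coarse quantization of the source. The enhancement
    layer is the error between source and base, re-quantized with a finer
    step (hypothetical two-layer arrangement).
    """
    base = quantize(source, base_step)
    error = [s - b for s, b in zip(source, base)]
    enhancement = quantize(error, enh_step)
    return base, enhancement

def decode(base, enhancement=None):
    """Reconstruct from the base layer, optionally refined by the enhancement."""
    if enhancement is None:
        return list(base)
    return [b + e for b, e in zip(base, enhancement)]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Illustrative sample values (assumed data, e.g. one row of pixel intensities).
source = [52, 55, 61, 66, 70, 61, 64, 73]
base, enh = encode_snr_layers(source, base_step=16, enh_step=4)

base_only = decode(base)        # what a low-bandwidth receiver reconstructs
enhanced = decode(base, enh)    # base plus one SNR enhancement layer

print("base-only MSE:", mse(source, base_only))
print("enhanced MSE: ", mse(source, enhanced))
```

In this sketch the receiver that decodes both layers obtains a lower mean squared error than the receiver that decodes the base layer alone, which is the sense in which the enhancement layer raises SNR without the source being re-encoded.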
The second form of scalability is temporal scalability. Temporal scalability is used to increase the frame rate, or temporal resolution, of an encoded sequence. In video compression it is often necessary to drop source frames from being coded in order to meet the bandwidth requirements of the channel. This results in a decrease of the overall encoded frame rate and a lowering of the output temporal resolution. This low encoded frame rate can become perceptibly displeasing, especially in high motion sequences, where it will appear as “jerky” motion similar to a “snap-shot” effect. In these cases, temporal scalability can be used to increase the frame rate by encoding, in an enhancement layer, those frames not encoded in the previous layer, as shown in FIG. 2. Thus, while the base layer may be encoded at a low frame rate, the base layer combined with the enhancement layer(s) will result in a temporally smoother sequence.
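As a minimal sketch of the layering just described, the following example places every Nth source frame in the base layer and the remaining (previously dropped) frames in a temporal enhancement layer, then merges the layers in display order at the decoder. Frames are treated as opaque labels, and the function names and the `base_interval` parameter are assumptions for illustration; an actual codec would typically code the enhancement frames predictively from the base layer rather than carry them independently.

```python
def split_temporal_layers(frames, base_interval):
    """Assign frames to a base layer and a temporal enhancement layer.

    Every base_interval-th frame goes to the base layer; all other frames
    go to the enhancement layer. Each entry keeps its original frame index
    so the decoder can restore display order.
    """
    base, enhancement = [], []
    for index, frame in enumerate(frames):
        if index % base_interval == 0:
            base.append((index, frame))
        else:
            enhancement.append((index, frame))
    return base, enhancement

def merge_layers(base, enhancement=()):
    """Combine the received layers and return frames in display order."""
    combined = sorted(list(base) + list(enhancement), key=lambda item: item[0])
    return [frame for _, frame in combined]

# Eight source frames, represented here only by labels (assumed data).
frames = [f"frame{i}" for i in range(8)]
base, enh = split_temporal_layers(frames, base_interval=4)

low_rate = merge_layers(base)         # base layer only: one quarter of the frame rate
full_rate = merge_layers(base, enh)   # base plus enhancement: original frame rate

print(low_rate)
print(len(full_rate), "frames at full rate")
```

A receiver limited to the base layer displays only frames 0 and 4, while a receiver that also decodes the enhancement layer recovers all eight frames and therefore the smoother motion.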
Temporal scalability in MPEG-2 has been discussed in [H. Sun and W. Kwok. MPEG Video Coding with Temporal Scalability. International Communications Conference, Vol 2952, pages 1742–1746. IEEE, 1995]. Here, the base and enhancement layers are developed jointly such that the total overall bitrate is constant but the rates for the base and enhancement layers are variable. Other investigations of temporal scalability have been outside the scope of the video coding standards and can be found in [J. B. Lee and A. Eleftheriadis. Motion Adaptive Model-Assisted Compatible Coding with Spatio-temporal Scalability. Proc. SPIE Conf. On Visual Communications and Image Processing, Vol. 3024, pages 622–634. SPIE, 1997] and [B. Girod and U. Horn. A Scalable Codec for Internet Video Streaming. DSP'97, pages 221–224. DSP, 1997].
In light of the foregoing, there is an unmet need in the art for a technique that provides a mechanism for both spatial and temporal enhancements in digital video, when temporal scalability and SNR scalability are combined. There is a further need in the art for selecting the type of scalability and the degree to which that type will be used.