Because of the massive amounts of data inherent in digital video, the transmission of full-motion, high-definition digital video signals is a significant problem in the development of high-definition television. More particularly, each digital image frame is a still image formed from an array of pixels according to the display resolution of a particular system. As a result, the amounts of raw digital information included in high resolution video sequences are massive. In order to reduce the amount of data that must be sent, compression schemes are used to compress the data. Various video compression standards or processes have been established, including, MPEG-2, MPEG-4, and H.263.
Many applications are enabled where video is available at various resolutions and/or qualities in one stream. Methods to accomplish this are loosely referred to as scalability techniques. There are three axes on which one can deploy scalability. The first is scalability on the time axis, often referred to as temporal scalability. Secondly, there is scalability on the quality axis, often referred to as signal-to-noise scalability or fine-grain scalability. The third axis is the resolution axis (number of pixels in image) often referred to as spatial scalability or layered coding. In layered coding, the bitstream is divided into two or more bitstreams, or layers. Each layer can be combined to form a single high quality signal. For example, the base layer may provide a lower quality video signal, while the enhancement layer provides additional information that can enhance the base layer image.
In particular, spatial scalability can provide compatibility between different video standards or decoder capabilities. With spatial scalability, the base layer video may have a lower resolution than the input video sequence, in which case the enhancement layer carries information which can restore the resolution of the base layer to the input sequence level.
Most video compression standards support spatial scalability. FIG. 1 illustrates a block diagram of an encoder 100 which supports MPEG-2/MPEG-4 spatial scalability. The encoder 100 comprises a base encoder 112 and an enhancement encoder 114. The base encoder is comprised of a low pass filter and downsampler 120, a motion estimator 122, a motion compensator 124, an orthogonal transform (e.g., Discrete Cosine Transform (DCT)) circuit 130, a quantizer 132, a variable length coder 134, a bitrate control circuit 135, an inverse quantizer 138, an inverse transform circuit 140, switches 128, 144, and an interpolate and upsample circuit 150. The enhancement encoder 114 comprises a motion estimator 154, a motion compensator 155, a selector 156, an orthogonal transform (e.g., Discrete Cosine Transform (DCT)) circuit 158, a quantizer 160, a variable length coder 162, a bitrate control circuit 164, an inverse quantizer 166, an inverse transform circuit 168, switches 170 and 172. The operations of the individual components are well known in the art and will not be described in detail.
Unfortunately, the coding efficiency of this layered coding scheme is not very good. Indeed, for a given picture quality, the bitrate of the base layer and the enhancement layer together for a sequence is greater than the bitrate of the same sequence coded at once.
FIG. 2 illustrates another known encoder 200 proposed by DemoGrafx (see U.S. Pat. No. 5,852,565). The encoder is comprised of substantially the same components as the encoder 100 and the operation of each is substantially the same so the individual components will not be described. In this configuration, the residue difference between the input block and the upsampled output from the upsampler 150 is inputted into a motion estimator 154. To guide/help the motion estimation of the enhancement encoder, the scaled motion vectors from the base layer are used in the motion estimator 154 as indicated by the dashed line in FIG. 2. However, this arrangement does not significantly overcome the problems of the arrangement illustrated in FIG. 1.
While spatial scalability, as illustrated in FIGS. 1 and 2, is supported by the video compression standards, spatial scalability is not often used due to a lack of coding efficiency. The lack of efficient coding means that, for a given picture quality, the bit rate of the base layer and the enhancement layer for a sequence together are more than the bit rate of the same sequence coded at once.