It is desirable that a scalable video coding scheme can support different applications and decoder requirements without incurring the bitrate penalty of simulcast encoding. For many existing scalable coding schemes, where motion estimation and compensation is applied to remove the temporal redundancy, the prior art approaches used to code motion vectors have deficiencies. In general, two methods have been used to code the motion vectors: non-scalable motion vector coding and scalable motion vector coding. In the non-scalable motion vector coding scheme, the motion vector is coded with the precision of the highest enhancement layer and is stored in the base layer. Then, for the lower layer, the motion vectors need to be downsampled. In the scalable motion vector coding scheme, a refinement of a lower layer motion vector is coded at each higher layer such that the coded motion vector represents the motion vector precision for that higher layer. For total bitrate coding efficiency, non-scalable motion vector coding has better efficiency than scalable motion vector coding. However, non-scalable motion vector coding puts all the bits for motion vectors in the base layer. Thus, if bitrate scalability is required, non-scalable motion vector coding can hurt base layer quality. Since non-scalable motion vector coding requires downsampling for lower layers, it can cause the problem of the base layer not being standard compliant. Some of the decoders may not be able to support such a feature. On the other hand, if bitrate scalability is not a requirement, but instead complexity and total bitrate coding efficiency, such as in broadcast video applications, non-scalable motion vector coding can have better coding efficiency.
Many different methods of scalability have been widely studied and standardized, including SNR scalability, spatial scalability, temporal scalability, and fine grain scalability, in scalability profiles of the MPEG-2 and MPEG-4 standards. Most of the work in scalable coding has been aimed at bitrate scalability, where the low resolution layer has a limited bandwidth. As shown in FIG. 1, a typical spatial scalability system is indicated generally by the reference numeral 100. The system 100 includes a complexity scalable video encoder 110 for receiving a video sequence. A first output of the complexity scalable video encoder 110 is connected in signal communication with a low bandwidth network 120 and with a first input of a multiplexer 130. A second output of the complexity scalable video encoder 110 is connected in signal communication with a second input of the multiplexer 130. An output of the low bandwidth network 120 is connected in signal communication with an input of a low resolution decoder 140. An output of the multiplexer 130 is connected in signal communication with an input of a high bandwidth network 150. An output of the high bandwidth network 150 is connected in signal communication with an input of a demultiplexer 160. A first output of the demultiplexer 160 is connected in signal communication with a first input of a high resolution decoder 170, and a second output of the demultiplexer 160 is connected in signal communication with a second input of the high resolution decoder 170. An output of the low-resolution decoder 140 is available as an output of the system 100 for a base layer bitstream, and an output of the high-resolution decoder 170 is available as an output of the system 100 for a scalable bitstream.
Scalable coding has not been widely adopted in practice, because of the considerable increase in encoder and decoder complexity, and because the coding efficiency of scalable encoders is typically well below that of non-scalable encoders.
Spatially scalable encoders and decoders typically require that the high resolution scalable encoder/decoder provide additional functionality than would be present in a normal high resolution encoder/decoder. In an MPEG-2 spatial scalable encoder, a decision is made whether prediction is performed from a low resolution reference picture or from a high resolution reference picture. An MPEG-2 spatial scalable decoder must be capable of predicting either from the low resolution reference picture or the high resolution reference picture. Two sets of reference picture stores are required by an MPEG-2 spatial scalable encoder/decoder, one for low resolution pictures and another for high resolution pictures. FIG. 2 shows a block diagram for a low-complexity spatial scalable encoder 200 supporting two layers, according to the prior art. FIG. 3 shows a block diagram for a low-complexity spatial scalable decoder 300 supporting two layers, according to the prior art.
Turning to FIG. 2, a spatial scalable video encoder supporting two layers is indicated generally by the reference numeral 200. The video encoder 200 includes a downsampler 210 for receiving a high-resolution input video sequence. The downsampler 210 is coupled in signal communication with a low-resolution non-scalable encoder 212, which, in turn, is coupled in signal communication with low-resolution frame stores 214. The low-resolution non-scalable encoder 212 outputs a low-resolution bitstream, and is further coupled in signal communication with a low-resolution non-scalable decoder 220.
The low-resolution non-scalable decoder 220 is coupled in signal communication with an upsampler 230, which, in turn, is coupled in signal communication with a scalable high-resolution encoder 240. The scalable high-resolution encoder 240 also receives the high-resolution input video sequence, is coupled in signal communication with high-resolution frame stores 250, and outputs a high-resolution scalable bitstream. An output of the low-resolution non-scalable encoder 212 and an output of the scalable high-resolution encoder are available as outputs of the spatial scalable video encoder 200.
Thus, a high resolution input video sequence is received by the low-complexity encoder 200 and down-sampled to create a low-resolution video sequence. The low-resolution video sequence is encoded using a non-scalable low-resolution video compression encoder, creating a low-resolution bitstream. The low-resolution bitstream is decoded using a non-scalable low-resolution video compression decoder. This function may be performed inside of the encoder. The decoded low-resolution sequence is up-sampled, and provided as one of two inputs to a scalable high-resolution encoder. The scalable high-resolution encoder encodes the video to create a high-resolution scalable bitstream.
Turning to FIG. 3, a spatial scalable video decoder supporting two layers is indicated generally by the reference numeral 300. The video decoder 300 includes a low-resolution decoder 360 for receiving a low-resolution bitstream, which is coupled in signal communication with low-resolution frame stores 362, and outputs a low-resolution video sequence. The low-resolution decoder 360 is further coupled in signal communication with an upsampler 370, which, in turn, is coupled in signal communication with a scalable high-resolution decoder 380.
The scalable high-resolution decoder 380 is further coupled in signal communication with high-resolution frame stores 390. The scalable high-resolution decoder 380 receives a high-resolution scalable bitstream and outputs a high-resolution video sequence. An output of the low-resolution decoder 360 and an output of the scalable high-resolution decoder are available as outputs of the spatial scalable video decoder 300.
Thus, both a high-resolution scalable bitstream and low-resolution bitstream are received by the low-complexity decoder 300. The low-resolution bitstream is decoded using a non-scalable low-resolution video compression decoder, which utilizes low-resolution frame stores. The decoded low-resolution video is up-sampled, and then input into a high-resolution scalable decoder. The high-resolution scalable decoder utilizes a set of high-resolution frame stores, and creates the high-resolution output video sequence.
Turning to FIG. 4, a non-scalable video encoder is indicated generally by the reference numeral 400. An input to the video encoder 400 is connected in signal communication with a non-inverting input of a summing junction 410. The output of the summing junction 410 is connected in signal communication with a transformer/quantizer 420. The output of the transformer/quantizer 420 is connected in signal communication with an entropy coder 440. An output of the entropy coder 440 is available as an output of the encoder 400.
The output of the transformer/quantizer 420 is further connected in signal communication with an inverse transformer/quantizer 450. An output of the inverse transformer/quantizer 450 is connected in signal communication with an input of a deblock filter 460. An output of the deblock filter 460 is connected in signal communication with reference picture stores 470. A first output of the reference picture stores 470 is connected in signal communication with a first input of a motion estimator 480. The input to the encoder 400 is further connected in signal communication with a second input of the motion estimator 480. The output of the motion estimator 480 is connected in signal communication with a first input of a motion compensator 490. A second output of the reference picture stores 470 is connected in signal communication with a second input of the motion compensator 490. The output of the motion compensator 490 is connected in signal communication with an inverting input of the summing junction 410.
Turning to FIG. 5, a non-scalable video decoder is indicated generally by the reference numeral 500. The video decoder 500 includes an entropy decoder 510 for receiving a video sequence. A first output of the entropy decoder 510 is connected in signal communication with an input of an inverse quantizer/transformer 520. An output of the inverse quantizer/transformer 520 is connected in signal communication with a first input of a summing junction 540.
The output of the summing junction 540 is connected in signal communication with a deblock filter 590. An output of the deblock filter 590 is connected in signal communication with reference picture stores 550. The reference picture stores 550 is connected in signal communication with a first input of a motion compensator 560. An output of the motion compensator 560 is connected in signal communication with a second input of the summing junction 540. A second output of the entropy decoder 510 is connected in signal communication with a second input of the motion compensator 560. The output of the deblock filter 590 is available as an output of the video decoder 500.
It has been proposed that H.264/MPEG AVC be extended to use a Reduced Resolution Update (RRU) mode. The RRU mode improves coding efficiency at low bitrates by reducing the number of residual macroblocks (MBs) to be coded, while performing motion estimation and compensation of full resolution pictures. Turning to FIG. 6, a Reduced Resolution Update (RRU) video encoder is indicated generally by the reference numeral 600. An input to the video encoder 600 is connected in signal communication with a non-inverting input of a summing junction 610. The output of the summing junction 610 is connected in signal communication with an input of a downsampler 612. An input of a transformer/quantizer 620 is connected in signal communication with an output of the downsampler 612 or with the output of the summing junction 610. An output of the transformer/quantizer 620 is connected in signal communication with an entropy coder 640. An output of the entropy coder 640 is available as an output of the video encoder 600.
The output of the transformer/quantizer 620 is further connected in signal communication with an input of an inverse transformer/quantizer 650. An output of the inverse transformer/quantizer 650 is connected in signal communication with an input of an upsampler 655. An input of a deblock filter 660 is connected in signal communication with an output of the inverse transformer/quantizer 650 or with an output of the upsampler 655. An output of the deblock filter 660 is connected in signal communication with an input of reference picture stores 670. A first output of the reference picture stores 670 is connected in signal communication with a first input of a motion estimator 680. The input to the encoder 600 is further connected in signal communication with a second input of the motion estimator 680. The output of the motion estimator 680 is connected in signal communication with a first input of a motion compensator 690. A second output of the reference picture stores 670 is connected in signal communication with a second input of the motion compensator 690. The output of the motion compensator 690 is connected in signal communication with an inverting input of the summing junction 610.
Turning to FIG. 7, a Reduced Resolution Update (RRU) video decoder is indicated generally by the reference numeral 700. The video decoder 700 includes an entropy decoder 710 for receiving a video sequence. An output of the entropy decoder 710 is connected in signal communication with an input of an inverse quantizer/transformer 720. An output of the inverse quantizer/transformer 720 is connected in signal communication with an input of an upsampler 722. An output of the upsampler 722 is connected in signal communication with a first input of a summing junction 740.
An output of the summing junction 740 is connected in signal communication with a deblock filter 790. An output of the deblock filter 790 is connected in signal communication with an input of full resolution reference picture stores 750. The output of the deblock filter 790 is also available as an output of the video decoder 700. An output of the full resolution reference picture stores 750 is connected in signal communication with a motion compensator 760, which is connected in signal communication with a second input of the summing junction 740.
It has been proposed to use RRU concept to design a complexity scalable codec. An example is provided for a system that supports two different levels of decoder complexity and resolution. A low resolution decoder has a smaller display size and has very strict decoder complexity constraints. A full resolution decoder has a larger display size and less strict but still important decoder complexity constraints. A broadcast or multicast system transmits two bitstreams, a base layer with bitrate BRbase and an enhancement layer with bitrate BRenhan. The two bitstreams may be multiplexed together and sent in a single transport stream. Turning to FIG. 8, a complexity scalability broadcast system is indicated generally by the reference numeral 800. The system 800 includes a complexity scalable video encoder and a low resolution decoder and a full resolution decoder. The complexity scalability broadcast system 800 includes a complexity scalable video encoder 810. A first output of the complexity scalable video encoder 810 is connected in signal communication with a first input of a multiplexer 820. A second output of the complexity scalable video encoder 810 is connected in signal communication with a second input of the multiplexer 820. An output of the multiplexer 820 is connected in signal communication with a network 830. An output of the network 830 is connected in signal communication with an input of a first demultiplexer 840 and with an input of a second demultiplexer 850. An output of the first demultiplexer 840 is connected in signal communication with an input of a low resolution decoder 850. A first output of the second demultiplexer 860 is connected in signal communication with a first input of a full resolution decoder 870. A second output of the second demultiplexer 860 is connected in signal communication with a second input of the full resolution decoder 870. An output of the low-resolution decoder 850 is available as an output of the system 800 for a base layer bitstream, and an output of the full-resolution decoder 870 is available as an output of the system 800 for a scalable bitstream.
The low-resolution decoder 850 processes only the base layer bitstream and the full resolution decoder 870 processes both the base layer bitstream and the enhancement layer bitstream. RRU is used in the base layer, which can be decoded into both low resolution and high resolution sequences with different complexity at the decoder. The enhancement layer bitstream includes a full resolution error signal, to be added to the result of decoding the base layer bitstream, which was done with full resolution motion compensation. The bitrate of the enhancement layer may end up being lower than that of the base layer, which differs from the typical spatial scalability case where the base layer bitrate is typically small compared with the enhancement layer bitrate. A full resolution error signal is not necessarily sent for every coded macroblock or slice/picture.