Scalable video coding has been used for compressing video transmitted over computer networks having a varying bandwidth, such as the Internet. One such video coding scheme is fine granular scalable (embedded) video coding. Fine granular scalable (FGS) video coding has been adopted by the ISO MPEG-4 standard as the core video coding method for the MPEG-4 Streaming Video Profile.
As shown in FIG. 1, the FGS video coding scheme includes a motion-compensated base layer 10, encoded with a non-scalable codec to include I, P, and B frames at the bit-rate RBL, and an enhancement layer 11, encoded with a scalable codec, such as FGS, to include quality signal-to-noise-ratio (SNR) I, P, and B residual frames at a maximum bit-rate Rmax. At transmission time, a portion of the enhancement layer 11 corresponding to the bit-rate REL is “cut” from the FGS encoded bitstream, such that the transmitted bit-rate matches the available bandwidth R=RBL+REL.
If, for example, the enhancement layer stream needs to accommodate clients with bit-rates ranging between 100 kbit/s and 1 Mbit/s, RBL will be set to 100 kbit/s and Rmax will be set to 1 Mbit/s. Hence, the base layer I, P, and B frames, coded at 100 kbit/s, will always be transmitted. If more bandwidth is available, at least a portion of the enhancement layer residual I, P, and B frames will also be transmitted.
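The relation R=RBL+REL can be illustrated with a minimal sketch. It assumes, following the numerical example above, that Rmax bounds the total transmitted rate, so that REL is at most Rmax minus RBL. The function name and the kbit/s units are hypothetical and chosen only for illustration.

```python
def enhancement_rate(available_kbps, r_bl_kbps, r_max_kbps):
    """Return the enhancement-layer rate REL (kbit/s) cut from the stream.

    Assumption: r_max_kbps bounds the total rate R = RBL + REL.
    """
    if available_kbps < r_bl_kbps:
        # The base layer must always be transmitted in full.
        raise ValueError("available bandwidth is below the base-layer rate")
    total = min(available_kbps, r_max_kbps)  # never send more than Rmax
    return total - r_bl_kbps                 # REL = R - RBL


# Rates from the example: RBL = 100 kbit/s, Rmax = 1 Mbit/s.
for bandwidth in (100, 400, 1000, 2000):
    print(bandwidth, enhancement_rate(bandwidth, 100, 1000))
```

Under this assumption, a client at exactly 100 kbit/s receives only the base layer, while any client at or above 1 Mbit/s receives the full enhancement layer.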
FIGS. 2A and 2B show exemplary FGS hybrid temporal-SNR video coding schemes as described in the earlier mentioned commonly assigned, copending U.S. patent application Ser. No. 09/590,825. In the video coding scheme of FIG. 2A, a base layer 12 is encoded with a non-scalable codec to include I, P, and B frames, and an enhancement layer 13 is encoded with a scalable codec, such as FGS, to include residual B frames, i.e., temporal frames (FGST frames when FGS encoding is used), and residual I and P frames, i.e., quality signal-to-noise-ratio (SNR) frames (FGS frames when FGS encoding is used). In the video coding scheme of FIG. 2B, a base layer 14 is encoded with a non-scalable codec to include only I and P frames, i.e., no B frames are coded in the base layer 14. An enhancement layer 15 is encoded, as in the coding scheme of FIG. 2A, with a scalable codec, such as FGS, to include residual B frames, i.e., FGST frames, and SNR residual I and P frames, i.e., FGS frames.
During transmission by a server, the base layer I, P, and B frames (coding scheme of FIG. 2A) or the base layer I and P frames (coding scheme of FIG. 2B) are coded up to RBL, and are always transmitted. If more bandwidth is available, the current approach for transmitting the enhancement layers 13, 15 in the coding schemes of FIGS. 2A and 2B is to allocate the enhancement-layer bits (REL) equally between the FGS and FGST frames using a rate-control strategy.
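The equal allocation described above can be sketched as follows. The source does not specify the rate-control strategy, so this sketch assumes that "equally" means every enhancement-layer frame receives the same number of bits, whether it is an FGS frame (accompanying a base-layer I or P frame) or an FGST frame (a residual B frame). The function name and the frame-type string are hypothetical.

```python
def equal_allocation(r_el_bps, frame_types):
    """Split REL (bits per second) equally across enhancement frames.

    frame_types: frame types displayed in one second, e.g. "IBBPBBPBBP".
    Assumption: I and P positions carry FGS frames, B positions carry
    FGST frames, and every enhancement frame gets the same bit budget.
    """
    if not frame_types or any(t not in "IPB" for t in frame_types):
        raise ValueError("frame_types must be a non-empty string of I, P, B")
    bits_per_frame = r_el_bps / len(frame_types)
    return [("FGS" if t in "IP" else "FGST", bits_per_frame)
            for t in frame_types]


# 900 kbit/s of enhancement bandwidth spread over 10 frames per second.
for kind, bits in equal_allocation(900_000, "IBBPBBPBBP"):
    print(kind, bits)
```

With these illustrative numbers, each of the ten enhancement frames receives the same budget regardless of whether it improves quality (FGS) or adds temporal resolution (FGST).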
One disadvantage associated with FGS codecs is their reduced coding efficiency compared with non-scalable codecs. This efficiency penalty is due to the insufficient temporal decorrelation achieved with FGS (temporal correlations are exploited only in the base layer), and not to the embedded coding of the enhancement layers.
Temporal correlation in the enhancement layer can be reduced by improving the temporal decorrelation in the base layer. This can be accomplished by allocating a greater number of bits to the base layer. However, this is not possible, since RBL corresponds to the minimum bandwidth guaranteed to be available to clients at all times. If RBL is increased, some of these clients will not be able to decode the base layer at certain times, which is contrary to the MPEG-4 standard.