Scalable enhancement layer video coding has been used for compressing video transmitted over computer networks having a varying bandwidth, such as the Internet. A current enhancement layer video coding scheme employing fine granular scalable (FGS) coding techniques (adopted by the ISO MPEG-4 standard) is shown in FIG. 1. As can be seen, the video coding scheme 10 includes a prediction-based base layer 11 coded at a bit rate RBL, and an FGS enhancement layer 12 coded at a bit rate REL.
The prediction-based base layer 11 includes intraframe coded I-frames, interframe coded P-frames, which are temporally predicted from previous I- or P-frames using motion estimation-compensation, and interframe coded bi-directional B-frames, which are temporally predicted from both previous and succeeding frames adjacent to the B-frame using motion estimation-compensation. The use of predictive and/or interpolative coding, i.e., motion estimation and corresponding compensation, in the base layer 11 reduces temporal redundancy therein, but only to a limited extent, since only base layer frames are used for prediction.
The enhancement layer 12 includes FGS enhancement layer I-, P-, and B-frames derived by subtracting their respective reconstructed base layer frames from the respective original frames (this subtraction can also take place in the motion-compensated domain). Consequently, the FGS enhancement layer I-, P-, and B-frames in the enhancement layer are not motion-compensated. (The FGS residual is taken from frames at the same time-instance.) The primary reason for this is to provide flexibility, allowing each FGS enhancement layer frame to be truncated individually depending on the available bandwidth at transmission time. More specifically, the fine granular scalable coding of the enhancement layer 12 permits an FGS video stream to be transmitted over any network session with an available bandwidth ranging from Rmin=RBL to Rmax=RBL+REL. For example, if the available bandwidth between the transmitter and the receiver is B=R, then the transmitter sends the base layer frames at the rate RBL and only a portion of the enhancement layer frames, at the rate REL=R-RBL, where REL here denotes the rate of the transmitted portion. As can be seen from FIG. 1, portions of the FGS enhancement layer frames in the enhancement layer can be selected in a fine granular scalable manner for transmission. Therefore, the total transmitted bit-rate is R=RBL+REL. The scheme is thus flexible in supporting a wide range of transmission bandwidths with a single enhancement layer.
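The rate split described above can be illustrated with a minimal sketch. The function names, the use of integer rates, and the proportional per-frame truncation rule are assumptions made for illustration only; the text states merely that each enhancement layer frame can be truncated individually according to the available bandwidth.

```python
def fgs_rates(r_bl, r_el_max, bandwidth):
    """Split the available bandwidth into base and enhancement rates.

    r_bl      : base layer rate RBL (always sent in full)
    r_el_max  : rate of the complete enhancement layer
    bandwidth : available bandwidth R between transmitter and receiver
    All rates are integers in the same unit (hypothetical convention).
    """
    if bandwidth < r_bl:
        # Below Rmin = RBL the stream cannot be sent at all.
        raise ValueError("bandwidth is below the base layer rate")
    # Transmitted enhancement rate is R - RBL, capped at the full layer.
    r_el = min(bandwidth - r_bl, r_el_max)
    return r_bl, r_el


def truncate_frame(enh_bits, r_el, r_el_max):
    """Keep only the leading part of one frame's enhancement bitstream.

    Assumption: every frame is cut to the same fraction r_el / r_el_max.
    """
    if r_el_max <= 0:
        return enh_bits[:0]
    keep = (len(enh_bits) * r_el) // r_el_max
    return enh_bits[:keep]
```

For instance, with RBL = 100, a full enhancement rate of 400 and an available bandwidth of 300 (same units), the sketch sends the base layer at 100 and the leading half of each enhancement layer frame at 200.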
FIG. 2 shows a block-diagram of a conventional FGS encoder for coding the base layer 11 and enhancement layer 12 of the video coding scheme of FIG. 1. As can be seen, the enhancement layer residual of frame i (FGSR(i)) equals MCR(i)-MCRQ(i), where MCR(i) is the motion-compensated residual of frame i, and MCRQ(i) is the motion-compensated residual of frame i after the quantization and the dequantization processes.
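The residual relation FGSR(i) = MCR(i) - MCRQ(i) can be sketched as follows. The uniform quantizer with truncation toward zero, the step size parameter, and the flat list of sample values are assumptions for illustration; the actual quantizer of the encoder of FIG. 2 is not specified in this passage.

```python
def quantize_dequantize(values, step):
    """Return MCRQ: each value after quantization and dequantization.

    Assumption: a uniform quantizer with the given step size that
    truncates toward zero (hypothetical, not taken from the text).
    """
    return [step * int(v / step) for v in values]


def fgs_residual(mcr, step):
    """Compute the enhancement layer residual FGSR = MCR - MCRQ."""
    mcrq = quantize_dequantize(mcr, step)
    return [a - b for a, b in zip(mcr, mcrq)]
```

Under these assumptions, adding the residual back to the dequantized values recovers the motion-compensated residual exactly, which is why sending more of the enhancement layer brings the decoded frame closer to the original.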
Although the current FGS enhancement layer video coding scheme 10 of FIG. 1 is very flexible, it has the disadvantage that its performance in terms of video image quality is relatively low compared with that of a non-scalable coder functioning at the same transmission bit-rate. The decrease in image quality is not due to the fine granular scalable coding of the enhancement layer 12 but mainly due to the reduced exploitation of the temporal redundancy among the FGS residual frames within the enhancement layer 12. In particular, the FGS enhancement layer frames of the enhancement layer 12 are derived only from the motion-compensated residual of their respective base layer I-, P-, and B-frames; no FGS enhancement layer frames are used to predict other FGS enhancement layer frames in the enhancement layer 12 or other frames in the base layer 11.
Accordingly, a scalable video coding scheme is needed that employs portions of the enhancement layer frames for motion compensation to improve image quality while preserving most of the flexibility and attractive characteristics typical of the current FGS video coding scheme.