The present invention generally relates to video coding, and more particularly to a fine granular coding technique that includes both quality and temporal scalability.
Fine granular scalability (FGS) has been used to compress video for transmission over networks that have a varying bandwidth, such as the Internet. Examples of such FGS structures are shown in FIGS. 1A-1B and 2A-2B. As can be seen, these structures consist of a base layer coded at a bit-rate RBL and a single fine granular enhancement layer coded at a bit-rate REL. However, in FIGS. 1A-1B the base layer has been encoded to include just I and P frames, while in FIGS. 2A-2B the base layer has been encoded to include I, P and B frames.
Due to the fine granularity of the enhancement layer, an FGS video stream can be transmitted over any network session with an available bandwidth ranging from Bmin=RBL to Bmax=RBL+REL. For example, if the available bandwidth between the transmitter and the receiver is B=R, then the transmitter sends the base layer at the rate RBL and only a portion of the enhancement layer at the rate Re=R−RBL. As can be seen from FIGS. 1B and 2B, portions of the enhancement layer can be selected in a fine granular manner for transmission. Therefore, the total transmitted bit-rate is R=RBL+Re.
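The rate split described above can be illustrated with a minimal sketch. The function name `split_bitrate` and its parameter names are hypothetical and are not part of the FGS framework; the sketch assumes all rates are expressed in the same unit and only restates the relations Bmin=RBL, Bmax=RBL+REL and Re=R−RBL.

```python
def split_bitrate(available, r_bl, r_el):
    """Return (base_rate, enhancement_rate) for an available bandwidth R.

    Hypothetical helper: `available` is R, `r_bl` is RBL, `r_el` is REL.
    """
    if available < r_bl:
        # Below Bmin = RBL the base layer itself cannot be delivered.
        raise ValueError("available bandwidth is below the base-layer rate")
    # The enhancement layer is fine granular, so any rate from 0 up to REL
    # may be sent; anything above Bmax = RBL + REL is simply left unused.
    r_e = min(available - r_bl, r_el)
    return r_bl, r_e
```

For instance, with RBL=100 and REL=400, an available bandwidth of 300 yields a transmitted enhancement rate of 200, so the total transmitted rate equals the available bandwidth.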
Due to its flexibility in supporting a wide range of transmission bandwidths with a single enhancement layer, the FGS framework has been adopted by the ISO MPEG-4 standard. An example of a system utilizing an FGS-based encoder is shown in FIG. 3. The system includes a network 6 with a variable available bandwidth in the range of (Bmin=Rmin, Bmax=Rmax). A calculation block 4 is also included for estimating or measuring the current available bandwidth (R). A base layer (BL) video encoder 8 compresses the signal from the video source 2 using a bit-rate (RBL) in the range (Rmin, R). Typically, the base layer encoder 8 compresses the signal using the minimum bit-rate (Rmin). This is especially the case when the BL encoding takes place off-line prior to the time of transmitting the video signal. As can be seen, a unit 10 is also included for computing the residual images 12. Further, an enhancement layer (EL) encoder 14 compresses the residual signal with a bit-rate REL, which can be in the range of RBL to Rmax−RBL. It is important to note that the encoding of the video signal (both enhancement and base layers) can take place either in real-time (as implied by the figure) or off-line prior to the time of transmission. In the latter case, the video can be stored and then transmitted (or streamed) at a later time using a real-time rate controller 16, as shown. The real-time rate controller 16 selects the best quality enhancement layer signal, taking into consideration the current (real-time) available bandwidth R. Therefore, the output bit-rate of the EL signal from the rate controller equals R−RBL.
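One way a rate controller of this kind could behave is sketched below. This is an illustration only, not the controller 16 of FIG. 3: it assumes each stored enhancement layer frame is an embedded bitstream that remains decodable when cut short, and it assumes the enhancement budget R−RBL is divided equally among frames. The name `truncate_enhancement` and all parameters are hypothetical.

```python
def truncate_enhancement(el_frames, rate_bps, r_bl_bps, fps):
    """Cut each stored enhancement frame down to the current byte budget.

    el_frames: list of bytes objects, one embedded bitstream per frame
    rate_bps:  currently available bandwidth R, in bits per second
    r_bl_bps:  base-layer rate RBL, in bits per second
    fps:       frames per second (integer)
    """
    # Enhancement-layer output rate is R - RBL, never negative.
    el_rate = max(rate_bps - r_bl_bps, 0)
    # Assumption: the budget is shared equally by all frames in one second.
    budget_bytes = el_rate // (fps * 8)
    # Frames shorter than the budget are sent whole.
    return [frame[:budget_bytes] for frame in el_frames]
```

Because the truncation is applied at transmission time, the same stored enhancement layer can serve sessions with different available bandwidths.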
The present invention is directed to a fine granular scalability coding technique that includes both quality and temporal scalability. In one example of coding the video data according to the present invention, a portion of the video data is coded to produce base layer frames. Motion compensated residual images are produced from the video data and the base layer frames. The motion compensated residual images are coded using a fine granular coding technique to produce temporal enhancement frames. Further, residual images are generated from the video data and the base layer frames. The residual images are then coded, also using a fine granular coding technique, to produce quality enhancement frames. The temporal enhancement frames and the quality enhancement frames also can be combined into an enhancement layer.
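The text does not specify the fine granular coding technique at this point. As an illustration only, the sketch below assumes bit-plane coding of non-negative residual magnitudes, which is one known way to obtain an embedded representation; the function names are hypothetical, and sign handling, transforms and entropy coding are omitted.

```python
def to_bitplanes(values, num_planes):
    """Split non-negative integers into bit-planes, most significant first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]


def from_bitplanes(planes, num_planes):
    """Rebuild values from however many leading bit-planes were received."""
    values = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i  # weight of this plane
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


residual = [5, 2, 7, 0]
planes = to_bitplanes(residual, 3)
coarse = from_bitplanes(planes[:1], 3)   # only the most significant plane
exact = from_bitplanes(planes, 3)        # all planes received
```

Each additional plane that arrives refines the reconstruction, so the coded residual can be cut at any plane boundary and still be decoded.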
In another example of coding video data according to the present invention, a portion of the video data is coded to produce base layer frames. Motion compensated residual images are generated from the video data and the base layer frames. The motion compensated residual images are coded to produce temporal enhancement frames. Residual images are generated from the video data, the base layer frames and the temporal enhancement frames. The residual images are then coded using a fine granular coding technique to produce quality enhancement frames. Further, the temporal enhancement frames form a temporal enhancement layer and the quality enhancement frames form a quality enhancement layer.
In one example of decoding a video signal including a base layer and an enhancement layer according to the present invention, the base layer is decoded to produce video frames. The enhancement layer is also decoded to produce motion vectors. Motion compensation is then performed on the video frames according to the motion vectors to produce additional video frames. The video frames and the additional video frames are then combined into a video sequence. Further, the enhancement layer is decoded to produce enhanced video frames. Each of the enhanced video frames is added to one of the video frames and additional video frames.
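The decoding steps above can be sketched in a deliberately simplified form. The sketch assumes that each frame is a single row of samples, that each additional frame is predicted from one base layer frame with a single whole-frame horizontal motion vector, and that the decoded enhancement data are plain per-sample residuals. All function and parameter names are hypothetical.

```python
def motion_compensate(reference, dx):
    """Shift a 1-D reference row by dx samples, repeating edge samples."""
    n = len(reference)
    return [reference[min(max(i - dx, 0), n - 1)] for i in range(n)]


def decode_sequence(base_frames, temporal_info, quality_residuals):
    """Combine base, temporal and quality data into one frame sequence.

    base_frames:       {time: row} decoded from the base layer
    temporal_info:     {time: (reference_time, dx)} from the enhancement layer
    quality_residuals: {time: residual_row} from the enhancement layer
    """
    frames = dict(base_frames)
    # Motion compensation on base frames produces the additional frames.
    for t, (ref_t, dx) in temporal_info.items():
        frames[t] = motion_compensate(base_frames[ref_t], dx)
    # Each enhanced frame is added to a base frame or an additional frame.
    for t, residual in quality_residuals.items():
        frames[t] = [s + r for s, r in zip(frames[t], residual)]
    # Frames are combined into a sequence in display order.
    return [frames[t] for t in sorted(frames)]
```

A frame with no quality residual is passed through unchanged, which mirrors the case where only part of the enhancement layer is received.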