Multimedia applications are increasingly popular in today's world. For instance, one can listen to a CD player or access a web page via the Internet. A common problem in multimedia applications over the Internet is that uncompressed video data is too large for storage and transmission. Several coding standards have been defined by the ITU-T and ISO/IEC MPEG committees to address data compression issues. With the establishment of these standards, it is much easier to store and transmit video data.
Because Internet technology has advanced greatly over the past few years, one can now read web pages, play games, and download files over the Internet. Streaming video is an important web application. People can access pre-encoded video clips from a video server via the network. The greatest advantage of streaming video is that people can subscribe to the video data through an Internet connection from anywhere. In streaming video, users may access videos over heterogeneous networks such as ADSL, cable modem, etc. Because the available bandwidth varies, the streaming video provider must transmit the bitstream at variable bit-rates.
There are some traditional methods for bit-rate adaptation. One is to encode multiple bitstreams at encoding time. However, in a video multicast environment, hundreds or thousands of clients may access the data at the same time, and the total bit rate required is the sum of the bit rates of these multiple bitstreams. Another is to encode the bitstream at the highest bit-rate available on the Internet and then transcode the bitstream into different bit-rates. The transcoder first decodes the encoded bitstream and then re-encodes it to meet the bit-rate that is suitable for each client. In this way, the streaming video provider can use a transcoder to transcode the bitstream into different bit-rates for different users.
A new concept called Fine Granularity Scalability (FGS) was proposed and standardized in MPEG-4 Draft Amendment 4. FGS contains one base layer and one enhancement layer. The FGS base layer is generated using an MPEG-4 coder at the lowest bit rate of all possible connections. FGS takes the original and reconstructed discrete cosine transform (DCT) coefficients to generate the enhancement layer bitstream using bit-plane coding. The reconstructed DCT coefficients are subtracted from the original ones to generate the residues introduced by the quantization process. Then the FGS codec uses bit-plane coding to encode these residues and outputs the bit-planes from the most significant bit (MSB) to the least significant bit (LSB). The enhancement layer can be truncated at any number of bits. If the client has extra bandwidth after receiving the FGS base layer, it can also receive the enhancement layer. The more FGS enhancement bit-planes are received, the better the reconstructed quality is. FGS provides a bit-rate range from the base-layer bit-rate to the upper bound of the client bandwidth. Therefore FGS is very suitable for streaming video with multicasting. As shown in FIG. 1, all clients (client 1, 2, 3) can receive the FGS base layer at minimum perceptual quality. Because of insufficient bandwidth, client 1 cannot receive the FGS enhancement layer. Client 2 and client 3, however, can receive as many FGS bit-planes as their bandwidth allows.
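The bandwidth split described above can be illustrated with a minimal sketch. The function name `enhancement_rate` and all bit-rate figures are hypothetical and chosen only for illustration; the sketch assumes every client can at least receive the base layer, as stated above.

```python
def enhancement_rate(client_kbps, base_kbps, enhancement_kbps):
    """Return the enhancement-layer rate (kbit/s) a client can receive.

    The base layer is always sent. Whatever bandwidth remains is filled
    with enhancement-layer bits, up to the full enhancement-layer rate.
    """
    spare = max(client_kbps - base_kbps, 0)
    return min(spare, enhancement_kbps)


# Hypothetical figures: base layer 128 kbit/s, full enhancement 512 kbit/s.
for name, bandwidth in [("client 1", 128), ("client 2", 256), ("client 3", 1000)]:
    print(name, enhancement_rate(bandwidth, 128, 512))
```

With these assumed numbers, client 1 receives only the base layer, client 2 receives a truncated enhancement layer, and client 3 receives the enhancement layer in full.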
Because FGS can support a wide range of bit-rates to adapt to bandwidth variations, it is much more flexible than other coding schemes for streaming video applications, and it has therefore become increasingly popular in such applications. While FGS provides high flexibility for bandwidth adaptation, the coding efficiency of an FGS coder is not as good as that of a non-scalable coder at the same bit-rate. The inefficient coding performance mainly results from two factors. First, only coarse predictions are used for the motion-compensated predictive coding of the FGS base-layer, while the coding residuals (the image details) reconstructed from the enhancement-layer are not used for prediction. Second, there is no motion-compensated prediction loop involved in the FGS enhancement-layer coder. That is, each FGS enhancement-layer frame is intra-layer coded. Since the FGS base-layer is encoded at the lowest bit-rate with the minimal human perceptual visual quality, the coding gain in the temporal prediction of the FGS base layer is usually not as good as that for a non-scalable coder.
FIG. 2 shows the encoding process to produce the FGS base-layer and enhancement-layer bitstreams. The base layer is encoded using an MPEG-4 non-scalable coder at bit-rate Rb. The FGS enhancement-layer coder uses the original and the de-quantized DCT coefficients as its inputs and generates the FGS enhancement-layer bitstream using bit-plane coding. The encoding procedure of the FGS enhancement-layer bitstream is as follows. First, the de-quantized DCT coefficients are subtracted from the original DCT coefficients to obtain the quantization residues. After generating all DCT residues of a frame, the enhancement-layer coder finds the maximum absolute value of these DCT residues to determine the maximum number of bit-planes for this frame. It then outputs the enhancement data bit-plane by bit-plane, starting from the most significant bit-plane (MSB plane) and ending with the least significant bit-plane (LSB plane). The binary bits in each bit-plane are converted into symbols and variable-length encoded to generate the output bitstream. The following example illustrates the procedure, where the absolute quantization residues of a DCT block are given as follows:
5, 0, 4, 1, 2, 0, . . . 0, 0
The maximum value in this block is 5, and representing 5 in binary format (101) takes 3 bits. Writing every value in binary format forms the 3 bit-planes:
1, 0, 1, 0, 0, 0 . . . 0, 0 (MSB)
0, 0, 0, 0, 1, 0 . . . 0, 0 (MSB-1)
1, 0, 0, 1, 0, 0 . . . 0, 0 (LSB)
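The bit-plane decomposition in this example can be reproduced with a short sketch. The function name `to_bit_planes` is hypothetical, and only the six residues written out explicitly above are used (the elided coefficients are left out). Sign bits, symbol formation, and variable-length coding are not modeled.

```python
def to_bit_planes(abs_residues):
    """Split absolute residues into bit-planes, MSB plane first."""
    max_value = max(abs_residues, default=0)
    # Number of bits needed for the largest residue, e.g. 5 = 0b101 -> 3.
    num_planes = max_value.bit_length()
    planes = []
    for shift in range(num_planes - 1, -1, -1):
        # Take the bit at position `shift` from every residue.
        planes.append([(value >> shift) & 1 for value in abs_residues])
    return planes


block = [5, 0, 4, 1, 2, 0]  # leading residues from the example above
for plane in to_bit_planes(block):
    print(plane)
# [1, 0, 1, 0, 0, 0]  (MSB)
# [0, 0, 0, 0, 1, 0]  (MSB-1)
# [1, 0, 0, 1, 0, 0]  (LSB)
```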
FIG. 3 illustrates the FGS decoding process for the enhancement-layer frame reconstruction. The process of decoding the FGS base layer is the same as that of decoding an MPEG-4 non-scalable bitstream. Due to the embedded characteristics of FGS streams, the decoder receives and variable-length decodes the bit-planes of DCT residues from the MSB bit-plane to the LSB bit-plane. Because the decoder may not receive all blocks of some specific bit-plane, the decoder fills 0's into the non-received blocks of bit-planes and performs IDCT to convert the received DCT coefficients into the pixel values. These pixel values are subsequently added to the base-layer decoded frame to obtain the final enhanced video image.
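The zero-filling step can be sketched as follows, as a simplified illustration rather than the decoder of FIG. 3. The function name `from_bit_planes` is hypothetical, whole bit-planes (not individual blocks) are assumed to be missing, and the IDCT and the addition to the base-layer frame are omitted.

```python
def from_bit_planes(received_planes, num_planes, block_size):
    """Rebuild absolute residues from the bit-planes that arrived (MSB first).

    Planes that were not received are treated as all zeros.
    """
    values = [0] * block_size
    for index in range(num_planes):
        if index < len(received_planes):
            plane = received_planes[index]
        else:
            plane = [0] * block_size  # plane not received: fill with zeros
        shift = num_planes - 1 - index
        for position, bit in enumerate(plane):
            values[position] |= bit << shift
    return values


planes = [
    [1, 0, 1, 0, 0, 0],  # MSB
    [0, 0, 0, 0, 1, 0],  # MSB-1
    [1, 0, 0, 1, 0, 0],  # LSB
]
print(from_bit_planes(planes, 3, 6))      # [5, 0, 4, 1, 2, 0]
print(from_bit_planes(planes[:2], 3, 6))  # [4, 0, 4, 0, 2, 0]
print(from_bit_planes(planes[:1], 3, 6))  # [4, 0, 4, 0, 0, 0]
```

Each additional bit-plane refines the residues, which is why truncating the enhancement layer degrades quality gradually instead of breaking the decode.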
Although FGS can support a wide range of bit-rates to ease adaptation to channel variations, it has some disadvantages. Referring to FIG. 2, the input signal fed into the enhancement-layer coder is the quantization error of the prediction residue of the incoming video with reference to its base-layer reconstructed version, which is encoded at the lowest bit-rate with the minimum visual quality. As a result, the base-layer video usually cannot approximate the incoming video with high accuracy, so the quantization error is relatively large, leading to low coding efficiency. Single-layer coding performs better than FGS coding at the same transmission bit-rate because single-layer coding uses the full-quality video for prediction. The performance degradation can reach 1.5 to 2.5 dB, as reported in the prior art.
To overcome this problem, several methods have been proposed for enhancing the visual quality of FGS coding, as briefly described below.
A method for improving the FGS coding efficiency, referred to as “Adaptive Motion Compensated FGS” (AMC-FGS), has been proposed. The AMC-FGS codec features two simplified scalable codecs, one-loop and two-loop MC-FGS, with different degrees of coding efficiency and error resilience. The two-loop MC-FGS employs an additional MCP loop at the enhancement-layer coder for B-frames only to obtain better coding efficiency. Since B-frames are not referenced by other frames for prediction during encoding and decoding, there is no error propagation due to the loss of B-frame data. If drifting errors occur in one B-frame, they do not propagate to the following frames. The one-loop MC-FGS introduces fine predictions for P- and B-frames, leading to relatively higher coding efficiency compared to the two-loop MC-FGS. However, its error robustness is significantly lower: if the enhancement-layer data used for prediction of the base layer of P-frames cannot be received at the decoder, because of packet losses caused by insufficient channel bandwidth or channel errors, the drifting error can be considerable and cause significant quality degradation. An adaptive decision algorithm is used in AMC-FGS to switch dynamically between the two prediction schemes to achieve a better tradeoff between coding efficiency and error robustness.
A new FGS structure called “Progressive FGS (PFGS)” has also been proposed. In the proposed structure, the enhancement layer can refer not only to the FGS base layer but also to the previous enhancement-layer data. However, the same drifting errors degrade the output quality if delivery of the referenced bit-planes to the decoder cannot be guaranteed when the bandwidth drops.
Another method that has been proposed is referred to as “Robust Fine Granularity Scalability (RFGS)”. The method focuses on the tradeoff between coding efficiency and robustness by adopting an additional motion compensation (MC) loop at the enhancement layer and including leaky prediction. The extra MC loop can improve the coding efficiency by referencing a high-quality frame memory, and the accompanying drift errors are handled by leaky prediction. A leaky factor α (0≦α≦1), which is tied to the estimated drift errors, is introduced into the reconstructed frame memory at the enhancement layer. A separate factor is the number of referenced bit-planes β (0≦β≦maximal number of bit-planes), which is utilized in partial prediction. By adjusting both factors, RFGS can provide the flexibility of various encoding schemes. If the leaky factor α is set to zero, RFGS is almost the same as the original FGS. If the leaky factor α is set to unity for all referencing frames, the prediction modes of RFGS and MC-FGS are equal.
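The role of the leaky factor can be illustrated with a simplified sketch. It assumes, as one plausible reading of the description above and not as the actual RFGS codec, that the high-quality reference is the base-layer reconstruction plus the enhancement residual attenuated by α. The function name `leaky_reference` and the sample values are hypothetical; `partial_residual` stands for the residual already reconstructed from the first β bit-planes.

```python
def leaky_reference(base_frame, partial_residual, alpha):
    """Form an enhancement-layer reference with an attenuated residual.

    base_frame       -- base-layer reconstructed samples
    partial_residual -- enhancement residual rebuilt from the first
                        beta bit-planes (assumed to be precomputed)
    alpha            -- leaky factor in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [b + alpha * r for b, r in zip(base_frame, partial_residual)]


base = [100, 120]   # hypothetical base-layer samples
residual = [4, -2]  # hypothetical partial enhancement residual

print(leaky_reference(base, residual, 0.0))  # base layer only, as in FGS
print(leaky_reference(base, residual, 0.5))  # attenuated enhancement
print(leaky_reference(base, residual, 1.0))  # full enhancement reference
```

Under this assumption, a value of α below one shrinks any mismatch in the enhancement data each time a frame is used as a reference, so drift decays over successive frames instead of accumulating.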