1. Field of the Invention
The present invention is directed to a scalable video coding system which codes video data using both frame-prediction and fine-granular scalable images. The invention has particular utility in connection with variable-bandwidth networks and computer systems that are able to accommodate different bit rates, and hence different quality images.
2. Description of the Related Art
Scalable video coding in general refers to coding techniques which are able to provide different levels, or amounts, of data per frame of video. Currently, such techniques are used by leading video coding standards, such as MPEG-2 and MPEG-4 (i.e., “Moving Picture Experts Group” coding), in order to provide flexibility when outputting coded video data.
In the scalable coding techniques currently employed by MPEG-2 and MPEG-4, an encoder codes frames of video data and divides the coded frames into a base layer (“BL”) and an enhancement layer (“EL”). Typically, the base layer comprises a minimum amount of data required to decode the coded video data. The enhancement layer, on the other hand, comprises additional information which enhances (e.g., improves the quality of) the base layer when it is decoded. In operation, the encoder transmits all frames from the base layer to a receiving device, which can be a personal computer or the like. However, the encoder only transmits frames from the enhancement layer in cases where the receiving device has sufficient processing power to handle those additional frames and/or the medium over which the frames are transmitted has sufficient bandwidth.
FIGS. 1 and 2 show “scalability structures” which are currently used in MPEG-2 and MPEG-4 for the base layer and the enhancement layer. More specifically, FIG. 1 shows a scalability structure 1 which employs frame-prediction in base layer 2 to generate predictive (or “P”) frames from an intra (or “I”) frame or from a preceding P frame. As shown in the figure, frame-prediction is also used in the enhancement layer to generate P frames based on frames in the base layer. FIG. 2 shows another scalability structure 3 which is currently used in MPEG-2 and MPEG-4. In the scalability structure shown in FIG. 2, frame-prediction is again employed to determine P frames in the base layer. Unlike scalability structure 1, however, scalability structure 3 also uses frame-prediction in the enhancement layer to generate bi-directional (or “B”) frames which, in this case, are interpolated from preceding frames in the enhancement layer and contemporaneous frames in the base layer. In general, MPEG-2 and MPEG-4 encoders use frame prediction in the manner set forth above to increase data compression and thus increase coding efficiency.
Another well-known scalable video coding technique is called fine-granular scalability coding. Fine-granular scalability coding codes the same image (e.g., a frame of video) using progressively more data each time coding takes place. For example, as shown in FIG. 3, image 4 is initially encoded using data sufficient to produce image 5. Thereafter, additional data is coded which is sufficient to produce enhanced images 6, 7 and 8 in succession.
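The progressive refinement described above can be illustrated with a minimal sketch. It assumes that the additional data is organized as bit planes sent from most significant to least significant; the text does not specify this mechanism, and the function names `bit_planes` and `reconstruct` are hypothetical.

```python
def bit_planes(values, num_bits=8):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]


def reconstruct(planes, num_bits=8):
    """Rebuild values from however many leading bit planes were received."""
    values = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i  # weight of the i-th received plane
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


pixels = [200, 13, 97, 255]      # illustrative 8-bit samples of one image
planes = bit_planes(pixels)

# Each additional plane yields a closer approximation of the same image.
for k in (1, 2, 4, 8):
    print(k, reconstruct(planes[:k]))
```

Because any prefix of the plane sequence is decodable, the amount of data sent per image can be cut at an arbitrary plane boundary, which is the property that makes this kind of coding fine-granular.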
Fine-granular scalability coding has several advantages over the frame-prediction techniques described above. Specifically, because fine-granular scalability coding can provide a wider range of enhanced images than frame-prediction techniques, fine-granular scalability coding is generally preferred in environments, such as the Internet, which have a wide range of available bandwidth. For similar reasons, fine-granular scalability coding is also generally preferred when dealing with receiving devices that have varying processing capabilities and/or bandwidth. That is, because fine-granular scalability coding produces a wide range of enhanced images, it is possible to match the appropriate image relatively closely to an amount of available bandwidth. As a result, in theory, it is possible to obtain the greatest amount of data for an image for a given amount of available bandwidth. On the downside, fine-granular scalability coding does not permit the use of frame-prediction. As a result, it requires more data than the frame-prediction techniques described above and, consequently, degrades coding efficiency.
Thus, there exists a need for a scalable video coding technique which incorporates the efficiency of frame-prediction coding and the accuracy of fine-granular scalability coding.
The present invention addresses the foregoing need by coding a portion (e.g., a base layer) of input video data using a frame-prediction coding technique and then coding another portion (e.g., residual images in an enhancement layer) of the video data using fine-granular scalability coding. By coding a base layer using a frame-prediction coding technique, the present invention reduces the number of bits required to code the video data and thus maintains coding efficiency. By coding the residual images using fine-granular scalability coding, the present invention is able to provide a wide range of residual images, one or more of which can be selected for transmission based, e.g., on an available bandwidth of a receiving device.
Thus, according to one aspect, the present invention is a system (i.e., a method, an apparatus, and computer-executable process steps) for coding video data comprised of one or more frames. The system codes a portion (e.g., a base layer) of the video data using a frame-prediction coding technique, and then generates residual images based on the video data and the coded video data. Thereafter, the system codes the residual images using a fine-granular scalability coding technique, and outputs the coded video data and at least one of the coded residual images to a receiver, such as a variable-bandwidth network or a networked device thereon.
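The coding steps of this aspect can be sketched as follows. This is a toy illustration under stated assumptions, not the codec of the invention: the base layer is a stand-in that predicts each frame from the previously reconstructed frame and coarsely quantizes the prediction error (no motion estimation or transform), and the residual is coded as sign-plus-magnitude bit planes. The names `code_base_layer`, `residual_images`, `fgs_code` and the step size are all hypothetical.

```python
def code_base_layer(frames, step=32):
    """Stand-in frame-prediction coder: quantize the difference between each
    frame and the previous *reconstructed* frame. The first frame is predicted
    from zeros, so it plays the role of an intra frame."""
    prev = [0] * len(frames[0])
    coded, recon = [], []
    for frame in frames:
        q = [round((x - p) / step) for x, p in zip(frame, prev)]
        prev = [p + qi * step for p, qi in zip(prev, q)]  # decoder-side view
        coded.append(q)
        recon.append(prev)
    return coded, recon


def residual_images(frames, recon):
    """Residual = original frame minus decoded base-layer frame."""
    return [[x - r for x, r in zip(f, rf)] for f, rf in zip(frames, recon)]


def fgs_code(residual, num_bits=5):
    """Assumed FGS layout: one sign per sample plus magnitude bit planes,
    most significant plane first. Five bits cover magnitudes up to 16."""
    signs = [-1 if r < 0 else 1 for r in residual]
    planes = [[(abs(r) >> b) & 1 for r in residual]
              for b in range(num_bits - 1, -1, -1)]
    return signs, planes


frames = [[10, 200, 90, 45], [12, 190, 100, 45]]   # two tiny example frames
coded, recon = code_base_layer(frames)
residuals = residual_images(frames, recon)
enhancement = [fgs_code(r) for r in residuals]
```

Computing the residual against the reconstructed base layer, rather than against the original prediction, keeps the encoder and decoder in step: adding the full residual to the decoded base layer recovers the input frame exactly in this sketch.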
In preferred embodiments of the invention, the system determines a bandwidth of the receiver, and then selects which of the coded residual images to output based on the bandwidth of the receiver. By doing this, the invention is able to output a coded residual image which is most appropriate for the available bandwidth.
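One way to picture this selection is as a per-frame bit budget. The sketch below assumes that the enhancement data for a frame is a sequence of independently truncatable units (for example, bit planes) of known size, and that the base layer consumes a fixed rate; the function name `select_enhancement_units` and all of its parameters are hypothetical.

```python
def select_enhancement_units(bandwidth_bps, base_rate_bps, frame_rate, unit_sizes_bits):
    """Return how many leading enhancement units fit in the bandwidth that
    remains after the base layer has been accounted for."""
    budget = (bandwidth_bps - base_rate_bps) / frame_rate  # bits per frame
    if budget <= 0:
        return 0  # only the base layer can be sent
    used = 0
    count = 0
    for size in unit_sizes_bits:
        if used + size > budget:
            break
        used += size
        count += 1
    return count


# Example: 500 kbit/s link, 200 kbit/s base layer, 25 frames per second,
# four enhancement units of 4000 bits each per frame.
n = select_enhancement_units(500_000, 200_000, 25, [4000, 4000, 4000, 4000])
```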
In other preferred embodiments, the system codes the portion of the video data at a plurality of different bit rates so as to produce multiple versions of the coded video data, and generates a plurality of residual images for each version of the coded video data. In these embodiments, the system codes the residual images using a fine-granular scalability coding technique, determines variations in a bandwidth of the receiver over time, and then selects which one of the multiple versions and the coded residual images to output based on the variations in the bandwidth of the receiver.
By way of example, for a receiver bandwidth increasing from B1 to B2, where B1 is less than B2, the system selects a first version of the coded video data and successively selects coded residual images corresponding to each frame of the first version of the coded video data, which are coded at successively higher bit rates. For a receiver bandwidth increasing from B2 to B3, where B2 is less than B3, the system selects a second version of the coded video data and successively selects coded residual images corresponding to each frame of the second version of the coded video data, which are coded at successively higher bit rates. Conversely, for a receiver bandwidth decreasing from B3 to B2, where B3 is greater than B2, the system selects a first version of the coded video data and successively selects coded residual images corresponding to each frame of the first version of the coded video data, which are coded at successively lower bit rates. Likewise, for a receiver bandwidth decreasing from B2 to B1, where B2 is greater than B1, the system selects a second version of the coded video data and successively selects coded residual images corresponding to each frame of the second version of the coded video data, which are coded at successively lower bit rates.
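The increasing-bandwidth case can be sketched as a lookup over the available base-layer rates. The sketch assumes that the version chosen is the one with the highest base-layer rate not exceeding the current bandwidth, and that whatever bandwidth remains is spent on fine-granular residual data; this rule is an interpretation for illustration, and `select_version` is a hypothetical name.

```python
from bisect import bisect_right


def select_version(bandwidth, base_rates):
    """base_rates must be sorted in ascending order.

    Returns (version_index, enhancement_budget), or None when the bandwidth
    is below the lowest base-layer rate."""
    i = bisect_right(base_rates, bandwidth) - 1
    if i < 0:
        return None
    return i, bandwidth - base_rates[i]


base_rates = [100, 300]  # e.g., B1 and B2, in kbit/s

# As bandwidth rises from B1 toward B2, the first version is kept and the
# enhancement budget grows; at B2 the second version takes over.
for bw in (100, 150, 250, 300, 400):
    print(bw, select_version(bw, base_rates))
```

Under this rule the total rate sent tracks the bandwidth continuously, so a switch between base-layer versions does not coincide with a large step in the amount of data delivered.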
As is clear from the foregoing, by coding a base layer at a plurality of different bit rates and then selecting versions of the base layer and the residual images based on a range of available bandwidth, during display the present invention is able to provide a relatively smooth transition between different versions of the base layer. That is, in conventional “simulcast” systems (i.e., systems such as this where a base layer has been coded at different bit rates), there is a substantial jump in image quality at the transition from a first bit rate to a second bit rate. The present invention, however, provides for a smoother transition by selecting and outputting fine-granular coded residual images between the different versions of the base layer.
According to another aspect, the present invention is a network system that includes an encoder which receives input video data and which outputs frames of coded video data therefrom, a variable-bandwidth network over which the frames of coded video data are transmitted, a decoder which receives the frames of coded video data from the variable-bandwidth network and which decodes the coded video data, and a display which displays the decoded video data. The encoder includes a processor and a memory which stores computer-executable process steps. The processor executes process steps stored in the memory so as to produce the frames of coded video data by (i) coding a base layer from the input video data using a frame-prediction coding technique, (ii) coding an enhancement layer from the input video data using a fine-granular scalability coding technique, (iii) determining a bandwidth of the variable-bandwidth network, and (iv) selecting, for output, the base layer and, in a case that the bandwidth of the variable-bandwidth network is greater than a predetermined value, a portion of the enhancement layer.
According to still another aspect, the present invention is a system for decoding video data comprised of an enhancement layer bitstream and a base layer bitstream, where the base layer bitstream is coded using a frame-prediction coding technique and the enhancement layer bitstream is encoded using a fine-granular scalability coding technique. The system receives the coded video data, decodes the base layer bitstream using a frame-prediction decoder, and decodes the enhancement layer bitstream using a fine-granular scalability decoder. Thereafter, the system combines (e.g., adds) decoded video data from the base layer bitstream and from the enhancement layer bitstream to form a video image.
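The combining step can be sketched as below, assuming the base layer has already been decoded to pixel values and the enhancement layer arrives as sign-plus-magnitude bit planes, possibly truncated. The bit-plane layout, the 8-bit clipping range, and the name `decode_frame` are assumptions made for illustration.

```python
def decode_frame(base_pixels, signs, received_planes, num_bits):
    """Add the (possibly truncated) decoded residual to the decoded base layer.

    received_planes holds the leading magnitude bit planes, most significant
    first; planes that were not received contribute nothing."""
    magnitudes = [0] * len(base_pixels)
    for i, plane in enumerate(received_planes):
        shift = num_bits - 1 - i
        magnitudes = [m | (bit << shift) for m, bit in zip(magnitudes, plane)]
    # Clip to the assumed 8-bit sample range after adding the residual.
    return [min(255, max(0, p + s * m))
            for p, s, m in zip(base_pixels, signs, magnitudes)]


base = [96, 128]                      # decoded base-layer samples
signs = [1, -1]                       # residual signs
planes = [[1, 0], [0, 1], [1, 1]]     # residual magnitudes 5 and 3, 3 bits

full = decode_frame(base, signs, planes, 3)
partial = decode_frame(base, signs, planes[:1], 3)
base_only = decode_frame(base, signs, [], 3)
```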
According to still another aspect, the present invention is a system for coding video data and outputting coded video data to a plurality of receivers. The system codes a first portion of the video data using a frame-prediction coding technique to produce a first bitstream, and then codes a second portion of the video data using a fine-granular scalability coding technique to produce a second bitstream. The first bitstream is output to the plurality of receivers, whereafter the second bitstream is divided into two or more sub-streams. Finally, the two or more sub-streams are output to the plurality of receivers.
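A minimal sketch of the sub-stream division follows. It assumes the second bitstream is a byte string whose prefixes are independently decodable, that it is cut into contiguous chunks of roughly equal size, and that a receiver takes the first bitstream plus as many leading sub-streams as it can handle. The names `split_substreams` and `receive` are hypothetical.

```python
def split_substreams(enhancement: bytes, n: int):
    """Cut the enhancement bitstream into n contiguous sub-streams."""
    chunk = -(-len(enhancement) // n)  # ceiling division
    return [enhancement[i * chunk:(i + 1) * chunk] for i in range(n)]


def receive(base: bytes, substreams, k: int):
    """A receiver accepts the base bitstream and the first k sub-streams."""
    return base, b"".join(substreams[:k])


base_bits = b"BASE"
subs = split_substreams(b"abcdefgh", 3)

low_end = receive(base_bits, subs, 1)    # constrained receiver
high_end = receive(base_bits, subs, 3)   # receiver that can take everything
```

Keeping the sub-streams contiguous and ordered means that any receiver's data is a prefix of the full enhancement bitstream, which is what allows each receiver to decode whatever portion it accepted.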
By virtue of the foregoing aspect of the invention, it is possible to multicast video data to a plurality of receivers. In other words, it is possible to broadcast coded data to the receivers at multiple bandwidths. These receivers may then accept only those bandwidths that they are able to process and/or receive. Thus, each receiver is able to receive and process as much data as it can handle, resulting in more accurate image reproduction.
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiments thereof in connection with the attached drawings.