Efficient and reliable delivery of video data is becoming increasingly important as the Internet continues to grow in popularity. Video is very appealing because it offers a much richer user experience than static images and text. It is more interesting, for example, to watch a video clip of a winning touchdown or a Presidential speech than it is to read about the event in stark print. Unfortunately, video data requires significantly more memory and bandwidth than other data types commonly delivered over the Internet. As an example, one second of uncompressed video data may consume one or more Megabytes of data. Delivering such large amounts of data over error-prone networks, such as the Internet and wireless networks, presents difficult challenges in terms of efficiency, reliability, and network capacity.
Real-time delivery of video is often referred to as video streaming. To promote efficient delivery, video data is typically encoded prior to delivery to reduce the amount of data actually being transferred over the network. Image quality is lost as a result of the compression, but such loss is generally tolerated as necessary to achieve acceptable transfer speeds. In some cases, the loss of quality may not even be detectable to the viewer.
Video compression is well known. One common type of video compression is a motion-compensation-based video coding scheme, which is used in such coding standards as MPEG-1, MPEG-2, MPEG-4, H.261, and H.263. In such coding standards, video images are sampled and transformed into coefficients that more or less capture the variation in pixels across the image. The coefficients are then quantized and transmitted to a decoder. The decoder is able to decode the image by performing operations that are substantially the inverse of the encoding operations.
One particular type of motion-compensation-based video coding scheme is fine-granularity layered coding. Layered coding is a family of signal representation techniques in which the source information is partitioned into sets called “layers”. The layers are organized so that the lowest, or “base layer”, contains the minimum information for intelligibility. The base layer is typically encoded to fit in the minimum channel bandwidth. The goal is to deliver and decode at least the base layer to provide minimal quality video. The other layers, called “enhancement layers”, contain additional information that incrementally improves the overall quality of the video. With layered coding, lower layers of video data are often used to predict one or more higher layers of video data.
Another layered coding scheme is progressive FGS (PFGS). In PFGS, two reference images are constructed for each frame, one is the reconstruction image of the base layer, and the other is high quality reference image that is reconstructed using the base layer bitstream and a part of the enhancement layer bitstream. PFGS can improve coding efficiency over FGS because the prediction in PFGS is based on higher quality enhancement layers, rather than only the low quality base layer, as in FGS.
With layered coding, the various layers can be sent over the network as separate sub-streams, where the quality level of the video increases as each sub-stream is received and decoded. A decoder that receives the base layer and the enhancement layers can be configured to choose and decode a particular subset of these layers to get a particular quality according to its preference and capability.
Layered coding schemes are scalable, meaning that each layer can be scaled in one or more aspects to achieve various desired performance goals. Spatial scalability refers to approaches in which an image is decomposed into layers at different spatial resolutions. Signal-to-noise (SNR) ratio scalability refers to approaches in which the same spatial resolution is applied to the layers, but coefficients are quantized at increasingly higher granularities.
While scalability can improve the visual quality of video, serious problems, such as drifting, can occur. Drifting refers to a situation in which reference images at the encoder and decoder do not match. In addition, coding efficiency can be reduced when network bandwidth fluctuations are large. For example, when a scalable video codec is set to optimize coding performance at a low bit-rate, often the performance at high bit-rate will be sacrificed.