Scalable video coding supports decoders with different capabilities. An encoder generates multiple encoded bitstreams for an input video. This is in contrast to single-layer coding, which uses only one encoded bitstream for a video. In scalable video coding, one of the output encoded bitstreams, referred to as the base layer (BL), can be decoded by itself and provides the lowest scalability level of the video output. To achieve a higher level of video output, the decoder can process the base layer bitstream together with other encoded bitstreams, referred to as enhancement layers (EL). One or more enhancement layers may be added to the base layer to generate higher scalability levels. One example is spatial scalability, where the base layer represents the lowest-resolution video and the decoder can generate higher-resolution video using the base layer bitstream together with additional enhancement layer bitstreams. Thus, using additional enhancement layer bitstreams produces a better quality video output, such as by achieving temporal, signal-to-noise ratio (SNR), or spatial improvements.
In a transmission model, such as a simulcast, video-on-demand, or streaming model, the encoder may transmit the video stream over various media to various decoders of different capabilities. Buffer management requires sending the compressed bytes of video data for each picture (also called an access unit) into a video buffer whose size and input rate are defined by a video standard, together with a scheme in which the picture data is removed from the video buffer at a specified time. Standards require that the video buffer never overflow; that is, each picture's data must be removed at the correct time, before newly arriving data exceeds the capacity of the buffer.
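The buffer rule described above can be illustrated with a minimal sketch. It is not taken from any particular standard: it assumes bits arrive at a constant rate starting at time zero and that each picture is removed instantaneously at its removal time. The names `Picture`, `peak_fullness`, and `never_overflows` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    size_bits: int        # compressed size of the access unit
    removal_time_ms: int  # time at which the decoder removes it from the buffer


def peak_fullness(pictures, input_rate_bps):
    """Return the highest buffer fullness (in bits) reached just before a removal.

    Assumed model for this sketch: constant input rate from t = 0 and
    instantaneous removal of each picture at its removal time.
    """
    fullness = 0
    peak = 0
    now_ms = 0
    for pic in pictures:
        # Bits that arrived since the previous removal (or since t = 0).
        fullness += (pic.removal_time_ms - now_ms) * input_rate_bps // 1000
        now_ms = pic.removal_time_ms
        # Fullness only grows between removals, so the peak is sampled here.
        peak = max(peak, fullness)
        if fullness < pic.size_bits:
            # Sanity check for the sketch: the picture has not fully arrived.
            raise ValueError(f"picture incomplete at t={now_ms} ms")
        fullness -= pic.size_bits
    return peak


def never_overflows(pictures, buffer_size_bits, input_rate_bps):
    """True if the buffer never holds more than buffer_size_bits."""
    return peak_fullness(pictures, input_rate_bps) <= buffer_size_bits


# Illustrative values only: three 100,000-bit pictures at a 1 Mbit/s input rate.
pictures = [Picture(100_000, 500), Picture(100_000, 600), Picture(100_000, 700)]
print(never_overflows(pictures, 600_000, 1_000_000))
print(never_overflows(pictures, 450_000, 1_000_000))
```

With these illustrative numbers the fullness peaks at 500,000 bits, so the 600,000-bit buffer is sufficient while the 450,000-bit buffer would overflow.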
Decoders buffer the combined encoded bitstream before decoding unless only the base layer is being used. If only the base layer is being decoded, the decoder buffers just the base layer. Buffer management may become difficult among decoders that combine different numbers of layers of the scalable video. For example, some decoders may request just the base layer, while others may request the base layer and any number of enhancement layers. Decoders that request more than the base layer combine the base layer and the requested enhancement layers, and then decode the combined bitstream. Because the buffers hold different combinations of layers (some buffers may include just the base layer, and other buffers may include a base layer plus any number of enhancement layers), managing the buffers may be difficult. For example, removing the base layer plus enhancement layer data after combining them adds complexity and burdens the buffer management systems, and many transport processing systems that are used for base layer processing need to be redesigned and modified. This also imposes a burden on other applications such as re-multiplexing and transcoding.
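As a rough illustration of why the buffered data differs from decoder to decoder, the sketch below sums the per-picture sizes of the layers a decoder requests. The function name `combine_layers` and all sizes are hypothetical, and the model assumes every layer contributes data to every picture, which need not hold for all forms of scalability.

```python
def combine_layers(layers, num_layers):
    """Per-picture size (in bits) of the bitstream a decoder would buffer.

    layers[0] is the base layer; layers[1:] are enhancement layers. Each
    layer is a list of per-picture sizes, one entry per picture. A decoder
    requesting num_layers layers buffers the base layer plus the first
    (num_layers - 1) enhancement layers.
    """
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must include at least the base layer")
    selected = layers[:num_layers]
    # Add up the contribution of every selected layer for each picture.
    return [sum(sizes) for sizes in zip(*selected)]


# Illustrative sizes only, in bits per picture.
base = [100_000, 40_000, 40_000]
el1 = [150_000, 60_000, 60_000]
el2 = [300_000, 120_000, 120_000]
layers = [base, el1, el2]

for n in (1, 2, 3):
    print(n, combine_layers(layers, n))
```

Each value of `n` yields a different sequence of picture sizes, so a buffer that is managed correctly for the base layer alone sees a different load once enhancement layers are combined into it.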
Furthermore, while the above relates to video encoding, similar problems exist for creating and managing MPEG-2 transport streams, which may include multiple streams including scalable video streams. MPEG-2 is the designation for a group of standards promulgated by the Moving Picture Experts Group (“MPEG”) as the ISO/IEC 13818 international standard. A typical use of MPEG-2 is to encode audio and video for broadcast signals, including signals transmitted by satellite and cable. Thus, MPEG-2 transport streams may be prone to buffering issues due to the multiple layers in a scalable video stream.