Field of the Invention
Embodiments of the present invention generally relate to video coding and more specifically relate to signaling decoded picture buffer size in multi-loop scalable video coding.
Description of the Related Art
The demand for digital video products continues to increase. Some examples of applications for digital video include video communication (e.g., video conferencing and multimedia messaging), security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, video gaming devices, digital cameras, cellular telephones, video jukeboxes, high-end displays and personal video recorders). Further, video applications are becoming increasingly mobile as a result of higher computation power in handsets, advances in battery technology, and high-speed wireless connectivity.
Video transmission systems using the Internet and mobile networks have a wide range of receiving devices, i.e., video endpoints, ranging, for example, from cellular telephones with small screens to tablet computers to personal computers with high definition displays to video conferencing systems with large screens. That is, the devices receiving a video transmission may have different resolution, frame rate, and bandwidth capabilities. Scalable video coding (SVC) is one technique that may be used to allow a video to be received by a range of receiving devices according to the capabilities of each device. In general, SVC refers to encoding a video as a single scalable video bitstream with one or more subset bitstreams that are adapted to varying video endpoint capabilities, network conditions, and/or user preferences.
A video bitstream may be referred to as scalable when parts of the stream can be removed such that the resulting subset bitstream is a valid bitstream for some target decoder, and the subset bitstream represents the original video content with a reconstruction quality that is less than that of the complete original bitstream but is high in view of the lower quantity of data in the subset bitstream. Typically, three scalability modes are considered: temporal, spatial, and quality. A temporally scaled subset bitstream represents the original video content at a reduced frame rate. A spatially scaled subset bitstream represents the original video content at a reduced picture size. A quality scaled subset bitstream represents the original video content at the same spatial and temporal resolution as the complete bitstream but at a lower quality, i.e., a lower signal-to-noise ratio (SNR).
In scalable video coding, a single encoded bitstream, which may be referred to as a scalable bitstream herein, may include multiple layers (sub-bitstreams) of compressed video data. The base layer is the most basic, scaled down compressed data needed to reconstruct the video stream at the lowest spatial resolution, temporal resolution, and/or quality. The remaining compressed video data in the scalable bitstream is grouped into one or more enhancement layers. Each enhancement layer “builds” on the layer or layers below and includes video data that a decoder can use (in conjunction with data from the lower layer or layers) to generate an enhanced version of the video stream. Thus, the architecture of a video encoder that generates a scalable video bitstream may include a base layer encoder and one or more enhancement layer encoders. Similarly, the architecture of a video decoder that decodes a scalable video bitstream may include a base layer decoder and one or more enhancement layer decoders.
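The removal of parts of a scalable bitstream to obtain a subset bitstream may be illustrated with a minimal sketch. The sketch assumes, for illustration only, that each unit of compressed data carries a layer identifier and a temporal identifier; the names Packet, layer_id, temporal_id, and extract_sub_bitstream are hypothetical and are not syntax elements of any video coding standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical unit of compressed video data in a scalable bitstream."""
    layer_id: int     # assumption: 0 is the base layer, higher values are enhancement layers
    temporal_id: int  # assumption: 0 is the lowest frame rate
    payload: bytes


def extract_sub_bitstream(packets: List[Packet],
                          max_layer_id: int,
                          max_temporal_id: int) -> List[Packet]:
    """Keep only the packets at or below the requested layer and temporal level.

    Packets above either limit are removed; the order of the remaining
    packets is preserved.
    """
    return [
        p for p in packets
        if p.layer_id <= max_layer_id and p.temporal_id <= max_temporal_id
    ]


# Example scalable bitstream: a base layer and one enhancement layer,
# each with two temporal levels.
stream = [
    Packet(0, 0, b"base-t0"),
    Packet(1, 0, b"enh-t0"),
    Packet(0, 1, b"base-t1"),
    Packet(1, 1, b"enh-t1"),
]

# A low-capability endpoint might receive only the base layer at the lowest frame rate.
base_low_rate = extract_sub_bitstream(stream, max_layer_id=0, max_temporal_id=0)
```

In this sketch, an endpoint that can use the enhancement layer would request max_layer_id=1 and would also receive the base layer packets, consistent with each enhancement layer building on the layer or layers below.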
H.264/SVC is an example of a video coding standard that provides scalable video coding. More specifically, H.264/SVC is a scalable video coding (SVC) extension of H.264/AVC that supports temporal, spatial and quality scalability functions. A summary of H.264/SVC is presented in H. Schwarz, et al., “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Trans. Circuits and Systems for Video Technology, vol. 17, No. 9, September 2007. The temporal scalability of H.264/SVC allows decoding of a bitstream at different frame rates by partitioning a set of pictures into a temporal base layer bitstream and temporal enhancement layer bitstreams. The spatial scalability and quality scalability of H.264/SVC allow encoding of video at different resolutions and qualities as a base layer bitstream and one or more enhancement layer bitstreams.
In general, a scalable video codec may be based on either a multi-loop architecture or a single-loop architecture. In a single-loop architecture, which is used in H.264/SVC, a full decoding loop takes place only in the target layer. Inter-coded blocks in intermediate layers are not reconstructed, and sophisticated inter-layer prediction techniques such as residual prediction and motion prediction are used. In a multi-loop architecture, a full encoding/decoding loop is performed in every layer needed to encode/decode a target layer, thus avoiding the need for the complex inter-layer prediction techniques. Both intra- and inter-coded blocks are fully reconstructed in all layers and the reconstructed samples from lower layers may be used as reference samples for higher layers. The scalable extension currently under development by the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T WP3/16 and ISO/IEC JTC 1/SC 29/WG 11 for the recently completed first version of the High Efficiency Video Coding (HEVC) standard is based on a multi-loop architecture.
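The difference between the two architectures in terms of which layers run a full decoding loop may be sketched as follows. The sketch assumes, for illustration only, that layer dependencies are given as a mapping from each layer to the layers it references directly; the function name layers_with_full_decoding_loop and the mapping format are hypothetical and do not represent the decoding process of any particular standard.

```python
from typing import Dict, List, Set


def layers_with_full_decoding_loop(ref_layers: Dict[int, List[int]],
                                   target_layer: int,
                                   multi_loop: bool) -> Set[int]:
    """Return the layers in which a full decoding loop is performed.

    ref_layers maps a layer to the layers it directly references
    (hypothetical representation). In a single-loop architecture only the
    target layer is fully decoded. In a multi-loop architecture every layer
    needed, directly or indirectly, to decode the target layer is fully decoded.
    """
    if not multi_loop:
        return {target_layer}

    needed: Set[int] = set()
    pending = [target_layer]
    while pending:
        layer = pending.pop()
        if layer in needed:
            continue  # already visited through another dependency path
        needed.add(layer)
        pending.extend(ref_layers.get(layer, []))
    return needed


# Example: base layer 0, enhancement layer 1 references layer 0,
# and enhancement layer 2 references layer 1.
dependencies = {0: [], 1: [0], 2: [1]}

single_loop_layers = layers_with_full_decoding_loop(dependencies, 2, multi_loop=False)
multi_loop_layers = layers_with_full_decoding_loop(dependencies, 2, multi_loop=True)
```

For the example dependencies, the single-loop case fully decodes only layer 2, while the multi-loop case fully decodes layers 0, 1, and 2, which is why reconstructed samples from the lower layers are available as reference samples for the higher layers.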