Video streaming has become a mainstream means of video delivery. Supported by ubiquitous high-speed Internet and mobile networks, video content can be delivered to end users for viewing on different platforms at different qualities. To fulfill the requirements of various video streaming applications, a video source may have to be processed or stored at different resolutions, frame rates, and/or qualities. This would result in a fairly complicated system and require high overall bandwidth or large overall storage space. One solution that satisfies requirements for different resolutions, frame rates, qualities and/or bitrates is scalable video coding. Besides various proprietary development efforts to address this problem, there is also an existing video standard for scalable video coding. The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG has standardized a Scalable Video Coding (SVC) extension to the H.264/AVC standard. An H.264/AVC SVC bitstream can contain video information ranging from low frame rate, low resolution and low quality to high frame rate, high definition and high quality. This single bitstream can be adapted to a specific application by properly configuring the scalability of the bitstream. For example, the complete bitstream corresponding to a high-definition video can be delivered over high-speed networks to provide full quality for viewing on a large-screen TV. A portion of the bitstream corresponding to a low-resolution version of the high-definition video can be delivered over legacy cellular networks for viewing on handheld/mobile devices. Accordingly, a bitstream generated using H.264/AVC SVC is suitable for various video applications such as video broadcasting, video streaming, and surveillance.
In SVC, three types of scalability are provided: temporal scalability, spatial scalability, and quality scalability. SVC uses a multi-layer coding structure to realize these three dimensions of scalability. The concept of SVC is to generate one scalable bitstream that can be easily and quickly adapted to fit the bit-rate of various transmission channels, diverse display capabilities, and/or different computational resources without the need for transcoding or re-encoding. An important feature of the SVC design is that scalability is provided at the bitstream level. Bitstreams for a reduced spatial and/or temporal resolution can be obtained simply by discarding the NAL units (or network packets) that are not required for decoding the target resolution. NAL units for quality refinement can additionally be truncated in order to reduce the bit-rate and, correspondingly, the video quality.
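The bitstream-level adaptation described above can be sketched as a simple filter over NAL units. This is a minimal illustration, not an extractor conforming to the standard: the `NalUnit` class and `extract_substream` function are hypothetical names, the three identifier fields are modeled on the layer identifiers carried in the SVC NAL unit header, and the sketch assumes that every unit at or below the target operating point is needed for decoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified NAL unit: layer identifiers plus payload."""
    dependency_id: int  # spatial layer index (0 = base layer)
    temporal_id: int    # temporal layer index
    quality_id: int     # quality refinement index
    payload: bytes = b""


def extract_substream(nal_units: List[NalUnit],
                      max_dependency_id: int,
                      max_temporal_id: int,
                      max_quality_id: int) -> List[NalUnit]:
    """Keep only the units at or below the target operating point.

    Units above the target in any of the three dimensions are discarded;
    no transcoding or re-encoding of the remaining units takes place.
    """
    return [
        n for n in nal_units
        if n.dependency_id <= max_dependency_id
        and n.temporal_id <= max_temporal_id
        and n.quality_id <= max_quality_id
    ]
```

For example, calling `extract_substream(stream, 0, 0, 0)` would keep only the base layer at the lowest frame rate and quality, whereas passing the largest identifiers present in the stream returns the complete bitstream.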
In the H.264/AVC SVC extension, spatial scalability is supported based on pyramid coding. First, the video sequence is down-sampled to smaller pictures with different spatial resolutions (layers). The lowest layer (i.e., the layer with the lowest spatial resolution) is called the base layer (BL). Any layer above the base layer is called an enhancement layer (EL). In addition to dyadic spatial resolution ratios, the H.264/AVC SVC extension also supports arbitrary resolution ratios, which is called extended spatial scalability (ESS). In order to improve the coding efficiency of the enhancement layers (video layers with larger resolutions), various inter-layer prediction schemes have been disclosed in the literature. Three inter-layer prediction tools have been adopted in SVC: inter-layer motion prediction, inter-layer Intra prediction and inter-layer residual prediction (e.g., C. Andrew Segall and Gary J. Sullivan, “Spatial Scalability Within the H.264/AVC Scalable Video Coding Extension”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, Pages 1121-1135, September 2007).
FIG. 1 illustrates an example of a spatial scalability design according to H.264/AVC SVC. Base layer encoder 110 receives a lower-resolution video sequence as input and encodes it using conventional H.264/AVC video coding. Coding mode selection 112 can select a prediction mode between Intra-prediction and motion-compensated Inter-prediction. Enhancement layer encoder 120 receives a higher-resolution sequence as input. The higher-resolution sequence can be encoded with a structure similar to conventional H.264/AVC coding. However, inter-layer prediction 130 can be used as an additional coding mode. Accordingly, mode selection 122 for the enhancement layer can select a prediction mode among Intra-prediction, motion-compensated Inter-prediction and inter-layer prediction. For Intra-coded blocks in the base layer, the reconstructed blocks provide a prediction for the enhancement layer. For Inter-coded blocks in the base layer, the motion vectors and residual difference information of the base layer can be used to predict those of the enhancement layer. While two resolution layers are shown in FIG. 1 as an example of spatial scalability according to H.264/AVC SVC, more resolution layers can be added, in which case a higher-resolution enhancement layer can use either the base layer or previously transmitted enhancement layers for inter-layer prediction. Furthermore, other forms of SVC enhancement (e.g., temporal or quality) may also be present in the system.
HEVC (High Efficiency Video Coding) is an advanced video coding system developed by the Joint Collaborative Team on Video Coding (JCT-VC), a group of video coding experts from ITU-T VCEG and ISO/IEC MPEG. HEVC utilizes a very flexible data structure including the coding unit (CU), prediction unit (PU) and transform unit (TU). The CU, PU and TU can be partitioned into smaller blocks. Usually, a rate-distortion cost is used to select the best partitions for the CUs, PUs and TUs. A scalable coding system may also be based on HEVC; the HEVC-based scalable video coding system is named SHVC.
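As an illustration of partition selection by rate-distortion cost, the following minimal sketch compares the cost J = D + λ·R of coding a square block whole against the summed cost of its four quadrants, recursively. The cost model is a toy and purely an assumption made for this example: distortion is the squared error against the block mean, and the rate is a fixed number of bits per block. All function names are hypothetical, and an actual encoder would evaluate real prediction modes and measured bit counts.

```python
from typing import List, Tuple, Union

Block = List[List[int]]
Tree = Union[str, list]  # "leaf", or a list of four subtrees


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def toy_block_cost(block: Block, lam: float, bits_per_block: float = 16.0) -> float:
    """Toy cost (assumption): SSE against the block mean plus a fixed rate."""
    n = len(block) * len(block[0])
    mean = sum(map(sum, block)) / n
    sse = sum((p - mean) ** 2 for row in block for p in row)
    return rd_cost(sse, bits_per_block, lam)


def split4(block: Block) -> List[Block]:
    """Split a square block into four quadrants (TL, TR, BL, BR)."""
    h = len(block) // 2
    return [[row[x:x + h] for row in block[y:y + h]]
            for y in (0, h) for x in (0, h)]


def best_partition(block: Block, lam: float, min_size: int = 2) -> Tuple[float, Tree]:
    """Return (cost, tree) for the cheaper of 'no split' and 'split into four'."""
    whole = toy_block_cost(block, lam)
    if len(block) <= min_size:
        return whole, "leaf"
    children = [best_partition(b, lam, min_size) for b in split4(block)]
    split_cost = sum(cost for cost, _ in children)
    if split_cost < whole:
        return split_cost, [tree for _, tree in children]
    return whole, "leaf"
```

Under this toy model, a uniform block stays unsplit because splitting only adds rate, while a block whose quadrants differ strongly is split because the reduction in distortion outweighs the extra bits.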
Both AVC/H.264 and HEVC are block-based coding systems, where the picture is divided into coding blocks. For AVC/H.264, the picture is divided into macroblocks and each luma macroblock (i.e., Y component) consists of 16×16 pixels. For HEVC, the picture is divided into largest coding units (LCUs) and each LCU may be further partitioned into smaller CUs until the smallest CU size is reached. Each macroblock or CU is then predicted using Inter or Intra prediction to generate residues for the macroblock or CU. The residues of the macroblock or CU are divided into TUs and each TU is processed by a two-dimensional transform. The transform coefficients of each TU are quantized using a quantization matrix. The quantized transform coefficients are coded using entropy coding to form part of the coded bitstream.
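The role of the quantization matrix can be sketched as follows. This is a simplified floating-point illustration, not the integer arithmetic of either standard: each transform coefficient is divided by a step size weighted by the matrix entry at the same position. The normalization by 16 (so that a matrix filled with 16 behaves like uniform quantization), the `qstep` parameter and the function names are assumptions made for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def _round_half_away(x: float) -> int:
    """Round to the nearest integer, with ties away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs: Matrix, qmatrix: Matrix, qstep: float) -> List[List[int]]:
    """Quantize TU coefficients; larger matrix entries mean coarser steps."""
    return [
        [_round_half_away(c * 16 / (m * qstep)) for c, m in zip(crow, mrow)]
        for crow, mrow in zip(coeffs, qmatrix)
    ]


def dequantize(levels: List[List[int]], qmatrix: Matrix, qstep: float) -> Matrix:
    """Inverse quantization as a decoder would apply it with the same matrix."""
    return [
        [lv * m * qstep / 16 for lv, m in zip(lrow, mrow)]
        for lrow, mrow in zip(levels, qmatrix)
    ]
```

Because the decoder must apply the same weights during inverse quantization, the matrix entries have to be known at the decoder side as well.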
Information associated with the quantization matrices, also called scaling list data, is usually incorporated in the coded bitstream so that a decoder can apply inverse quantization accordingly. For AVC, the scaling list data is provided individually for block sizes 4×4 and 8×8, for Inter and Intra prediction modes, and for the different color components (i.e., Y, Cb and Cr). For HEVC, the scaling list data is provided for block sizes 4×4 and 8×8, similar to AVC. In addition, scaling list data for block sizes 16×16 and 32×32 is also provided. For 16×16, the scaling list data is provided individually for Inter and Intra prediction modes and for color components Y, Cb and Cr, where the 16×16 matrices are up-sampled from the corresponding 8×8 matrices. For 32×32, the scaling list data is provided individually for Inter and Intra prediction modes and for the Y component only, where the 32×32 matrices are up-sampled from the corresponding 8×8 matrices.
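The up-sampling of an 8×8 matrix to 16×16 or 32×32 can be illustrated with the minimal sketch below. It assumes simple sample replication, in which each 8×8 entry is repeated over a 2×2 or 4×4 region; the text above does not specify the up-sampling method, and any separate handling of the DC entry is omitted. The function name is hypothetical.

```python
from typing import List


def upsample_scaling_list(m8: List[List[int]], size: int) -> List[List[int]]:
    """Up-sample an 8x8 scaling list to size x size by sample replication."""
    if len(m8) != 8 or any(len(row) != 8 for row in m8):
        raise ValueError("expected an 8x8 matrix")
    if size % 8 != 0:
        raise ValueError("target size must be a multiple of 8")
    factor = size // 8  # 2 for 16x16, 4 for 32x32
    return [
        [m8[y // factor][x // factor] for x in range(size)]
        for y in range(size)
    ]
```

With this approach only the 64 entries of the 8×8 matrix need to be signalled, even though the matrix applied to a 32×32 block has 1024 entries.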
For a scalable system based on SVC, a set of quantization matrices similar to that of AVC is signalled for each layer. For a scalable system based on SHVC, a set of quantization matrices similar to that of HEVC is signalled for each layer. Therefore, the scaling list data grows with the number of layers. In a multi-view coding system, the scaling list data may have to be signalled for each view, so the scaling list data grows with the number of views. It is desirable to reduce the required scaling list data for a scalable system or a three-dimensional/multi-view system.
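The growth of the scaling list data can be made concrete with a small count. The sketch below enumerates the matrices per layer exactly as the description above lists them (block size, prediction mode and color component) and multiplies by the number of layers or views. The function names are hypothetical, and the count assumes that every layer signals a full set with no sharing between layers.

```python
from itertools import product

MODES = ("intra", "inter")
COMPONENTS = ("Y", "Cb", "Cr")


def avc_lists():
    """AVC-style set: 4x4 and 8x8, both modes, all three components."""
    return list(product((4, 8), MODES, COMPONENTS))


def hevc_lists():
    """HEVC-style set: 4x4, 8x8, 16x16 for Y/Cb/Cr, plus 32x32 for Y only."""
    small = list(product((4, 8, 16), MODES, COMPONENTS))
    large = list(product((32,), MODES, ("Y",)))
    return small + large


def total_lists(per_layer: int, num_layers: int) -> int:
    """Total matrices when each layer (or view) signals its own full set."""
    return per_layer * num_layers
```

Counted this way, an HEVC-style set contains 20 matrices per layer, so a three-layer system would signal 60 matrices if nothing is shared between layers.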