Video streaming has become a mainstream means of video delivery. Supported by ubiquitous high-speed internet and mobile networks, video content can be delivered to end users for viewing on different platforms at different qualities. To fulfill the differing requirements of various video streaming applications, a video source may have to be processed or stored at different resolutions, frame rates, and/or qualities. This would result in a fairly complicated system and require high overall bandwidth or large overall storage space. One solution to satisfy requirements for different resolutions, frame rates, qualities and/or bitrates is scalable video coding. Besides various proprietary development efforts to address this problem, there is also an existing video standard for scalable video coding. The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG has standardized a Scalable Video Coding (SVC) extension to the H.264/AVC standard. An H.264/AVC SVC bitstream can contain video information ranging from low frame rate, low resolution and low quality to high frame rate, high definition and high quality. This single bitstream can be adapted to a specific application by properly configuring the scalability of the bitstream. For example, the complete bitstream corresponding to a high definition video can be delivered over high-speed networks to provide full quality for viewing on a large-screen TV. A portion of the bitstream corresponding to a low-resolution version of the high definition video can be delivered over legacy cellular networks for viewing on handheld/mobile devices. Accordingly, a bitstream generated using H.264/AVC SVC is suitable for various video applications such as video broadcasting, video streaming, and surveillance.
In SVC, three types of scalability are provided: temporal scalability, spatial scalability, and quality scalability. SVC uses a multi-layer coding structure to realize these three dimensions of scalability. The concept of SVC is to generate one scalable bitstream that can be easily and quickly adapted to fit the bit-rate of various transmission channels, diverse display capabilities, and/or different computational resources without the need for transcoding or re-encoding. An important feature of the SVC design is that scalability is provided at the bitstream level. Bitstreams for a reduced spatial and/or temporal resolution can be obtained simply by discarding the NAL units (or network packets) that are not required for decoding the target resolution. NAL units for quality refinement can additionally be truncated in order to reduce the bit-rate and/or the corresponding video quality.
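The bitstream-level adaptation described above can be illustrated with a minimal sketch. It assumes each NAL unit has already been parsed into a small record carrying its spatial (dependency_id), temporal (temporal_id) and quality (quality_id) layer identifiers; the NalUnit class and the extract_substream function are hypothetical names used only for illustration, and the rule is simplified in that it applies one quality cap to all spatial layers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical parsed NAL unit: layer identifiers plus opaque payload."""
    dependency_id: int  # spatial layer index (0 = base layer)
    temporal_id: int    # temporal layer index
    quality_id: int     # quality refinement index
    payload: bytes = b""


def extract_substream(nal_units: List[NalUnit],
                      max_dependency_id: int,
                      max_temporal_id: int,
                      max_quality_id: int) -> List[NalUnit]:
    """Keep only the NAL units at or below the target operating point.

    Simplifying assumption: a unit is needed if and only if each of its
    three layer identifiers does not exceed the corresponding target.
    """
    return [
        n for n in nal_units
        if n.dependency_id <= max_dependency_id
        and n.temporal_id <= max_temporal_id
        and n.quality_id <= max_quality_id
    ]


stream = [NalUnit(0, 0, 0), NalUnit(0, 1, 0), NalUnit(1, 0, 0), NalUnit(1, 1, 1)]
base_only = extract_substream(stream, 0, 0, 0)
full = extract_substream(stream, 1, 1, 1)
```

No re-encoding takes place in this sketch: the adapted stream is a subset of the original units, which is the property that makes adaptation fast.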
In the H.264/AVC SVC extension, spatial scalability is supported based on pyramid coding. First, the video sequence is down-sampled to smaller pictures with different spatial resolutions (layers). The lowest layer (i.e., the layer with the lowest spatial resolution) is called the base layer (BL). Any layer above the base layer is called an enhancement layer (EL). In addition to dyadic spatial resolution, the H.264/AVC SVC extension also supports arbitrary resolution ratios, which is called extended spatial scalability (ESS). In order to improve the coding efficiency of the enhancement layers (video layers with larger resolutions), various inter-layer prediction schemes have been disclosed in the literature. Three inter-layer prediction tools have been adopted in SVC: inter-layer motion prediction, inter-layer Intra prediction and inter-layer residual prediction (e.g., C. Andrew Segall and Gary J. Sullivan, “Spatial Scalability Within the H.264/AVC Scalable Video Coding Extension”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, Pages 1121-1135, September 2007).
FIG. 1 illustrates an example of spatial scalability design according to H.264/AVC SVC. Base layer encoder 110 receives a lower resolution video sequence as input and encodes the low-resolution sequence using conventional H.264/AVC video coding. Coding mode selection 112 can select a prediction mode between Intra-prediction and motion-compensated Inter-prediction. Enhancement layer encoder 120 receives a higher resolution sequence as input. The higher resolution sequence can be encoded with a structure similar to the conventional H.264/AVC coding. However, inter-layer prediction 130 can be used as an additional coding mode. Accordingly, mode selection 122 for the enhancement layer can select a prediction mode among Intra-prediction, motion-compensated Inter-prediction and inter-layer prediction. For Intra-coded blocks in the base layer, the reconstructed blocks provide a prediction for the enhancement layer. For Inter-coded blocks in the base layer, the motion vectors and residual difference information of the base layer can be used to predict those of the enhancement layer. While two resolution layers are shown in FIG. 1 as an example of spatial scalability according to H.264/AVC SVC, more resolution layers can be added, in which case a higher-resolution enhancement layer can use either the base layer or previously transmitted enhancement layers for inter-layer prediction. Furthermore, other forms of SVC enhancement (e.g., temporal or quality) may also be present in the system.
In H.264/AVC SVC, the reconstructed blocks, motion vectors, or residual information associated with lower layers are used for inter-layer coding. It is desirable to utilize other coding information associated with lower layers to further improve coding efficiency and/or reduce system complexity.
HEVC (High Efficiency Video Coding) is an advanced video coding system being developed under the Joint Collaborative Team on Video Coding (JCT-VC) group of video coding experts from ITU-T Study Group. In HEVC Test Model Version 6.0 (HM-6.0), the prediction unit (PU) for Intra coding can be 64×64, 32×32, 16×16, 8×8, or 4×4. A total of 35 Intra prediction modes, i.e., mode 0 to mode 34, are used for all PU sizes as shown in FIG. 2A. In addition, mode 35 (i.e., Intra_FromLuma) is used only for the chroma component, and only when chroma Intra prediction based on luma Intra prediction is allowed (i.e., when chroma_pred_from_luma_enabled_flag=1). The Intra prediction mode is also called the Intra mode in this disclosure.
For Intra prediction mode coding of the luma component in HM-6.0, three most probable modes (denoted as candModeList[x], x=0 to 2) are derived for a current luma PU 210 based on the Intra modes of neighboring PUs (220 and 230), as shown in FIG. 2B. The Intra mode of the left PU 220 (denoted as candIntraPredModeA) and the Intra mode of the top PU 230 (denoted as candIntraPredModeB) are used to derive the most probable modes, candModeList[x], as follows:
    If candIntraPredModeB is equal to candIntraPredModeA, the following applies:
        If candIntraPredModeA is less than 2 (i.e., Intra_Planar or Intra_DC mode), candModeList[x], x=0 to 2 is derived as:
            candModeList[0]=Intra_Planar
            candModeList[1]=Intra_DC
            candModeList[2]=Intra_Angular (26)
        Otherwise, candModeList[x], x=0 to 2 is derived as:
            candModeList[0]=candIntraPredModeA
            candModeList[1]=2+((candIntraPredModeA−2−1) % 32)
            candModeList[2]=2+((candIntraPredModeA−2+1) % 32)
    Otherwise (candIntraPredModeB is not equal to candIntraPredModeA), the following applies:
        candModeList[0] and candModeList[1] are derived as follows:
            candModeList[0]=candIntraPredModeA
            candModeList[1]=candIntraPredModeB
        If none of candModeList[0] and candModeList[1] is equal to Intra_Planar, candModeList[2] is set equal to Intra_Planar,
        Otherwise,
            If none of candModeList[0] and candModeList[1] is equal to Intra_DC, candModeList[2] is set equal to Intra_DC,
            Otherwise, candModeList[2] is set equal to Intra_Angular (26).
If the neighboring PU adjacent to the left side or the top side of the current PU is not available or is not Intra coded, candIntraPredModeA or candIntraPredModeB is set to Intra_DC. Intra_Planar and Intra_DC correspond to the Planar mode and the DC mode respectively. Therefore, Intra_Planar/Intra_DC and the Planar mode/DC mode are used interchangeably in this disclosure.
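The derivation of the three most probable modes, including the Intra_DC fallback for unavailable or non-Intra neighbors, can be sketched as follows. This is an illustrative sketch rather than HM-6.0 source code: the function name derive_cand_mode_list is hypothetical, a neighbor that is unavailable or not Intra coded is represented by None, and the % operator is assumed to wrap around to a non-negative result (which is how Python's % behaves for a positive modulus).

```python
from typing import List, Optional

INTRA_PLANAR = 0
INTRA_DC = 1
INTRA_ANGULAR_26 = 26


def derive_cand_mode_list(left_mode: Optional[int],
                          top_mode: Optional[int]) -> List[int]:
    """Return candModeList[0..2] for the current luma PU.

    left_mode / top_mode are the Intra modes of the left and top PUs,
    or None when the neighbor is unavailable or not Intra coded.
    """
    # Unavailable or non-Intra neighbors default to Intra_DC.
    cand_a = INTRA_DC if left_mode is None else left_mode
    cand_b = INTRA_DC if top_mode is None else top_mode

    if cand_a == cand_b:
        if cand_a < 2:  # Intra_Planar or Intra_DC
            return [INTRA_PLANAR, INTRA_DC, INTRA_ANGULAR_26]
        # Angular mode: the mode itself plus its two angular neighbors,
        # wrapping around within the range of angular modes.
        return [cand_a,
                2 + ((cand_a - 2 - 1) % 32),
                2 + ((cand_a - 2 + 1) % 32)]

    # The two neighbors differ: take both, then fill the third slot with
    # the first of Planar, DC, Angular(26) that is not already present.
    if INTRA_PLANAR not in (cand_a, cand_b):
        third = INTRA_PLANAR
    elif INTRA_DC not in (cand_a, cand_b):
        third = INTRA_DC
    else:
        third = INTRA_ANGULAR_26
    return [cand_a, cand_b, third]
```

For example, two neighbors that both use mode 10 yield the list [10, 9, 11], while a left neighbor using mode 10 and an unavailable top neighbor yield [10, 1, 0].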
If the current Intra mode (denoted as IntraPredMode) is equal to any of the most probable modes in candModeList, the flag prev_intra_pred_flag is set to 1 to indicate this case. An index is then sent to identify the mode (i.e., IntraPredMode) in candModeList. If the current Intra mode is not equal to any of the most probable modes in candModeList, then the current Intra mode is one of the remaining modes. The remaining mode (denoted as rem_intra_luma_pred_mode) that is equal to IntraPredMode is transmitted to identify IntraPredMode. The occurrence of the remaining mode is indicated by the preceding flag, prev_intra_pred_flag, having a value equal to 0.
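The two signaling cases can be sketched as an encoder/decoder pair. The function names are hypothetical, and the mapping of a non-candidate mode to a compact remaining-mode value (its rank among the 32 modes that are not in candModeList) is an assumption made for illustration; the text above states only that the remaining mode identifies IntraPredMode. The sketch also assumes the three candidates are distinct.

```python
from typing import List, Tuple


def encode_luma_intra_mode(intra_pred_mode: int,
                           cand_mode_list: List[int]) -> Tuple[int, int]:
    """Return (prev_intra_pred_flag, value).

    value is the index into candModeList when the flag is 1, or the
    remaining-mode value (assumed here to be a rank) when the flag is 0.
    """
    if intra_pred_mode in cand_mode_list:
        return 1, cand_mode_list.index(intra_pred_mode)
    # Assumed mapping: skip over candidates smaller than the mode so the
    # remaining modes are numbered 0..31 without gaps.
    smaller = sum(1 for c in cand_mode_list if c < intra_pred_mode)
    return 0, intra_pred_mode - smaller


def decode_luma_intra_mode(prev_intra_pred_flag: int, value: int,
                           cand_mode_list: List[int]) -> int:
    """Inverse of encode_luma_intra_mode under the same assumptions."""
    if prev_intra_pred_flag == 1:
        return cand_mode_list[value]
    mode = value
    # Re-insert the gaps left by the candidates, in ascending order.
    for c in sorted(cand_mode_list):
        if mode >= c:
            mode += 1
    return mode
```

With 35 luma modes and three candidates, the remaining-mode value in this sketch always falls in the range 0 to 31.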
In HM-6.0, Intra mode coding for the chroma PU may have 5 or 6 candidate modes depending on chroma_pred_from_luma_enabled_flag. If chroma_pred_from_luma_enabled_flag is equal to 1, 6 possible chroma modes are used as shown in Table 1. Otherwise, 5 candidate modes are used as shown in Table 2.
TABLE 1
                            IntraPredMode
intra_chroma_pred_mode      0      26     10     1      X (0 <= X < 35)
0                           34     0      0      0      0
1                           26     34     26     26     26
2                           10     10     34     10     10
3                           1      1      1      34     1
4                           LM     LM     LM     LM     LM
5                           0      26     10     1      X

TABLE 2
                            IntraPredMode
intra_chroma_pred_mode      0      26     10     1      X
0                           34     0      0      0      0
1                           26     34     26     26     26
2                           10     10     34     10     10
3                           1      1      1      34     1
4                           0      26     10     1      X
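The regularity in Table 1 and Table 2 can be captured in a short sketch. The function name derive_chroma_mode is hypothetical; the logic is inferred from the table entries themselves: the first four syntax values select modes 0, 26, 10 and 1, with mode 34 substituted whenever the selected mode coincides with the luma Intra mode, and the last syntax value reuses the luma Intra mode. LM is represented by mode 35 (Intra_FromLuma).

```python
DEFAULT_CHROMA_MODES = (0, 26, 10, 1)  # selected by syntax values 0..3
SUBSTITUTE_MODE = 34                   # used when the default equals the luma mode
INTRA_FROM_LUMA = 35                   # the "LM" entries of Table 1


def derive_chroma_mode(intra_chroma_pred_mode: int,
                       luma_intra_pred_mode: int,
                       chroma_pred_from_luma_enabled_flag: int) -> int:
    """Map the chroma syntax element to a chroma Intra mode per Table 1 or 2."""
    if 0 <= intra_chroma_pred_mode < 4:
        mode = DEFAULT_CHROMA_MODES[intra_chroma_pred_mode]
        # Avoid duplicating the luma mode, which has its own syntax value.
        return SUBSTITUTE_MODE if mode == luma_intra_pred_mode else mode
    if chroma_pred_from_luma_enabled_flag == 1:
        if intra_chroma_pred_mode == 4:
            return INTRA_FROM_LUMA        # Table 1, row 4
        if intra_chroma_pred_mode == 5:
            return luma_intra_pred_mode   # Table 1, row 5
    elif intra_chroma_pred_mode == 4:
        return luma_intra_pred_mode       # Table 2, row 4
    raise ValueError("intra_chroma_pred_mode out of range")
```

For instance, with a luma Intra mode of 26 and LM enabled, syntax value 1 maps to mode 34 and syntax value 5 maps to mode 26.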
To support spatial scalability for blocks coded in Intra mode, the scalable extension of H.264/AVC includes a new Intra coding type (denoted as Intra_BL) for Intra-coded macroblocks based on inter-layer information from a base layer. When a macroblock in a higher layer is coded according to the Intra_BL coding type and the co-located 8×8 sub-macroblock in its reference layer is Intra coded, the prediction signal for the macroblock in the higher layer can be derived from the corresponding reconstructed block in the reference layer. The 8×8 sub-macroblock in the reference layer is up-sampled to generate the prediction.
A scalable extension to HEVC is being developed by the Joint Collaborative Team on Video Coding (JCT-VC) group of video coding experts from ITU-T Study Group. An inter-layer texture prediction technique similar to Intra_BL of H.264/AVC SVC can be applied to coding the enhancement layer picture. A further coding tool, the inter-layer differential coding method, has also been proposed for the scalable extension of HEVC (referred to as SHVC). In this method, the residues between the original video data at an enhancement layer and the reconstructed base layer are compressed using both motion-compensated Inter coding and Intra prediction coding methods. The reconstructed base layer has to be up-sampled or scaled to the same spatial resolution as the original video data at the enhancement layer in order to form the residues at the enhancement layer. In this case, the reference samples used for motion compensation or Intra prediction are the reconstructed residues between the already reconstructed video data at the enhancement layer and the corresponding reconstructed video data at the base layer. Again, the reconstructed video data at the base layer has to be up-sampled or scaled to match the spatial resolution of the reconstructed video data at the enhancement layer.
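The formation of the enhancement-layer residues in inter-layer differential coding can be sketched as follows. The function names are hypothetical, blocks are represented as lists of rows of integer samples, and nearest-neighbor sample repetition with an integer factor is used purely as a stand-in for the up-sampling filter, which is not specified in the description above.

```python
from typing import List

Block = List[List[int]]


def upsample_nearest(block: Block, factor: int) -> Block:
    """Up-sample by integer sample repetition (a stand-in for the real filter)."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in block
        for _ in range(factor)
    ]


def inter_layer_difference(el_block: Block, bl_recon: Block, factor: int) -> Block:
    """Residue between EL samples and the up-sampled reconstructed BL block.

    The result is the signal that would then be compressed with Inter or
    Intra prediction in the differential domain.
    """
    up = upsample_nearest(bl_recon, factor)
    return [
        [e - u for e, u in zip(el_row, up_row)]
        for el_row, up_row in zip(el_block, up)
    ]


bl = [[10, 20],
      [30, 40]]
el = [[11, 10, 21, 20],
      [10, 10, 20, 20],
      [30, 31, 40, 41],
      [29, 30, 40, 40]]
diff = inter_layer_difference(el, bl, 2)
```

When the base layer is a good approximation of the enhancement layer, the difference block consists mostly of small values, which is what the differential method relies on.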
An exemplary two-layer HEVC SVC system is shown in FIG. 3. A conventional HEVC coder can be used as the base layer coder (310), where motion-compensated prediction (M.C. Pred.) 312 is used for inter-frame coding. The coder for the enhancement layer (320) is similar to the conventional HEVC coder except that the M.C./Inter-layer Prediction 322 supports inter-layer prediction in addition to Inter-frame prediction. The input to base layer coder 310 is derived by applying spatial decimation 330 to the higher-resolution input video data. Conversely, in order to use the reconstructed base layer data for inter-layer prediction, the reconstructed base layer data needs to be up-sampled or scaled by up-sampling 324.
It is desirable to improve the coding efficiency of an SVC system, or to reduce the complexity of the SVC system, by exploiting the correlation between the Intra modes of different layers without causing any noticeable impact on video quality or performance.
In SVC, the enhancement layer (EL) can reuse the motion information in the base layer (BL) to reduce the inter-layer motion data redundancy, as mentioned before. In EL macroblock coding, a flag, base_mode_flag, can be coded before mb_type to indicate whether the EL motion information is directly derived from the BL. If base_mode_flag is equal to 1, the partitioning data of the EL macroblock, along with the associated reference indexes and motion vectors, can be derived from the corresponding data of the co-located 8×8 block in the BL. The reference index of the BL can be directly used in the EL. The macroblock partitioning and motion vectors of the EL can be determined based on the scaled data of the macroblock partitioning and motion vectors of the BL. In addition, the scaled BL motion vector can be used as an additional motion vector predictor for the EL.
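The scaling of a BL motion vector to the EL resolution can be sketched as below. The function name scale_motion_vector is hypothetical, and the rule shown (multiply each component by the ratio of EL to BL picture dimensions and round to the nearest integer) is a simplifying assumption; the normative SVC derivation involves additional details that are not described here.

```python
from typing import Tuple


def scale_motion_vector(mv: Tuple[int, int],
                        bl_size: Tuple[int, int],
                        el_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale a BL motion vector (mvx, mvy) to the EL sampling grid.

    bl_size and el_size are (width, height) in samples. Each component
    is scaled by the matching resolution ratio using integer arithmetic
    with rounding (assumed rounding rule, for illustration only).
    """
    (mvx, mvy), (bl_w, bl_h), (el_w, el_h) = mv, bl_size, el_size

    def scale(component: int, bl_dim: int, el_dim: int) -> int:
        return (component * el_dim + bl_dim // 2) // bl_dim

    return scale(mvx, bl_w, el_w), scale(mvy, bl_h, el_h)


# Dyadic case: each component doubles.
dyadic = scale_motion_vector((-3, 5), (960, 540), (1920, 1080))
# Non-dyadic (ESS-style) case with a 1.5x ratio.
ess = scale_motion_vector((4, -8), (640, 360), (960, 540))
```

Under these assumptions, the scaled vector can serve either as the EL motion vector when base_mode_flag is equal to 1 or as an additional motion vector predictor.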
Inter-layer residual coding can use the up-sampled BL residual information as prediction to reduce the information required for the EL residual. The co-located residual of the BL can be block-wise up-sampled using a bilinear filter and the up-scaled block can be used as prediction for the residual of current macroblock in the EL. The up-sampling of the reference layer residual can be performed on a transform block basis in order to ensure that no filtering is applied across transform block boundaries.
Inter-layer texture prediction reduces the redundant texture information of the EL. The prediction for the EL is generated by block-wise up-sampling of the co-located reconstructed BL block. In the up-sampling process for inter-layer texture prediction, a 4-tap FIR filter and a 2-tap FIR filter can be applied to the luma and chroma components, respectively. Unlike filtering for inter-layer residual prediction, filtering for inter-layer texture prediction is always performed across sub-block boundaries. For decoding simplicity, inter-layer texture prediction can be restricted to Intra-coded macroblocks in the BL.
While inter-layer residual prediction and inter-layer texture prediction have been used for SVC, it is desirable to further improve the performance.