Video data requires a large amount of storage space or a wide bandwidth to transmit. With growing picture resolutions and higher frame rates, the storage or transmission bandwidth requirements would be formidable if the video data were stored or transmitted in uncompressed form. Therefore, video data is often stored or transmitted in a compressed format using video coding techniques. Coding efficiency has been substantially improved by newer video compression formats such as H.264/AVC and the emerging High Efficiency Video Coding (HEVC) standard.
FIG. 1 illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing. For Inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from one or more other pictures. Switch 114 selects Intra Prediction 110 or Inter-prediction data, and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction errors are then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct the video data. The reconstructed video data are stored in Reference Picture Buffer 134 and used for prediction of other frames. However, Loop Filter 130 (e.g., a deblocking filter and/or Sample Adaptive Offset, SAO) may be applied to the reconstructed video data before the video data are stored in the reference picture buffer.
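The prediction/transform/reconstruction loop described above can be sketched with a toy scalar model. This is an illustrative simplification, not an HEVC implementation: the "transform" is taken as the identity and quantization is plain scalar rounding, so only the placement of the T/Q and IQ/IT stages in the loop is shown; the function name and parameters are hypothetical.

```python
def encode_block(original, prediction, q_step):
    # Residues (prediction errors) formed at Adder 116.
    residues = [o - p for o, p in zip(original, prediction)]
    # T 118 + Q 120 (transform taken as identity in this sketch).
    quantized = [round(r / q_step) for r in residues]
    # IQ 124 + IT 126: recover approximate residues at the encoder side.
    recovered = [q * q_step for q in quantized]
    # REC 128: add recovered residues back to the prediction data.
    reconstructed = [r + p for r, p in zip(recovered, prediction)]
    return quantized, reconstructed
```

Note that the reconstructed samples differ from the originals by the quantization error, which is why the encoder must run this local decoding loop to stay in sync with the decoder.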
FIG. 2 illustrates a system block diagram of a corresponding video decoder for the encoder system in FIG. 1. Since the encoder also contains a local decoder for reconstructing the video data, most decoder components are already used in the encoder, with the exception of Entropy Decoder 210. Furthermore, only Motion Compensation 220 is required at the decoder side. Switch 146 selects Intra-prediction or Inter-prediction, and the selected prediction data are supplied to Reconstruction (REC) 128 to be combined with recovered residues. Besides performing entropy decoding of compressed residues, Entropy Decoder 210 is also responsible for entropy decoding of side information and provides the side information to the respective blocks. For example, Intra mode information is provided to Intra Prediction 110, Inter mode information is provided to Motion Compensation 220, loop filter information is provided to Loop Filter 130, and residues are provided to Inverse Quantization (IQ) 124. The residues are processed by IQ 124, IT 126, and a subsequent reconstruction process to reconstruct the video data. Again, the reconstructed video data from REC 128 have undergone a series of processing steps, including IQ 124 and IT 126 as shown in FIG. 2, and are therefore subject to coding artefacts. The reconstructed video data are further processed by Loop Filter 130.
In the High Efficiency Video Coding (HEVC) system, the fixed-size macroblock of H.264/AVC is replaced by a flexible block named a coding unit (CU). Pixels in a CU share the same coding parameters to improve coding efficiency. The CU splitting may begin with a largest CU (LCU), which is also referred to as a coding tree unit (CTU) in HEVC. In addition to the concept of the coding unit, the concept of the prediction unit (PU) is also introduced in HEVC. Once the splitting of the CU hierarchical tree is done, each leaf CU is further split into one or more prediction units (PUs) according to the prediction type and PU partition. Furthermore, the basic unit for transform coding is a square block named a Transform Unit (TU). A Coding Group (CG) is defined as a set of 16 consecutive coefficients in scan order.
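The recursive CU splitting can be sketched as a simple quadtree recursion. This is a toy illustration: `should_split` is a hypothetical callback standing in for the encoder's actual mode decision, and PU/TU partitioning is not modelled.

```python
def split_cu(x, y, size, min_size, should_split):
    # Recursively split a CU into four equal quadrants until the
    # decision callback says stop or the minimum CU size is reached.
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_cu(x + dx, y + dy, half, min_size, should_split)
        return leaves
    # A leaf CU, described by its top-left corner and size.
    return [(x, y, size)]
```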
In image and video coding, various syntax elements related to residuals or side information are generated. These syntax elements are coded and incorporated into the video bitstream. In order to code these syntax elements efficiently, entropy coding is often used for all or some of them. In conventional context modelling, the context modelling is always based on coded bins, which causes data dependency during the coding process. It is desirable to develop new context modelling that does not have such data dependency.
In the following, an example of context modelling is illustrated based on the transform coefficient coding in the existing HEVC standard. The context modelling for the transform coefficients is intended to demonstrate issues in existing context modelling. The scope that the present invention intends to address covers image and video data in general, rather than transform coefficients specifically.
For a given scan order, a CG corresponds to a 4×4 sub-block. A syntax element coded_sub_block_flag is signalled for each sub-block to indicate whether the sub-block contains non-zero coefficients. If the sub-block is significant, as indicated by the corresponding flag, then the coefficient significance flags, sign flags, and absolute levels of the sub-block are further coded using up to five coefficient scan passes. Each coefficient scan pass codes a syntax element within a CG, when necessary, as follows:
1) significant_coeff_flag: significance of a coefficient (zero/non-zero).
2) coeff_abs_level_greater1_flag: a flag indicating whether the absolute value of a coefficient level is greater than 1.
3) coeff_abs_level_greater2_flag: a flag indicating whether the absolute value of a coefficient level is greater than 2.
4) coeff_sign_flag: the sign of a significant coefficient (0: positive, 1: negative).
5) coeff_abs_level_remaining: the remaining value of the absolute coefficient level (if the value is larger than that coded in previous passes).
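The five scan passes above can be sketched for a single CG as follows. This is a simplified illustration: the per-CG caps on the number of coded greater1/greater2 flags (8 and 1 in HEVC) are modelled, but the base level subtracted in pass 5 is simplified, since the normative offset depends on which flags were actually coded.

```python
def code_cg(levels):
    # 'levels' holds the coefficient levels of one CG in scan order.
    sig = [int(c != 0) for c in levels]                        # pass 1
    gt1 = [int(abs(c) > 1) for c in levels if c != 0][:8]      # pass 2 (<= 8 flags)
    gt2 = [int(abs(c) > 2) for c in levels if abs(c) > 1][:1]  # pass 3 (<= 1 flag)
    signs = [int(c < 0) for c in levels if c != 0]             # pass 4 (bypass)
    rem = [abs(c) - 1 for c in levels if abs(c) > 1]           # pass 5 (bypass, simplified)
    return sig, gt1, gt2, signs, rem
```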
The bins in the first three passes are arithmetically coded in the regular mode (i.e., using context models), and the bins in passes 4 and 5 are arithmetically coded in the bypass mode. Grouping bypass bins can increase the throughput of the entropy coder.
In the current HEVC standard, residuals in a TU are coded on a CG basis, and the CGs are coded one by one according to the CG scan order, i.e., the scan order for the CGs within a TU. Therefore, while the bypass bins within a CG are grouped together, the regular-mode bins and bypass bins in a TU are still interleaved.
For each CG, depending on a criterion, coding of the sign of the last non-zero coefficient is omitted when sign data hiding is applied. The sign value is instead derived from the parity of the sum of the levels of the CG, where an even parity corresponds to the positive sign and an odd parity corresponds to the negative sign. The criterion is the distance, in scan order, between the first and last non-zero coefficients: if the distance is larger than a threshold (i.e., 4 in HEVC), then sign data hiding is applied.
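The decoder-side derivation of the hidden sign can be sketched as below, following the parity rule described above; the function name and the list-based interface are illustrative.

```python
def hidden_sign(levels):
    # 'levels' holds the coefficient levels of one CG in scan order.
    nonzero = [i for i, c in enumerate(levels) if c != 0]
    # Sign data hiding applies only if the scan distance between the
    # first and last non-zero coefficients exceeds the threshold (4).
    if not nonzero or nonzero[-1] - nonzero[0] <= 4:
        return None
    # Even parity of the level sum -> positive, odd parity -> negative.
    parity = sum(abs(c) for c in levels) % 2
    return '-' if parity else '+'
```

The encoder guarantees that the parity matches the true sign, adjusting one coefficient level by one when necessary, so the decoder can always recover the hidden sign this way.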
The context model of significant_coeff_flag for a 4×4 TB (Transform Block) depends on the position of the coefficient within the TB. Coefficient positions are grouped according to their frequency, and the significance flags within a group are coded using the same context. FIG. 3 shows the context modelling for a 4×4 TB 300, where the number in each small square corresponds to the context index of the corresponding coefficient. Luma and chroma components are treated in the same way, but use separate context sets. For 8×8, 16×16 and 32×32 TBs, context modelling is based on both position and template. As shown in FIG. 4, a context is selected depending on a template of the neighbouring right and lower CSBFs (Coded Sub-Block Flags), sr and sl. For TB 410, sr is equal to 0 and sl is equal to 0. For TB 420, sr is equal to 1 and sl is equal to 0. For TB 430, sr is equal to 0 and sl is equal to 1. For TB 440, sr is equal to 1 and sl is equal to 1. In addition, TBs are split into two regions: the top-left sub-block is region 1, and the remaining sub-blocks make up region 2. For the luma component, region 1 and region 2 use separate context sets. For the chroma components, the contexts for region 1 and region 2 are shared. The DC component has a single dedicated context, which is shared across all TB sizes.
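An illustrative (non-normative) sketch of how the region split and the (sr, sl) template of FIG. 4 could be combined into a context index is given below; the number of contexts per set is a hypothetical constant chosen only for this example.

```python
CONTEXTS_PER_SET = 4  # hypothetical set size for this sketch

def sig_ctx(pos_x, pos_y, sr, sl):
    # Region 1 is the top-left 4x4 sub-block; all others are region 2.
    region = 0 if (pos_x < 4 and pos_y < 4) else 1
    # The right/lower coded-sub-block flags select one of four patterns.
    pattern = sr + 2 * sl
    return region * CONTEXTS_PER_SET + pattern
```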
When coding the absolute level, there are 4 different context sets for the luma component and 2 context sets for the chroma components. Each set has 4 context models for coeff_abs_level_greater1_flag and 1 context for coeff_abs_level_greater2_flag. A context set is selected depending on whether there is a coeff_abs_level_greater1_flag equal to 1 in the previous CG. For the luma component, the context set selection also depends on whether the DC coefficient is part of the current CG. For the chroma components, the context set does not depend on the CG location. The specific context within a context set for coeff_abs_level_greater1_flag is selected depending on the number of trailing zeros and the number of coefficient levels larger than 1 in the current CG.
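The per-flag context selection for coeff_abs_level_greater1_flag can be sketched as the HEVC-style state update below: the state starts at 1 ("initial"), counts up while flags equal to 0 are coded, and drops to 0 once a level larger than 1 has been seen, matching the context model descriptions of Table 2. The list-based interface is illustrative; the inputs are the absolute levels of the non-zero coefficients in the order their flags are coded.

```python
def greater1_contexts(abs_levels):
    # c1 is the context state; its value for each flag is recorded
    # before being updated from the decoded level, which is exactly
    # the data dependency discussed in this disclosure.
    c1, contexts = 1, []
    for level in abs_levels:
        contexts.append(c1)
        if level > 1:
            c1 = 0                  # a level larger than 1 was seen
        elif c1:
            c1 = min(c1 + 1, 3)     # count trailing levels equal to 1
    return contexts
```

Because each context depends on the levels decoded before it, the contexts cannot be derived in parallel for all flags of a CG.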
During the HEVC development, other context modelling methods have also been disclosed. For example, Nguyen et al. discloses three context sets depending on regions for significant_coeff_flag in JCTVC-H0228 (Non-CE11: Proposed Cleanup for Transform Coefficient Coding, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 8th Meeting: San Jose, Calif., USA, 1-10 Feb. 2012, Document: JCTVC-H0228). The first region (i.e., Set 0) includes all positions with posX+posY<2, where posX and posY are the horizontal and vertical indices of the coefficient matrix, with (0,0) being the upper-left corner location. The second region (i.e., Set 1) includes all positions with posX+posY<5 that are not in Set 0. The remaining positions belong to the third region (i.e., Set 2). FIG. 5 illustrates the three regions according to JCTVC-H0228. The 4×4 and 8×8 TUs use separate context sets, and the other TU sizes share the same context. The context selection is based on neighbouring samples within a "local template". According to JCTVC-H0228, the local template 520 for a current sample location 510 includes 5 neighbouring samples (i.e., the samples shown as slant-lined squares) as shown in FIG. 5. Up to 6 contexts are used depending on the statistics of the levels of the samples in the local template. The context modelling for significant_coeff_flag according to JCTVC-H0228 has very high complexity. Moreover, it is interleaved with bypass bins.
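The local-template selection can be sketched as below, using a count of significant neighbours as a simplified stand-in for the level statistics used by JCTVC-H0228; the template positions (two to the right, two below, one diagonal) and the cap at 6 contexts follow the description above, but the exact statistic is an assumption of this sketch.

```python
def template_context(sig_map, x, y):
    # sig_map is a 2-D significance map (1 = non-zero level).
    h, w = len(sig_map), len(sig_map[0])
    neighbours = [(x + 1, y), (x + 2, y), (x, y + 1), (x, y + 2), (x + 1, y + 1)]
    count = sum(sig_map[ny][nx] for nx, ny in neighbours if nx < w and ny < h)
    return min(count, 5)  # up to 6 contexts: indices 0..5
```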
In the Key Technical Areas (KTA) work for HEVC proposed by Qualcomm (Further Improvements to HMKTA-1.0, VCEG-AZ07, June 2015), the context modelling for significant_coeff_flag is modified so that significant_coeff_flag is not interleaved with the bypass bins. While up to 6 contexts may still be used, the contexts are selected depending on the statistics of the bins of the samples in the local template.
From the above descriptions, the context model derivation for significant_coeff_flag can be done in parallel for all samples in one CG. However, the context model derivation for the syntax element coeff_abs_level_greater1_flag depends on previously decoded bins of the same syntax element. This example thus illustrates the data dependency in existing context modelling.
The context sets for a CG for coeff_abs_level_greater1_flag according to HEVC are shown in Table 1. The meaning of each context model is shown in Table 2.
TABLE 1
Context Sets for a CG

                                              Luma        Chroma
# of coeff_abs_level_greater1_flag = 1        0    >0     0    >0
in previous CG
Region 1 (top left CG)                        0     1     4     5
Region 2 (other CGs)                          2     3     4     5
TABLE 2

Context Model    Description
0                1 or more larger than 1
1                Initial - no trailing zeros
2                1 trailing zero
3                2 or more trailing zeros
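The context set selection of Table 1 can be expressed directly as a lookup. The function below is an illustrative transcription of the table (the set indices 0-5 match the table entries); it is not normative code.

```python
def greater1_context_set(is_luma, is_top_left_cg, prev_cg_had_gt1):
    # Luma uses four sets (0-3) split by CG region; chroma uses two
    # sets (4-5) regardless of region, exactly as in Table 1.
    if is_luma:
        base = 0 if is_top_left_cg else 2
        return base + int(prev_cg_had_gt1)
    return 4 + int(prev_cg_had_gt1)
```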
The context sets for a CG for coeff_abs_level_greater2_flag according to HEVC are shown in Table 3. The meaning of the context models is shown in Table 2. There is only 1 context in each set.
TABLE 3
Context Sets for a CG

                                              Luma        Chroma
# of coeff_abs_level_greater1_flag = 1        0    >0     0    >0
in previous CG
Region 1 (top left CG)                        0     1     4     5
Region 2 (other CGs)                          2     3     4     5
It is desirable to develop context models that can remove certain data dependencies. In the example of transform coefficient coding, it is desirable to develop context models for coeff_abs_level_greater1_flag and other coefficient parameters such that the context model does not depend on previously decoded bins of the same syntax element.