The present application is directed to the coding of significance maps indicating positions of significant transform coefficients within transform coefficient blocks, and to the coding of such transform coefficient blocks. Such coding may be used, for example, in picture and video coding.
In conventional video coding, the pictures of a video sequence are usually decomposed into blocks. The blocks or the color components of the blocks are predicted by either motion-compensated prediction or intra prediction. The blocks can have different sizes and can be either quadratic or rectangular. All samples of a block or a color component of a block are predicted using the same set of prediction parameters, such as reference indices (identifying a reference picture in the already coded set of pictures), motion parameters (specifying a measure for the movement of a block between a reference picture and the current picture), parameters for specifying the interpolation filter, intra prediction modes, etc. The motion parameters can be represented by displacement vectors with a horizontal and a vertical component or by higher order motion parameters such as affine motion parameters consisting of 6 components. It is also possible that more than one set of prediction parameters (such as reference indices and motion parameters) is associated with a single block. In that case, for each set of prediction parameters, a single intermediate prediction signal for the block or the color component of a block is generated, and the final prediction signal is built as a weighted sum of the intermediate prediction signals. The weighting parameters and potentially also a constant offset (which is added to the weighted sum) can either be fixed for a picture, or a reference picture, or a set of reference pictures, or they can be included in the set of prediction parameters for the corresponding block. Similarly, still images are also often decomposed into blocks, and the blocks are predicted by an intra prediction method (which can be a spatial intra prediction method or a simple intra prediction method that predicts the DC component of the block). In a corner case, the prediction signal can also be zero.
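The weighted combination of intermediate prediction signals described above can be illustrated by the following minimal sketch. The function name `weighted_prediction`, the sample values, the weights and the offset are hypothetical and chosen only for illustration; they are not taken from any particular coding standard.

```python
def weighted_prediction(predictions, weights, offset=0.0):
    """Combine intermediate prediction blocks into one final prediction block.

    predictions: list of equally sized 2-D blocks (lists of rows).
    weights:     one weighting parameter per intermediate prediction.
    offset:      constant added to the weighted sum (assumed scalar here).
    """
    rows, cols = len(predictions[0]), len(predictions[0][0])
    return [
        [sum(w * p[r][c] for w, p in zip(weights, predictions)) + offset
         for c in range(cols)]
        for r in range(rows)
    ]


# Two hypothetical 2x2 intermediate prediction signals for the same block.
pred_a = [[100, 104], [96, 100]]
pred_b = [[110, 100], [100, 92]]
final = weighted_prediction([pred_a, pred_b], weights=[0.5, 0.5], offset=1.0)
```

With equal weights of 0.5, the final prediction is the sample-wise average of the two intermediate predictions plus the offset.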
The difference between the original blocks or the color components of the original blocks and the corresponding prediction signals, also referred to as the residual signal, is usually transformed and quantized. A two-dimensional transform is applied to the residual signal and the resulting transform coefficients are quantized. For this transform coding, the blocks or the color components of the blocks, for which a particular set of prediction parameters has been used, can be further split before applying the transform. The transform blocks can be equal to or smaller than the blocks that are used for prediction. It is also possible that a transform block includes more than one of the blocks that are used for prediction. Different transform blocks in a still image or a picture of a video sequence can have different sizes and the transform blocks can represent quadratic or rectangular blocks.
The resulting quantized transform coefficients, also referred to as transform coefficient levels, are then transmitted using entropy coding techniques. For this purpose, a block of transform coefficient levels is usually mapped onto a vector (i.e., an ordered set) of transform coefficient values using a scan, where different scans can be used for different blocks. Often a zig-zag scan is used. For blocks that contain only samples of one field of an interlaced frame (these blocks can be blocks in coded fields or field blocks in coded frames), it is also common to use a different scan specifically designed for field blocks. A commonly used entropy coding algorithm for encoding the resulting ordered sequence of transform coefficients is run-level coding. Usually, a large number of the transform coefficient levels are zero, and a set of successive transform coefficient levels that are equal to zero can be efficiently represented by coding the number of successive transform coefficient levels that are equal to zero (the run). For the remaining (non-zero) transform coefficients, the actual level is coded. There are various variants of run-level codes. The run before a non-zero coefficient and the level of the non-zero transform coefficient can be coded together using a single symbol or code word. Often, a special symbol for the end-of-block, which is sent after the last non-zero transform coefficient, is included. Alternatively, it is possible to first encode the number of non-zero transform coefficient levels and to code the levels and runs depending on this number.
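The scan and the run-level representation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `zigzag_order`, `scan_block` and `run_level_encode`, the string `"EOB"` used as end-of-block symbol, and the sample block are all hypothetical, and the output is a list of symbols rather than an entropy-coded bitstream.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Group by anti-diagonal (row + col); alternate direction per diagonal.
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def scan_block(block):
    """Map a square block of levels onto a vector using the zig-zag scan."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level_encode(levels):
    """Represent a scanned vector as (run, level) pairs followed by 'EOB'."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")       # trailing zeros are implied by end-of-block
    return pairs


block = [
    [7, 3, 0, 0],
    [-2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
levels = scan_block(block)
symbols = run_level_encode(levels)
```

For this block, the three leading non-zero levels each have a run of zero, the level 1 is preceded by a run of five zeros, and the seven trailing zeros are covered by the end-of-block symbol.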
A somewhat different approach is used in the highly efficient CABAC entropy coding in H.264. Here, the coding of transform coefficient levels is split into three steps. In the first step, a binary syntax element coded_block_flag is transmitted for each transform block, which signals whether the transform block contains significant transform coefficient levels (i.e., transform coefficients that are non-zero). If this syntax element indicates that significant transform coefficient levels are present, a binary-valued significance map is coded, which specifies which of the transform coefficient levels have non-zero values. And then, in a reverse scan order, the values of the non-zero transform coefficient levels are coded. The significance map is coded as follows. For each coefficient in the scan order, a binary syntax element significant_coeff_flag is coded, which specifies whether the corresponding transform coefficient level is not equal to zero. If the significant_coeff_flag bin is equal to one, i.e., if a non-zero transform coefficient level exists at this scanning position, a further binary syntax element last_significant_coeff_flag is coded. This bin indicates if the current significant transform coefficient level is the last significant transform coefficient level inside the block or if further significant transform coefficient levels follow in scanning order. If last_significant_coeff_flag indicates that no further significant transform coefficients follow, no further syntax elements are coded for specifying the significance map for the block. In the next step, the values of the significant transform coefficient levels are coded, whose locations inside the block are already determined by the significance map. The values of significant transform coefficient levels are coded in reverse scanning order by using the following three syntax elements. 
The binary syntax element coeff_abs_greater_one indicates if the absolute value of the significant transform coefficient level is greater than one. If the binary syntax element coeff_abs_greater_one indicates that the absolute value is greater than one, a further syntax element coeff_abs_level_minus_one is sent, which specifies the absolute value of the transform coefficient level minus one. Finally, the binary syntax element coeff_sign_flag, which specifies the sign of the transform coefficient value, is coded for each significant transform coefficient level. It should be noted again that the syntax elements that are related to the significance map are coded in scanning order, whereas the syntax elements that are related to the actual values of the transform coefficient levels are coded in reverse scanning order, which allows the use of more suitable context models.
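The order in which the syntax elements described above are produced for one transform block can be sketched as follows. The function name `block_syntax_elements` is hypothetical, the convention that coeff_sign_flag equal to 1 denotes a negative level is an assumption, and the sketch follows the description above only: it ignores binarization, context selection, arithmetic coding and any inference rules of the actual standard.

```python
def block_syntax_elements(levels):
    """List (name, value) syntax elements for a scanned vector of levels."""
    symbols = []
    significant = [i for i, v in enumerate(levels) if v != 0]
    symbols.append(("coded_block_flag", int(bool(significant))))
    if not significant:
        return symbols

    # Significance map, coded in scan order up to the last significant level.
    last = significant[-1]
    for i in range(last + 1):
        sig = int(levels[i] != 0)
        symbols.append(("significant_coeff_flag", sig))
        if sig:
            symbols.append(("last_significant_coeff_flag", int(i == last)))

    # Level values, coded in reverse scan order.
    for i in reversed(significant):
        magnitude = abs(levels[i])
        greater_one = int(magnitude > 1)
        symbols.append(("coeff_abs_greater_one", greater_one))
        if greater_one:
            symbols.append(("coeff_abs_level_minus_one", magnitude - 1))
        # Assumption: 1 signals a negative level, 0 a positive level.
        symbols.append(("coeff_sign_flag", int(levels[i] < 0)))
    return symbols


elements = block_syntax_elements([3, 0, -1, 0])
```

In this example, the significance map ends at the third scan position, and the level values are then listed starting from that position and moving back towards the first one.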
In the CABAC entropy coding in H.264, all syntax elements for the transform coefficient levels are coded using binary probability modelling. The non-binary syntax element coeff_abs_level_minus_one is first binarized, i.e., it is mapped onto a sequence of binary decisions (bins), and these bins are sequentially coded. The binary syntax elements significant_coeff_flag, last_significant_coeff_flag, coeff_abs_greater_one, and coeff_sign_flag are directly coded. Each coded bin (including the binary syntax elements) is associated with a context. A context represents a probability model for a class of coded bins. A measure related to the probability for one of the two possible bin values is estimated for each context based on the values of the bins that have already been coded with the corresponding context. For several bins related to the transform coding, the context that is used for coding is selected based on already transmitted syntax elements or based on the position inside a block.
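The notion of a context as an adaptive probability model can be illustrated with a deliberately simple counting estimator. This is not the probability state machine used by CABAC; the class name `CountingContext` and the initialization with one count per bin value are assumptions made only to show how an estimate depends on the bins already coded with a given context.

```python
class CountingContext:
    """Toy probability model for one class of bins (not the CABAC estimator)."""

    def __init__(self):
        # Assumption: start with one virtual observation of each bin value.
        self.counts = [1, 1]

    def probability_of_one(self):
        """Estimated probability that the next bin equals 1."""
        return self.counts[1] / sum(self.counts)

    def update(self, bin_value):
        """Adapt the estimate after a bin has been coded with this context."""
        self.counts[bin_value] += 1


ctx = CountingContext()
initial = ctx.probability_of_one()
for bin_value in [1, 1, 1, 0]:
    ctx.update(bin_value)
adapted = ctx.probability_of_one()
```

The estimate moves away from its initial value only as bins are coded with the context, so a context that is used for very few bins keeps an estimate close to its initialization.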
The significance map specifies information about the significance (i.e., whether the transform coefficient level is different from zero) for the scan positions. In the CABAC entropy coding of H.264, for a block size of 4×4, a separate context is used for each scan position for coding the binary syntax elements significant_coeff_flag and last_significant_coeff_flag, where different contexts are used for the significant_coeff_flag and the last_significant_coeff_flag of a scan position. For 8×8 blocks, the same context model is used for four successive scan positions, resulting in 16 context models for the significant_coeff_flag and an additional 16 context models for the last_significant_coeff_flag.
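The assignment of scan positions to context models described above can be sketched as follows. The function name `significance_context_index` is hypothetical, and the mapping follows the description in this text rather than the normative tables of the standard; the same index computation would be applied separately to the context set of significant_coeff_flag and to that of last_significant_coeff_flag.

```python
def significance_context_index(scan_pos, block_size):
    """Return the context index for a scan position, per the text above."""
    if block_size == 4:
        return scan_pos          # 16 scan positions, one context each
    if block_size == 8:
        return scan_pos // 4     # four successive positions share a context
    raise ValueError("only 4x4 and 8x8 blocks are covered by this sketch")


contexts_4x4 = {significance_context_index(p, 4) for p in range(16)}
contexts_8x8 = {significance_context_index(p, 8) for p in range(64)}
```

Both block sizes end up with 16 distinct context indices per syntax element.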
This method of context modelling for the significant_coeff_flag and the last_significant_coeff_flag has some disadvantages for large block sizes. On the one hand, if each scan position is associated with a separate context model, the number of context models increases significantly when blocks greater than 8×8 are coded. Such an increased number of context models results in a slow adaptation of the probability estimates and usually in inaccurate probability estimates, and both aspects have a negative impact on the coding efficiency. On the other hand, the assignment of a context model to a number of successive scan positions (as done for 8×8 blocks in H.264) is also not optimal for larger block sizes, since the non-zero transform coefficients are usually concentrated in particular regions of a transform block (the regions are dependent on the main structures inside the corresponding blocks of the residual signal).
After coding the significance map, the block is processed in reverse scan order. If a scan position is significant, i.e., the coefficient is different from zero, the binary syntax element coeff_abs_greater_one is transmitted. Initially, the second context model of the corresponding context model set is selected for the coeff_abs_greater_one syntax element. If the coded value of any coeff_abs_greater_one syntax element inside the block is equal to one (i.e., the absolute coefficient is greater than one), the context modelling switches back to the first context model of the set and uses this context model up to the end of the block. Otherwise (all coded values of coeff_abs_greater_one inside the block are zero and the corresponding absolute coefficient levels are equal to one), the context model is chosen depending on the number of the coeff_abs_greater_one syntax elements equal to zero that have already been coded/decoded in the reverse scan of the considered block. The context model selection for the syntax element coeff_abs_greater_one can be summarized by the following equation, where the current context model index $C_{t+1}$ is selected based on the previous context model index $C_t$ and the value of the previously coded syntax element coeff_abs_greater_one, which is represented by $\mathrm{bin}_t$ in the equation. For the first syntax element coeff_abs_greater_one inside a block, the context model index is set equal to $C_t = 1$.
$$C_{t+1}(C_t, \mathrm{bin}_t) = \begin{cases} 0, & \text{for } \mathrm{bin}_t = 1 \\ \min(C_t + 1, 4), & \text{for } \mathrm{bin}_t = 0 \end{cases}$$
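The context selection for coeff_abs_greater_one can be traced with the following minimal sketch. The function name `greater_one_contexts` is hypothetical. The sketch follows the prose description, in which the first context model (index 0) is retained up to the end of the block once a bin equal to one has been coded; since the recursion alone does not state this retention, it is made explicit here as an assumption via the `seen_one` variable.

```python
def greater_one_contexts(bins):
    """Return the context index used for each coeff_abs_greater_one bin.

    bins: values of coeff_abs_greater_one in coding (reverse scan) order.
    """
    contexts = []
    c = 1                  # the second context model is selected initially
    seen_one = False       # assumption: index 0 is kept once a 1 was coded
    for b in bins:
        contexts.append(c)
        if b == 1:
            seen_one = True
        c = 0 if seen_one else min(c + 1, 4)
    return contexts


only_zeros = greater_one_contexts([0, 0, 0, 0, 0])
with_one = greater_one_contexts([0, 0, 1, 0, 1])
```

As long as only zeros are coded, the index increases up to its maximum of 4; after the first bin equal to one, all remaining bins of the block use index 0.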
The second syntax element for coding the absolute transform coefficient levels, coeff_abs_level_minus_one, is only coded when the coeff_abs_greater_one syntax element for the same scan position is equal to one. The non-binary syntax element coeff_abs_level_minus_one is binarized into a sequence of bins, and for the first bin of this binarization a context model index is selected as described in the following. The remaining bins of the binarization are coded with fixed contexts. The context for the first bin of the binarization is selected as follows. For the first coeff_abs_level_minus_one syntax element, the first context model of the set of context models for the first bin of the coeff_abs_level_minus_one syntax element is selected, i.e., the corresponding context model index is set equal to $C_t = 0$. For each further first bin of the coeff_abs_level_minus_one syntax element, the context modelling switches to the next context model in the set, where the number of context models in the set is limited to 5. The context model selection can be expressed by the following formula, where the current context model index $C_{t+1}$ is selected based on the previous context model index $C_t$. As mentioned above, for the first syntax element coeff_abs_level_minus_one inside a block, the context model index is set equal to $C_t = 0$. Note that different sets of context models are used for the syntax elements coeff_abs_greater_one and coeff_abs_level_minus_one.
$$C_{t+1}(C_t) = \min(C_t + 1, 4)$$
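The context selection for the first bin of coeff_abs_level_minus_one can be traced in the same way. The function name `level_minus_one_contexts` and the counts used in the example are hypothetical; the sketch only applies the recursion stated above, starting from index 0.

```python
def level_minus_one_contexts(count):
    """Return the context index used for the first bin of each
    coeff_abs_level_minus_one syntax element in a block.

    count: number of coeff_abs_level_minus_one elements coded in the block.
    """
    contexts = []
    c = 0                      # first context model for the first element
    for _ in range(count):
        contexts.append(c)
        c = min(c + 1, 4)      # the set is limited to 5 context models
    return contexts


small_block = level_minus_one_contexts(7)
large_block = level_minus_one_contexts(20)
share_of_last = large_block.count(4) / len(large_block)
```

With 20 such syntax elements in a block, 16 of the 20 first bins are coded with the last context model of the set.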
For large blocks, this method has some disadvantages. The selection of the first context model for coeff_abs_greater_one (which is used if a value of coeff_abs_greater_one equal to 1 has been coded for the block) is usually done too early, and the last context model for coeff_abs_level_minus_one is reached too quickly, because the number of significant coefficients is larger than in small blocks. As a result, most bins of coeff_abs_greater_one and coeff_abs_level_minus_one are coded with a single context model. But these bins usually have different probabilities, and hence the usage of a single context model for a large number of bins has a negative impact on the coding efficiency.
Although, in general, large blocks increase the computational overhead for performing the spectrally decomposing transform, the ability to effectively code both small and large blocks would enable the achievement of better coding efficiency in coding sample arrays such as pictures or sample arrays representing other spatially sampled information signals such as depth maps or the like. The reason for this is the dependency between spatial and spectral resolution when transforming a sample array within blocks: the larger the blocks, the higher the spectral resolution of the transform. Generally, it would be favorable to be able to locally apply the individual transform on a sample array such that, within the area of such an individual transform, the spectral composition of the sample array does not vary to a great extent. Small blocks guarantee that the content within the blocks is relatively consistent. On the other hand, if the blocks are too small, the spectral resolution is low, and the ratio between non-significant and significant transform coefficients gets lower.
Thus, it would be favorable to have a coding scheme which enables efficient coding of transform coefficient blocks and their significance maps, even when the blocks are large.