In the video coding scheme based on the method described in Non Patent Literature (NPL) 1, each frame of digitized video is split into coding tree units (CTUs), and each CTU is coded in raster scan order. Each CTU is split into coding units (CUs) and coded, in a quadtree structure. Each CU is split into prediction units (PUs) and predicted. The prediction error of each CU is split into transform units (TUs) in a quadtree structure and frequency-transformed.
A CU is a unit of coding in intra prediction/inter-frame prediction. Intra prediction and inter-frame prediction will be described below.
Intra prediction is prediction from a reconstructed image of a frame to be coded. NPL 1 defines, for example, 33 types of angular intra prediction depicted in FIG. 9. In angular intra prediction, a reconstructed pixel near a block to be coded is used for extrapolation in any of the 33 directions depicted in FIG. 9, to generate an intra prediction signal. A CU using intra prediction is hereafter referred to as an intra CU.
Inter-frame prediction is prediction based on an image of a reconstructed frame (reference picture) different in display time from a frame to be coded. Inter-frame prediction is hereafter also referred to as inter prediction. FIG. 10 is an explanatory diagram showing an example of inter-frame prediction. A motion vector MV=(mvx, mvy) indicates the amount of translation of a reconstructed image block of a reference picture relative to a block to be coded. In inter prediction, an inter prediction signal is generated based on a reconstructed image block of a reference picture (using pixel interpolation if necessary). A CU using inter prediction is hereafter referred to as an inter CU.
A frame coded using only intra CUs is called an I frame (or I picture). A frame coded using inter CUs as well as intra CUs is called a P frame (or P picture). A frame coded using inter CUs that each use two reference pictures simultaneously, rather than only one, for the inter prediction of a block is called a B frame (or B picture).
Referring next to FIG. 11, the configuration and operation of a typical video coding device that receives each CU of each frame of digitized video as an input image and outputs a bitstream will be described.
A video coding device depicted in FIG. 11 includes a transformer 101, a quantizer 1020, an entropy encoder 103, an inverse transformer/inverse quantizer 104, a buffer 105, a predictor 106, and an estimator 107.
FIG. 12 is an explanatory diagram showing an example of CTU partitioning of the t-th frame and an example of CU partitioning of the eighth CTU (CTU8), in the case where the spatial resolution of the frame is the CIF (common intermediate format) and the CTU size is 64. FIG. 13 is an explanatory diagram showing a quadtree structure corresponding to the example of CU partitioning of CTU8.
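The CTU partitioning of a frame can be illustrated with a minimal sketch. It assumes the standard CIF luma resolution of 352×288 pixels and a 0-based raster-scan CTU index; the function names ctu_grid and ctu_origin are hypothetical and are not taken from NPL 1 or from FIG. 12.

```python
import math

def ctu_grid(width, height, ctu_size):
    """Return (columns, rows) of CTUs needed to cover a frame."""
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    return cols, rows

def ctu_origin(index, width, ctu_size):
    """Top-left pixel (x, y) of the CTU with the given 0-based raster-scan index.

    The 0-based indexing is an assumption made for this sketch.
    """
    cols = math.ceil(width / ctu_size)
    return (index % cols) * ctu_size, (index // cols) * ctu_size

cols, rows = ctu_grid(352, 288, 64)   # CIF luma resolution, CTU size 64
print(cols, rows, cols * rows)        # 6 5 30
```

Because neither 352 nor 288 is a multiple of 64, the CTUs in the rightmost column and the bottom row extend past the frame boundary in this sketch.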
FIG. 14 is an explanatory diagram showing examples of TU partitioning of a CU. An example of TU partitioning of a CU for an intra prediction 2N×2N PU is depicted in the upper part. In the case where the CU is intra predicted, the root of the quadtree is located in the PU, and the prediction error is expressed by the quadtree structure. An example of TU partitioning of a CU for inter prediction 2N×N PUs is depicted in the lower part. In the case where the CU is inter predicted, the root of the quadtree is located in the CU, and the prediction error is expressed by the quadtree structure.
The estimator 107 determines a CU quadtree structure, a PU partitioning shape, and a TU quadtree structure for each CTU.
The predictor 106 generates a prediction signal corresponding to the input image signal of the CU based on the CU quadtree structure and PU partitioning shape determined by the estimator 107. The prediction signal is generated based on the above-mentioned intra prediction or inter prediction.
The transformer 101 frequency-transforms a prediction error image obtained by subtracting the prediction signal from the input image signal based on the TU quadtree structure determined by the estimator 107.
The quantizer 1020 quantizes the frequency-transformed prediction error image (orthogonal transform coefficient). The quantized orthogonal transform coefficient is hereafter referred to as a coefficient level. A coefficient level having a value other than 0 is referred to as a significant coefficient level. As depicted in FIG. 15, the quantizer 1020 includes a coefficient level calculation unit 1201 that receives an orthogonal transform coefficient Kij and a quantization parameter QP and outputs a coefficient level Lij.
The entropy encoder 103 entropy-encodes cu_split_flag indicating the quadtree structure of the CTU, the prediction parameter, and the coefficient level.
The inverse transformer/inverse quantizer 104 inverse-quantizes the coefficient level. The inverse transformer/inverse quantizer 104 further inversely frequency-transforms the orthogonal transform coefficient obtained by the inverse quantization. The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the buffer 105. The buffer 105 stores the reconstructed image.
The typical video coding device generates a bitstream based on the operation described above.
FIG. 16 is an explanatory diagram showing an example of quantizing Kij using Qs having the value 4096 and a parameter f having the value ⅓. The operation of the quantizer 1020 and the entropy encoder 103 will be described below in further detail by using an example of 4×4 TU depicted in FIG. 16.
First, each orthogonal transform coefficient Kij and coefficient level Lij in the 4×4 TU are defined as follows.
Let Kij (0≤i, j≤3) be the value of the orthogonal transform coefficient at a horizontal position i and a vertical position j on a frequency axis. Likewise, let the coefficient level Lij be the value of the coefficient level corresponding to the orthogonal transform coefficient Kij. Note that Kij and Lij correspond to higher frequency components as the values of i and j become larger.
Quantization is described in detail next. The coefficient level calculation unit 1201 calculates the coefficient level Lij by dividing Kij by the quantization step Qs. The coefficient level Lij is represented by the following Equation (1).
$$L_{ij} = \mathrm{Sign}(K_{ij}) \cdot \mathrm{Floor}\left(\frac{|K_{ij}|}{Q_s} + f\right) \qquad (1)$$
Here, Sign(a) is a function that returns the positive or negative sign of an input a, Floor(a) is a function that returns the largest integer less than or equal to the input a, and f is a parameter (0≤f≤0.5) for determining quantization characteristics. The value of f is ⅙ in inter prediction, and ⅓ in intra prediction.
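Equation (1) can be sketched in a few lines. The function name quantize and the sample coefficient values are hypothetical (they are not read from FIG. 16); the quantization step 4096 and the two values of f are those given in the text. Exact fractions are used so that f = ⅓ and f = ⅙ introduce no rounding error, which requires integer inputs for Kij and Qs.

```python
import math
from fractions import Fraction

F_INTRA = Fraction(1, 3)  # f for intra prediction
F_INTER = Fraction(1, 6)  # f for inter prediction

def quantize(k, qs, f):
    """Equation (1): Lij = Sign(Kij) * Floor(|Kij| / Qs + f), for integer k and qs."""
    sign = -1 if k < 0 else 1
    return sign * math.floor(Fraction(abs(k), qs) + f)

print(quantize(3000, 4096, F_INTRA))   # 1
print(quantize(2500, 4096, F_INTRA))   # 0
print(quantize(-3000, 4096, F_INTRA))  # -1
print(quantize(3000, 4096, F_INTER))   # 0
```

The last two calls show the role of f: the same coefficient magnitude maps to a significant coefficient level with f = ⅓ but to zero with f = ⅙, because the smaller offset widens the dead zone around zero.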
Qs is represented by the following Equation (2) using the quantization parameter QP.
$$Q_s = 2^{\,7 + \frac{QP}{6} - \log_2(N) - \frac{2}{3}} \qquad (2)$$
Here, N is the block size of the TU. Regarding the 4×4 TU depicted in FIG. 16, N=4. FIG. 16 depicts an example of quantizing Kij using Qs having the value 4096 and f having the value ⅓.
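As a rough numerical check, Equation (2) can be evaluated directly, reading it as Qs = 2 raised to the power (7 + QP/6 − log2(N) − 2/3). The function name quant_step is hypothetical, and QP = 46 is not stated in the text; it is simply the value for which this reading of Equation (2) gives Qs = 4096 when N = 4.

```python
import math

def quant_step(qp, n):
    """Equation (2): Qs = 2 ** (7 + QP/6 - log2(N) - 2/3)."""
    return 2.0 ** (7 + qp / 6 - math.log2(n) - 2 / 3)

print(round(quant_step(46, 4)))  # 4096
```

Under this reading, increasing QP by 6 doubles Qs, and doubling the block size N halves it.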
Then, entropy encoding will be described in detail. First, position information and value information used in the description of entropy encoding for coefficient levels are defined.
In this specification, the position information is information indicating the positions of all significant coefficient levels included in the TU.
Referring to 7.4.9.11 in NPL 1, position information in HEVC (High Efficiency Video Coding) is composed of information last_significant_x and last_significant_y indicating the horizontal position and the vertical position of a significant coefficient level to be first transmitted, and information significant_coeff_flag indicating the presence or absence of a significant coefficient level in each of the positions from the position subsequent to (last_significant_x, last_significant_y) up to (0, 0). Therefore, the number of position information bits is the sum of the number of last_significant_x bits, the number of last_significant_y bits, and the number of significant_coeff_flag bits determined based on the position of the significant coefficient level to be first transmitted.
In this specification, the value information is information indicative of the value of a significant coefficient level.
Referring to 7.4.9.11 in NPL 1, the value information in HEVC is composed of information coeff_abs_level_greater1_flag indicating whether the absolute value of a significant coefficient level is larger than 1, information coeff_abs_level_greater2_flag indicating whether the absolute value of the significant coefficient level is larger than 2, information coeff_sign_flag indicating the positive or negative sign of the significant coefficient level, and information coeff_abs_level_remaining indicating the remaining significant coefficient level, that is, the value obtained by subtracting coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag from the absolute value of a significant coefficient level whose absolute value is larger than can be expressed by those two flags. Therefore, the number of value information bits is the sum of the number of coeff_abs_level_greater1_flag bits, the number of coeff_abs_level_greater2_flag bits, the number of coeff_sign_flag bits, and the number of coeff_abs_level_remaining bits of the significant coefficient level.
Table 1 shows the relationship between the position information and value information and the coefficient level Lij. In Table 1, each row is a syntax element and each column is a coefficient level Lij in the 4×4 TU depicted in FIG. 16; each entry gives the value of that syntax element for that Lij, and "-" marks an element that is not used for that Lij.
TABLE 1

                                COEFFICIENT LEVEL Lij
SYNTAX                          L33=0  L32=0  L23=0  L31=0  L22=0  L13=0  L30=1  L21=0
POSITION INFORMATION
  last_significant_x              -      -      -      -      -      -      3      -
  last_significant_y              -      -      -      -      -      -      0      -
  significant_coeff_flag          -      -      -      -      -      -      -      0
VALUE INFORMATION
  coeff_abs_level_greater1_flag   -      -      -      -      -      -      0      -
  coeff_abs_level_greater2_flag   -      -      -      -      -      -      -      -
  coeff_sign_flag                 -      -      -      -      -      -      0      -
  coeff_abs_level_remaining       -      -      -      -      -      -      -      -

                                COEFFICIENT LEVEL Lij
SYNTAX                          L12=0  L03=0  L20=0  L11=0  L02=0  L10=0  L01=1  L00=0
POSITION INFORMATION
  last_significant_x              -      -      -      -      -      -      -      -
  last_significant_y              -      -      -      -      -      -      -      -
  significant_coeff_flag          0      0      0      0      0      0      1      0
VALUE INFORMATION
  coeff_abs_level_greater1_flag   -      -      -      -      -      -      0      -
  coeff_abs_level_greater2_flag   -      -      -      -      -      -      -      -
  coeff_sign_flag                 -      -      -      -      -      -      0      -
  coeff_abs_level_remaining       -      -      -      -      -      -      -      -
In Table 1, the last_significant_x and the last_significant_y indicate the position (i, j)=(3, 0) of the significant coefficient level L30=1 to be first transmitted. The significant_coeff_flag indicates the presence or absence of a significant coefficient level in each of the positions from the position subsequent to (3, 0) up to (0, 0). For a significant coefficient level, significant_coeff_flag=1, while for an insignificant coefficient level, significant_coeff_flag=0. In Table 1, L30=1 and L01=1 are each represented by coeff_abs_level_greater1_flag=0 and coeff_sign_flag=0 (positive). Since both values are smaller than 2, coeff_abs_level_greater2_flag and coeff_abs_level_remaining are not used.
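The derivation of the position information can be sketched as follows. The scan order is inferred from the column order of Table 1 (anti-diagonals from high to low frequency, with larger i first within each diagonal) and is not quoted from NPL 1; the function names reverse_diagonal_scan and position_info are hypothetical.

```python
def reverse_diagonal_scan(n):
    """Positions (i, j) from highest to lowest frequency, in the column order of Table 1."""
    positions = [(i, j) for i in range(n) for j in range(n)]
    return sorted(positions, key=lambda p: (-(p[0] + p[1]), -p[0]))

def position_info(levels, n=4):
    """Return ((last_x, last_y), [significant_coeff_flag, ...]) for one TU.

    `levels` maps (i, j) to a coefficient level; missing positions count as 0.
    """
    scan = reverse_diagonal_scan(n)
    for idx, pos in enumerate(scan):
        if levels.get(pos, 0) != 0:
            # One flag for every position after the first significant one, down to (0, 0).
            flags = [int(levels.get(p, 0) != 0) for p in scan[idx + 1:]]
            return pos, flags
    return None, []  # no significant coefficient level in the TU

levels = {(3, 0): 1, (0, 1): 1}   # L30 = 1 and L01 = 1, as in Table 1
last, flags = position_info(levels)
print(last)    # (3, 0)
print(flags)   # [0, 0, 0, 0, 0, 0, 0, 1, 0]
```

The nine flags correspond to the positions from (2, 1) up to (0, 0), and the single 1 marks L01.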
Table 2 shows the relationship between the position information and value information and the corresponding number of bits. In Table 2, each row is a syntax element and each column is a coefficient level Lij in the 4×4 TU depicted in FIG. 16; each entry gives the bin string of that syntax element, "-" marks an element that is not used for that Lij, and the final columns give the number of position information bits and the number of value information bits.
TABLE 2

                                Bin string
SYNTAX                          L33=0  L32=0  L23=0  L31=0  L22=0  L13=0  L30=1  L21=0  L12=0
POSITION INFORMATION
  last_significant_x              -      -      -      -      -      -     111     -      -
  last_significant_y              -      -      -      -      -      -      0      -      -
  significant_coeff_flag          -      -      -      -      -      -      -      0      0
VALUE INFORMATION
  coeff_abs_level_greater1_flag   -      -      -      -      -      -      0      -      -
  coeff_abs_level_greater2_flag   -      -      -      -      -      -      -      -      -
  coeff_sign_flag                 -      -      -      -      -      -      0      -      -
  coeff_abs_level_remaining       -      -      -      -      -      -      -      -      -

                                Bin string
SYNTAX                          L03=0  L20=0  L11=0  L02=0  L10=0  L01=1  L00=0  [bin number]  subtotal
POSITION INFORMATION
  last_significant_x              -      -      -      -      -      -      -         3          13
  last_significant_y              -      -      -      -      -      -      -         1
  significant_coeff_flag          0      0      0      0      0      1      0         9
VALUE INFORMATION
  coeff_abs_level_greater1_flag   -      -      -      -      -      0      -         2          4
  coeff_abs_level_greater2_flag   -      -      -      -      -      -      -         0
  coeff_sign_flag                 -      -      -      -      -      0      -         2
  coeff_abs_level_remaining       -      -      -      -      -      -      -         0
total                                                                                            17
The number of bits of each piece of information in Table 2 is represented by a bin number. The term bin denotes one bit in an intermediate bit string before being transformed into a bitstream to be output by the entropy encoder 103.
In the case of the 4×4 TU depicted in FIG. 16, the entropy encoder 103 first transmits 13 bins of position information on all significant coefficient levels in the TU, and then transmits the value information of the respective significant coefficient levels, i.e., a total of 4 bins. The position information is composed of last_significant_x, last_significant_y, and significant_coeff_flag. The last_significant_x and the last_significant_y indicate (i, j)=(3, 0) as the position of the significant coefficient level to be first transmitted, which takes 4 bins (3 bins for last_significant_x and 1 bin for last_significant_y). The significant_coeff_flag indicates the presence or absence of a significant coefficient level in each of the nine positions from the position (2, 1), which is subsequent to the position of the significant coefficient level to be first transmitted, up to (0, 0), which takes 9 bins.
The value information is composed of coeff_abs_level_greater1_flag, coeff_abs_level_greater2_flag, coeff_sign_flag, and coeff_abs_level_remaining. The coeff_abs_level_greater1_flag indicates whether each of L30 and L01 is larger than 1, which takes 2 bins. The coeff_abs_level_greater2_flag takes 0 bins because there exists no significant coefficient level with an absolute value larger than 2. The coeff_sign_flag indicates the positive or negative signs of L30 and L01, which takes 2 bins. The coeff_abs_level_remaining takes 0 bins because there exists no significant coefficient level with an absolute value larger than 2.
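The bin tally for this example can be reproduced with a short sketch. It assumes that last_significant_x and last_significant_y are binarized with a truncated unary code whose maximum is N−1, which is an inference from the bin strings 111 and 0 in Table 2 and is not quoted from NPL 1. It also covers only TUs whose significant coefficient levels all have an absolute value of 1, as in FIG. 16. The function names are hypothetical.

```python
def truncated_unary_bins(value, c_max):
    """Bins in a truncated unary code: `value` ones plus a terminating zero,
    with the zero omitted when value == c_max."""
    return value + 1 if value < c_max else c_max

def count_bins(last_pos, sig_flags, sig_levels, n=4):
    """Return (position bins, value bins) for a TU whose significant levels are all +/-1."""
    if any(abs(v) != 1 for v in sig_levels):
        raise ValueError("sketch only covers significant levels of absolute value 1")
    last_x, last_y = last_pos
    position = (truncated_unary_bins(last_x, n - 1)
                + truncated_unary_bins(last_y, n - 1)
                + len(sig_flags))
    greater1 = min(len(sig_levels), 8)  # one flag per level, at most 8 per 4x4 TU
    sign = len(sig_levels)              # one sign flag per significant level
    value = greater1 + sign             # greater2 and remaining contribute 0 bins here
    return position, value

position, value = count_bins((3, 0), [0, 0, 0, 0, 0, 0, 0, 1, 0], [1, 1])
print(position, value, position + value)  # 13 4 17
```

The result matches the totals in Table 2: 3 + 1 + 9 = 13 bins of position information and 2 + 2 = 4 bins of value information, 17 bins in all.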
In the case of the 4×4 TU, the maximum bin number of each piece of information is as follows. Since the maximum value of last_significant_x in the 4×4 TU is 3 (=N−1=4−1) based on the section 7.4.9.11 in NPL 1, the last_significant_x is up to 3 bins. Similarly, the last_significant_y is also up to 3 bins. The significant_coeff_flag is up to 15 bins because up to 15 flags occur per 4×4 TU based on the section 7.3.8.11 in NPL 1. The coeff_abs_level_greater1_flag is up to 8 bins because up to 8 flags occur per 4×4 TU based on the section 7.3.9.11 in NPL 1. The coeff_abs_level_greater2_flag is up to 1 bin because up to 1 flag occurs per 4×4 TU based on the section 7.3.8.11 in NPL 1.
The coeff_sign_flag is up to 16 bins because up to 16 flags occur per 4×4 TU based on the section 7.3.8.11 in NPL 1. The bin number for coeff_abs_level_remaining is calculated based on the section 9.3.3.9 in NPL 1.
As apparent from Equations (9-13) and (9-14) in the section 9.2.2.8 of NPL 1, high-dimensional TU code is applied to a bin string of the prefix part of coeff_abs_level_remaining each time the value of a significant coefficient level last transmitted exceeds a predetermined threshold value, and high-dimensional Exp-Golomb code is applied to a bin string of the suffix part of coeff_abs_level_remaining. In other words, the bin number for the suffix part of coeff_abs_level_remaining having a small value becomes large, while the bin number for the suffix part of coeff_abs_level_remaining having a large value becomes small.