A video coding system based on a method described in Non-Patent Literature (NPL) 1 divides each frame of digitized video into coding tree units (CTUs), and each CTU is encoded in order of raster scanning. Each CTU is split into coding units (CUs) in a quadtree structure and encoded. Each CU is split into prediction units (PUs) and predicted. Further, a prediction error of each CU is split into transform units (TUs) in a quadtree structure and frequency-transformed.
The CU is a coding unit of intra prediction/inter-frame prediction. Intra prediction and inter-frame prediction will be described below.
Intra prediction is prediction from a reconstructed image of a frame to be encoded. NPL 1 defines 33 types of angular intra prediction depicted in FIG. 14 and the like. In angular intra prediction, a reconstructed pixel around a block to be encoded is extrapolated in any of 33 directions depicted in FIG. 14 to generate an intra prediction signal. Hereinafter, a CU using intra prediction is referred to as an intra CU.
Inter-frame prediction is prediction based on an image of a reconstructed frame (reference picture) different in display time from a frame to be encoded. Hereinafter, inter-frame prediction is also referred to as inter prediction. FIG. 15 is an explanatory diagram depicting an example of inter-frame prediction. A motion vector MV=(mvx, mvy) indicates the amount of translation of a reconstructed image block of a reference picture relative to a block to be encoded. In inter prediction, an inter prediction signal is generated based on a reconstructed image block of a reference picture (using pixel interpolation if necessary). Hereinafter, a CU using inter prediction is referred to as an inter CU.
A frame encoded with only intra CUs is called an I frame (or an I picture). A frame encoded with inter CUs as well as intra CUs is called a P frame (or a P picture). A frame that includes inter CUs using two reference pictures simultaneously, rather than only one, for inter prediction of a block is called a B frame (or a B picture).
Referring next to FIG. 16, the configuration and operation of a typical video coding device that receives each CU of each frame of digitized video as an input image and outputs a bitstream will be described.
The video coding device depicted in FIG. 16 includes a transformer 101, a quantizer 1020, an entropy encoder 103, an inverse transformer/inverse quantizer 104, a buffer 105, a predictor 106, and an estimator 107.
FIG. 17 is an explanatory diagram depicting an example of CTU partitioning of a frame t and an example of CU partitioning of the eighth CTU (CTU8) in the frame t when the spatial resolution of the frame is CIF (Common Intermediate Format) and the CTU size is 64. FIG. 18 is an explanatory diagram depicting a quadtree structure corresponding to the example of CU partitioning of CTU8.
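The CTU partitioning described for FIG. 17 can be illustrated with a small calculation. The following is a minimal sketch, assuming the standard CIF luma resolution of 352×288 pixels; the function names ctu_grid and ctu_origin and the zero-based raster index are illustrative assumptions, not taken from NPL 1.

```python
def ctu_grid(width, height, ctu_size):
    """Return (columns, rows) of CTUs needed to cover the frame."""
    cols = -(-width // ctu_size)   # ceiling division
    rows = -(-height // ctu_size)
    return cols, rows


def ctu_origin(index, cols, ctu_size):
    """Top-left pixel of a CTU given its zero-based raster-scan index (assumed convention)."""
    return (index % cols) * ctu_size, (index // cols) * ctu_size


CIF_WIDTH, CIF_HEIGHT = 352, 288   # assumed CIF luma resolution
CTU_SIZE = 64

cols, rows = ctu_grid(CIF_WIDTH, CIF_HEIGHT, CTU_SIZE)
total_ctus = cols * rows
# Width actually covered by picture data in the rightmost CTU column
last_col_width = CIF_WIDTH - (cols - 1) * CTU_SIZE

print(cols, rows, total_ctus, last_col_width)
```

Under these assumptions the frame is covered by a 6×5 grid of CTUs, and the rightmost column and bottom row are only partially filled with picture data.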
FIG. 19 is an explanatory diagram depicting an example of TU partitioning of a CU. An example of TU partitioning of a CU for an intra prediction 2N×2N PU is depicted in the upper part. When the CU is an intra prediction CU, the root of the quadtree is placed in the PU, and a prediction error is expressed by the quadtree structure. An example of TU partitioning of a CU for inter prediction 2N×N PUs is depicted in the lower part. When the CU is an inter prediction CU, the root of the quadtree is placed in the CU, and a prediction error is expressed by the quadtree structure.
The estimator 107 determines, for each CTU, a CU quadtree structure, a PU partitioning shape, and a TU quadtree structure that minimize the entropy coding cost.
The predictor 106 generates a prediction signal for an input image signal of a CU based on the CU quadtree structure and the PU partitioning shape determined by the estimator 107. The prediction signal is generated based on the intra prediction or inter prediction mentioned above.
Based on the TU quadtree structure determined by the estimator 107, the transformer 101 frequency-transforms a prediction error image obtained by subtracting the prediction signal from the input image signal.
The quantizer 1020 quantizes the frequency-transformed prediction error image (orthogonal transform coefficient). Hereinafter, the quantized orthogonal transform coefficient is called a coefficient level. Further, a coefficient level having a non-zero value is called a significant coefficient level. As depicted in FIG. 20, the quantizer 1020 includes a coefficient level calculation unit 1201 which takes input of an orthogonal transform coefficient Kij and a quantization parameter QP, and outputs a coefficient level Lij.
The entropy encoder 103 entropy-encodes a cu split flag indicating the quadtree structure of a CTU, as well as a prediction parameter and a coefficient level.
The inverse transformer/inverse quantizer 104 inversely quantizes the coefficient level. The inverse transformer/inverse quantizer 104 further inversely frequency-transforms the inversely quantized orthogonal transform coefficient. The prediction signal is added to a reconstructed prediction error image obtained by the inverse transform, and supplied to the buffer 105. The buffer 105 stores the reconstructed image.
Based on the operation mentioned above, the typical video coding device generates a bitstream.
The operation of the quantizer 1020 and the entropy encoder 103 will be described below in further detail by using the example of the 4×4 TU depicted in FIG. 21.
First, the orthogonal transform coefficient Kij and the coefficient level Lij of the 4×4 TU are defined as follows:
Kij (0≤i, j≤3) is defined as a value of the orthogonal transform coefficient in a horizontal position i and a vertical position j on a frequency axis. Similarly, the coefficient level Lij is defined as a value of the coefficient level corresponding to the orthogonal transform coefficient Kij. Note that Kij and Lij become higher frequency components as the values of i, j increase.
Next, quantization will be described in detail. The coefficient level calculation unit 1201 divides Kij by a quantization step Qs to calculate the coefficient level Lij. As a formula, the coefficient level Lij is represented by Equation (1).
Lij=Sign(Kij)·Floor(|Kij|/Qs+f)  (1)
Note that Sign(a) is a function that returns the positive or negative sign of the input a, Floor(a) is a function that returns the largest integer less than or equal to the input a, and f is a parameter (0≤f≤0.5) for determining quantization characteristics. The value of f is set to ⅙ in inter prediction and to ⅓ in intra prediction.
Qs is represented by Equation (2) below using a quantization parameter QP.
Qs = 2^{7 + \frac{QP}{6}\,\log_2(N) - \frac{2}{3}}  (2)
Note that N denotes the block size of a TU. In the 4×4 TU depicted in FIG. 21, N=4. In FIG. 21, an example of quantizing Kij using Qs having a value of 4096 and f having a value of ⅓ is depicted.
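Equation (1) can be sketched in a few lines of code. The following is a minimal illustration, assuming Qs=4096 and f=⅓ as in the example above; the function name coefficient_level and the sample coefficient values are hypothetical and are not the values depicted in FIG. 21.

```python
import math
from fractions import Fraction


def coefficient_level(k, qs, f):
    """Equation (1): Lij = Sign(Kij) * Floor(|Kij| / Qs + f)."""
    sign = -1 if k < 0 else 1
    # Fraction keeps the division exact, so values near a threshold round correctly
    return sign * math.floor(Fraction(abs(k), qs) + f)


QS = 4096                  # quantization step used in the example above
F_INTRA = Fraction(1, 3)   # f for intra prediction

# Hypothetical orthogonal transform coefficients (not the values of FIG. 21)
samples = [8192, -3000, 2731, 2730, 0]
levels = [coefficient_level(k, QS, F_INTRA) for k in samples]
print(levels)
```

With f=⅓, a coefficient becomes a significant coefficient level once |Kij| reaches (1−f)·Qs, which is roughly 2730.7 here. This is why 2731 quantizes to 1 while 2730 quantizes to 0 in this sketch.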
Then, entropy coding will be described in detail. First, location information and value information used in describing entropy coding for a coefficient level will be defined.
In this specification, the location information is information indicating the locations of all significant coefficient levels of the TU.
Referring to the section 7.4.9.11 in NPL 1, the location information in HEVC is composed of information last_significant_x and last_significant_y indicating the horizontal position and the vertical position of a significant coefficient level to be first transmitted, and information significant_coeff_flag indicative of the presence or absence of a significant coefficient level in each of locations from a location subsequent to (last_significant_x, last_significant_y) up to (0, 0). Therefore, the number of location information bits is the sum of the number of last_significant_x bits, the number of last_significant_y bits, and the number of significant_coeff_flag bits determined based on the location of the significant coefficient level to be first transmitted.
In this specification, the value information is information indicative of the value of a significant coefficient level.
Referring to section 7.4.9.11 in NPL 1, the value information in HEVC is composed of the following: information coeff_abs_level_greater1_flag indicating whether the absolute value of a significant coefficient level is larger than 1; information coeff_abs_level_greater2_flag indicating whether the absolute value of the significant coefficient level is larger than 2; information coeff_sign_flag indicating the positive or negative sign of the significant coefficient level; and information coeff_abs_level_remaining indicating the remaining significant coefficient level, that is, the value obtained by subtracting coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag from the absolute value of a significant coefficient level that is too large to be expressed by those two flags alone. Therefore, the number of value information bits is the sum of the number of coeff_abs_level_greater1_flag bits, the number of coeff_abs_level_greater2_flag bits, the number of coeff_sign_flag bits, and the number of coeff_abs_level_remaining bits of the significant coefficient level.
In FIG. 22, a relationship between the location information and the value information, and the coefficient level Lij is depicted. In FIG. 22, the vertical items relative to the horizontal ones indicate information on each Lij in the 4×4 TU depicted in FIG. 21.
The last_significant_x and the last_significant_y in FIG. 22 indicate a location (i, j)=(3, 0) of a significant coefficient level L30=1 to be first transmitted. The significant_coeff_flag indicates the presence or absence of a significant coefficient level in each of locations from a location subsequent to (3, 0) up to (0, 0). In the case of a significant coefficient level, significant_coeff_flag=1, while in the case of an insignificant coefficient level, significant_coeff_flag=0. In FIG. 22, L30=1 and L01=1 are each represented by coeff_abs_level_greater1_flag=0 and coeff_sign_flag=0 (positive). Since both values are smaller than 2, coeff_abs_level_greater2_flag and coeff_abs_level_remaining are not used.
In FIG. 23, a relationship between the location information and the value information, and the number of value information bits is depicted. In FIG. 23, the vertical items relative to the horizontal ones indicate the number of location information bits and the number of value information bits in the 4×4 TU depicted in FIG. 21.
The number of bits of each piece of information in FIG. 23 is expressed as a bin number. The term bin denotes one bit in an intermediate bit string before being transformed into a bitstream to be output by the entropy encoder 103.
In the case of the 4×4 TU depicted in FIG. 21, after transmitting 13bin as bits of the location information on all significant coefficient levels in the TU, the entropy encoder 103 transmits the number of value information bits of respective significant coefficient levels, i.e., a total of 4bin. The location information is composed of last_significant_x, last_significant_y, and significant_coeff_flag. The last_significant_x and the last_significant_y indicate (i, j)=(3, 0) as the location of a significant coefficient to be first transmitted, which is 4bin. The significant_coeff_flag indicates the presence or absence of a significant coefficient level in each of nine locations from a location (2, 1) subsequent to the location of the significant coefficient to be first transmitted up to (0, 0), which is 9bin.
The value information is composed of coeff_abs_level_greater1_flag, coeff_abs_level_greater2_flag, coeff_sign_flag, and coeff_abs_level_remaining. The coeff_abs_level_greater1_flag indicates whether L30 and L01 are larger than 1 respectively, which is 2bin. The coeff_abs_level_greater2_flag is 0bin because there exists no coefficient level with the absolute value of the significant coefficient level larger than 2. The coeff_sign_flag indicates the positive or negative signs of L30 and L01, which is 2bin. The coeff_abs_level_remaining is 0bin because there exists no coefficient level with the absolute value of the significant coefficient level larger than 2.
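The bin counts for this example can be reproduced with a short sketch. It assumes an up-right diagonal scan of the 4×4 TU, a scan order chosen here because it yields (2, 1) as the location subsequent to (3, 0) and nine significant_coeff_flag locations, as stated above. The 4bin for last_significant_x and last_significant_y is taken directly from the text rather than derived, and all function and variable names are hypothetical.

```python
def up_right_diagonal_scan(n):
    """Positions (i, j) = (horizontal, vertical), ordered from low to high frequency."""
    order = []
    for d in range(2 * n - 1):        # d = i + j identifies one anti-diagonal
        for i in range(d + 1):
            j = d - i
            if i < n and j < n:
                order.append((i, j))
    return order


# Significant coefficient levels of the example: L30 = 1 and L01 = 1
levels = {(3, 0): 1, (0, 1): 1}

scan = up_right_diagonal_scan(4)
# Scan index of the significant coefficient level to be transmitted first
last_idx = max(idx for idx, pos in enumerate(scan) if levels.get(pos, 0) != 0)

# significant_coeff_flag is sent for every location after the first one, down to (0, 0)
sig_flag_positions = scan[:last_idx][::-1]
sig_flag_bins = len(sig_flag_positions)

LAST_POSITION_BINS = 4                # last_significant_x/y, as stated in the text
location_bins = LAST_POSITION_BINS + sig_flag_bins

num_significant = sum(1 for v in levels.values() if v != 0)
greater1_bins = num_significant       # one flag per significant level in this example
sign_bins = num_significant           # one sign per significant level
value_bins = greater1_bins + sign_bins  # greater2 and remaining are 0bin here

print(location_bins, value_bins)
```

The sketch only counts bins for this particular TU; it does not model the binarization of last_significant_x and last_significant_y or the limits on the number of flags per TU.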
In the case of the 4×4 TU, the maximum bin number of each piece of information is as follows. Since the maximum value in the 4×4 TU is 3 (=N−1=4−1) based on section 7.4.9.11 in NPL 1, the last_significant_x is up to 3bin. Similarly, the last_significant_y is also up to 3bin. The significant_coeff_flag is up to 15bin because there are up to 15 per 4×4 TU based on section 7.3.9.11 in NPL 1. The coeff_abs_level_greater1_flag is up to 8bin because there are up to 8 per 4×4 TU based on section 7.3.9.11 in NPL 1. The coeff_abs_level_greater2_flag is up to 1bin because there is up to 1 per 4×4 TU based on section 7.3.9.11 in NPL 1. The coeff_sign_flag is up to 16bin because there are up to 16 per 4×4 TU based on section 7.3.9.11 in NPL 1. The bin number for coeff_abs_level_remaining is calculated based on section 9.2.2.8 in NPL 1.
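As a simple tally, the upper bounds listed above can be summed. The following sketch only adds the per-element maxima stated in the text and leaves out coeff_abs_level_remaining, whose bin number depends on the coefficient values; the dictionary name is hypothetical.

```python
# Upper bounds per 4x4 TU as stated in the text (coeff_abs_level_remaining excluded)
max_bins_4x4 = {
    "last_significant_x": 3,
    "last_significant_y": 3,
    "significant_coeff_flag": 15,
    "coeff_abs_level_greater1_flag": 8,
    "coeff_abs_level_greater2_flag": 1,
    "coeff_sign_flag": 16,
}

max_location_bins = sum(
    max_bins_4x4[name]
    for name in ("last_significant_x", "last_significant_y", "significant_coeff_flag")
)
max_total_without_remaining = sum(max_bins_4x4.values())

print(max_location_bins, max_total_without_remaining)
```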
As is apparent from Equations (9-6) and (9-7) in section 9.2.2.8 of NPL 1, a high-dimensional TU code is applied to the bin string of the prefix part of coeff_abs_level_remaining each time the value of the significant coefficient level last transmitted exceeds a predetermined threshold value, and a high-dimensional Exp-Golomb code is applied to the bin string of the suffix part of coeff_abs_level_remaining. In other words, the bin number for the suffix part of coeff_abs_level_remaining having a small value becomes large, while the bin number for the suffix part of coeff_abs_level_remaining having a large value becomes small.
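The effect of the code order on the bin number can be illustrated with a generic k-th order Exp-Golomb binarization. This is a minimal sketch of the general technique only, not the full coeff_abs_level_remaining binarization of NPL 1, which also involves the prefix part and the threshold-based update; the function name exp_golomb_bins is hypothetical.

```python
def exp_golomb_bins(value, k):
    """Return the bin string (list of 0/1) of a k-th order Exp-Golomb code."""
    bins = []
    # Unary-like part: each 1 removes 2**k from the value and raises the order
    while value >= (1 << k):
        bins.append(1)
        value -= (1 << k)
        k += 1
    bins.append(0)
    # Fixed-length part: the remaining value written with k bits, MSB first
    for shift in range(k - 1, -1, -1):
        bins.append((value >> shift) & 1)
    return bins


for value in (0, 20):
    for order in (0, 2):
        print(value, order, len(exp_golomb_bins(value, order)))
```

In this sketch, raising the order from 0 to 2 lengthens the code for the small value 0 (1 bin to 3 bins) and shortens it for the larger value 20 (9 bins to 7 bins), which matches the tendency described above.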