One video coding system designed to transmit and store video information with high efficiency is the coding system of the ISO/IEC 14496-10 Advanced Video Coding (AVC) standard (hereinafter called the H.264/AVC standard) described in Non-Patent Literature (NPL) 1. In the video coding system described in NPL 1, a frame is divided into blocks of 16×16 pixels called macroblocks (MBs), and the MBs are encoded sequentially starting from the top left of the frame. In the H.264/AVC standard described in NPL 1, an MB is further divided into blocks of 4×4 pixels, and each 4×4 block is encoded.
Intra prediction generates a prediction image from a reconstructed image of the frame to be encoded. An intra prediction signal is a prediction signal generated based on an image of a reconstructed picture (typically stored in a buffer) that has the same display time as the current picture. As described in NPL 1, intra prediction modes of three block sizes, Intra_4×4, Intra_8×8, and Intra_16×16, are available.
Further, in a video coding system (H.265/HEVC system) based on NPL 2, each frame of digitized video is divided into coding tree units (CTUs), and the CTUs are encoded in raster-scan order. Each CTU is split into coding units (CUs) in a quadtree structure. Each CU is split into prediction units (PUs) for prediction. Further, the prediction error of each CU is split into transform units (TUs) in a quadtree structure and transformed. Hereafter, a CU of the largest size is called the largest CU (LCU: Largest Coding Unit), and a CU of the smallest size is called the smallest CU (SCU: Smallest Coding Unit). Note that the LCU size and the CTU size are the same.
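The division of a frame into CTUs and their raster-scan coding order can be illustrated with a short Python sketch. It assumes a frame of width × height luma pixels and a square CTU size; the function names `ctu_grid` and `ctu_raster_order` are hypothetical, and CTUs in the last column and row are simply allowed to extend past the frame boundary when the frame size is not a multiple of the CTU size.

```python
import math


def ctu_grid(width, height, ctu_size):
    """Return (columns, rows) of CTUs needed to cover a width x height frame.

    CTUs in the last column/row may extend past the frame boundary.
    """
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)


def ctu_raster_order(width, height, ctu_size):
    """Yield (index, x, y) for every CTU in raster-scan order (0-based index)."""
    cols, rows = ctu_grid(width, height, ctu_size)
    for r in range(rows):          # top to bottom
        for c in range(cols):      # left to right within a CTU row
            yield r * cols + c, c * ctu_size, r * ctu_size


# CIF frame (352x288) with 64x64 CTUs
print(ctu_grid(352, 288, 64))                     # (6, 5)
print(list(ctu_raster_order(352, 288, 64))[:3])   # [(0, 0, 0), (1, 64, 0), (2, 128, 0)]
```

For a CIF frame with 64×64 CTUs this gives 6 columns and 5 rows (30 CTUs), with the last column and row only partially inside the frame.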
The CU is prediction-encoded by intra prediction or inter-frame prediction. The intra prediction and inter-frame prediction in the H.265/HEVC system will be described below.
In NPL 2, a total of 33 angular intra prediction (directional intra prediction) modes, shown in FIG. 11, are defined. In FIG. 11, the arrows indicate prediction directions and the numerals indicate prediction mode numbers. Angular intra prediction generates an intra prediction signal by extrapolating the reconstructed pixels surrounding a block to be encoded along one of the 33 directions shown in FIG. 11. In addition to the 33 angular intra prediction modes, NPL 2 defines DC intra prediction, which averages the reconstructed pixels neighboring the block to be encoded, and planar intra prediction, which linearly interpolates those neighboring reconstructed pixels. Hereinafter, a CU encoded based on intra prediction is called an intra CU.
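The two non-angular modes can be sketched in a few lines of Python. This is a simplified illustration, assuming an N×N block with N a power of two and all neighboring reconstructed pixels available; the integer rounding follows the usual HEVC-style formulas, while reference-sample substitution, reference smoothing, and the DC edge filter are omitted. The parameter names `top`, `left`, `top_right`, and `bottom_left` are hypothetical.

```python
def dc_predict(top, left):
    """DC intra prediction for an N x N block (N a power of two).

    top:  N reconstructed pixels directly above the block.
    left: N reconstructed pixels directly to the left of the block.
    Every predicted pixel is the rounded mean of these 2N neighbors.
    (HEVC's extra edge filtering for small luma DC blocks is omitted.)
    """
    n = len(top)
    assert len(left) == n and n & (n - 1) == 0, "N must be a power of two"
    shift = n.bit_length()                 # log2(N) + 1, i.e. divide by 2N
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]


def planar_predict(top, left, top_right, bottom_left):
    """Planar intra prediction: average of a horizontal and a vertical
    linear interpolation between the neighboring reconstructed pixels.

    top_right:   pixel above and to the right of the block.
    bottom_left: pixel below and to the left of the block.
    """
    n = len(top)
    shift = n.bit_length()                 # log2(N) + 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horiz + vert + n) >> shift
    return pred


print(dc_predict([100] * 4, [60] * 4)[0])          # [80, 80, 80, 80]
print(planar_predict([0] * 4, [0] * 4, 8, 8)[0])   # [2, 3, 4, 5]
```

The 33 angular modes would replace these formulas with a projection of each predicted pixel onto the reference row or column along the chosen direction.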
Inter-frame prediction is prediction based on an image of a reconstructed frame (reference picture) whose display time differs from that of the frame to be encoded. Hereinafter, inter-frame prediction may also be called inter prediction. FIG. 12 is an explanatory diagram showing an example of inter-frame prediction. A motion vector MV = (mvx, mvy) represents the amount of translation of a reconstructed image block of the reference picture relative to the block to be encoded. Inter prediction generates an inter prediction signal based on the reconstructed image block of the reference picture (using pixel interpolation if necessary). Hereafter, a CU encoded based on inter-frame prediction is called an inter CU.
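A minimal Python sketch of the translation described by MV = (mvx, mvy) follows. It assumes an integer-pel motion vector, so no pixel interpolation is performed, and it clamps positions outside the reference picture to the nearest edge pixel as a simple stand-in for picture padding. The function name `inter_predict` and its parameters are hypothetical.

```python
def inter_predict(ref, x0, y0, width, height, mv):
    """Integer-pel motion-compensated prediction of a width x height block.

    ref:      reference picture as a list of pixel rows (ref[y][x]).
    (x0, y0): top-left corner of the block to be encoded.
    mv:       motion vector (mvx, mvy) in whole pixels.
    Positions falling outside the reference picture are clamped to its
    nearest edge pixel (a simple stand-in for picture padding).
    """
    mvx, mvy = mv
    h, w = len(ref), len(ref[0])
    pred = []
    for y in range(height):
        ry = min(max(y0 + y + mvy, 0), h - 1)
        pred.append([ref[ry][min(max(x0 + x + mvx, 0), w - 1)]
                     for x in range(width)])
    return pred


ref = [[10 * r + c for c in range(8)] for r in range(8)]  # toy 8x8 picture
print(inter_predict(ref, 2, 2, 2, 2, (1, -1)))  # [[13, 14], [23, 24]]
```

Fractional motion vectors would additionally require interpolating between reference pixels, which this sketch leaves out.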
Whether each CU is an intra CU or an inter CU is signaled by the pred_mode_flag syntax described in NPL 2.
A frame encoded with only intra CUs is called an I frame (or an I picture). A frame that includes inter CUs as well as intra CUs is called a P frame (or a P picture). A frame that includes inter CUs whose blocks are predicted using two reference pictures simultaneously, rather than only one, is called a B frame (or a B picture).
Referring to FIG. 13, the configuration and operation of a typical video encoding device for outputting a bitstream using each CU of each frame of digitized video as an input image will be described.
The video encoding device shown in FIG. 13 includes a transformer/quantizer 1021, an inverse quantizer/inverse transformer 1022, a buffer 1023, a predictor 1024, an estimator 1025, and an entropy encoder 1056.
FIG. 14 is an explanatory diagram showing an example of CTU division of a frame t in the case where the frame has a spatial resolution of CIF (Common Intermediate Format) and the CTU size is 64, and an example of CU division of the eighth CTU (CTU 8) included in the frame t. FIG. 15 is an explanatory diagram showing a quadtree structure corresponding to the example of CU division of the CTU 8. The quadtree structure of each CTU, i.e., the CU partitioning shape, is signaled by the split_cu_flag syntax described in NPL 2.
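How a CU quadtree maps onto a sequence of split_cu_flag values can be sketched as a depth-first traversal. The tree representation below (None for an unsplit CU, a list of four subtrees otherwise) and the function name `split_cu_flags` are hypothetical; the sketch assumes that no flag is sent for a CU already at the SCU size, and it does not model flags that are inferred at picture boundaries.

```python
def split_cu_flags(tree, cu_size, min_cu_size):
    """Serialize a CU quadtree into split_cu_flag values (depth-first, z-scan).

    tree: None for a CU that is not split, or a list of four subtrees in the
          order top-left, top-right, bottom-left, bottom-right.
    A CU already at the minimum (SCU) size cannot be split, so no flag is
    emitted for it. Flags inferred at picture boundaries are not modeled.
    """
    if cu_size == min_cu_size:
        assert tree is None, "an SCU cannot be split further"
        return []
    if tree is None:
        return [0]                          # split_cu_flag = 0
    flags = [1]                             # split_cu_flag = 1
    for child in tree:
        flags += split_cu_flags(child, cu_size // 2, min_cu_size)
    return flags


# 64x64 CTU, 8x8 SCU: split once, then split only the top-right 32x32 CU
tree = [None, [None, None, None, None], None, None]
print(split_cu_flags(tree, 64, 8))   # [1, 0, 1, 0, 0, 0, 0, 0, 0]
```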
FIG. 16 is an explanatory diagram showing PU partitioning shapes of a CU. For an intra CU, only square PU division can be selected. For an inter CU, rectangular as well as square PU division can be selected. The PU partitioning shape of each CU is signaled by the part_mode syntax described in NPL 2.
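The PU partitioning shapes named in the candidate sets below can be expressed as rectangles inside a 2N×2N CU. This Python sketch is illustrative only: the mode labels are hypothetical strings (written with "x" instead of "×"), and it does not model which shapes are permitted for intra or inter CUs or at which CU sizes.

```python
def pu_partitions(part_mode, cu_size):
    """Return the PUs of a cu_size x cu_size CU (cu_size = 2N) as (x, y, w, h)."""
    s = cu_size
    n = s // 2        # N
    q = s // 4        # N/2, the narrow part of the asymmetric shapes
    shapes = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN":  [(0, 0, s, n), (0, n, s, n)],
        "Nx2N":  [(0, 0, n, s), (n, 0, n, s)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # short PU on top
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # short PU at bottom
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # narrow PU on the left
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # narrow PU on the right
    }
    return shapes[part_mode]


print(pu_partitions("2NxnU", 32))   # [(0, 0, 32, 8), (0, 8, 32, 24)]
```

Each shape tiles the CU exactly, which is a quick sanity check on the table.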
FIG. 17 is an explanatory diagram showing an example of TU division of a CU. In the upper part, an example of TU division of an intra CU having a 2N×2N PU partitioning shape is shown. In the case of the intra CU, the root of the quadtree is assigned to a PU, and the prediction error of each PU is represented by a quadtree structure. In the lower part, an example of TU division of an inter CU having a 2N×N PU partitioning shape is shown. In the case of the inter CU, the root of the quadtree is assigned to the CU, and the prediction error of the CU is represented by a quadtree structure. The quadtree structure of the above prediction error, i.e., the TU partitioning shape of each CU, is signaled by the split_tu_flag syntax described in NPL 2.
The estimator 1025 determines a split_cu_flag syntax value for determining a CU partitioning shape to minimize the coding cost for each CTU. The estimator 1025 determines a pred_mode_flag syntax value for determining intra prediction/inter prediction, a part_mode syntax value for determining a PU partitioning shape, and a split_tu_flag syntax value for determining a TU partitioning shape to minimize the coding cost for each CU. The estimator 1025 determines an intra prediction direction, a motion vector, and the like to minimize the coding cost for each PU.
NPL 3 discloses a decision method for the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the intra prediction direction, the motion vector, and the like to minimize a coding cost J based on the Lagrange multiplier λ.
Referring to Section 4.8.3 (Intra/Inter/PCM mode decision) of NPL 3, the decision process for the split_cu_flag, pred_mode_flag, and part_mode syntax values is briefly described below.
That section discloses a CU mode decision process that determines the pred_mode_flag and part_mode syntax values for a CU. It also discloses a CU partitioning shape decision process that determines the split_cu_flag syntax value by recursively executing the CU mode decision process.
First, the CU mode decision process will be described. The set of PU partitioning shape candidates for inter prediction is denoted as InterCandidate, the set of PU partitioning shape candidates for intra prediction is denoted as IntraCandidate, and the SSE (Sum of Squared Errors) coding cost JSSE(mode) of an encoding mode (mode) is defined as follows:

InterCandidate = {INTER_2N×2N, INTER_2N×N, INTER_N×2N, INTER_2N×nU, INTER_2N×nD, INTER_nL×2N, INTER_nR×2N, INTER_N×N}
IntraCandidate = {INTRA_2N×2N, INTRA_N×N}

$$J_{SSE}(\mathrm{mode}) = D_{SSE}(\mathrm{mode}) + \lambda_{\mathrm{mode}} \cdot R_{\mathrm{mode}}(\mathrm{mode})$$

$$\lambda_{\mathrm{mode}} = 2^{(QP-12)/3}$$
Here, DSSE(mode) denotes the SSE between the CU input image signal and the reconstructed image signal obtained by encoding the CU with mode; Rmode(mode) denotes the number of bits of the CU generated when it is encoded with mode (including the bits of the transform quantization values described later); and QP denotes the quantization parameter.
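The cost definition can be written directly in Python. This is a sketch in which the input block, the reconstructed block, and the bit count are assumed to come from a trial encode of the CU with the mode; the function names `lambda_mode`, `sse`, and `j_sse` are hypothetical.

```python
def lambda_mode(qp):
    """Lagrange multiplier lambda_mode = 2 ** ((QP - 12) / 3)."""
    return 2.0 ** ((qp - 12) / 3)


def sse(orig, recon):
    """Sum of squared errors between two equally sized 2-D pixel blocks."""
    return sum((a - b) ** 2
               for row_o, row_r in zip(orig, recon)
               for a, b in zip(row_o, row_r))


def j_sse(orig, recon, bits, qp):
    """J_SSE(mode) = D_SSE(mode) + lambda_mode * R_mode(mode).

    orig:  CU input image block.
    recon: reconstructed block obtained by encoding the CU with the mode.
    bits:  number of bits generated for the CU with the mode.
    """
    return sse(orig, recon) + lambda_mode(qp) * bits


print(lambda_mode(24))                                    # 16.0
print(j_sse([[1, 2], [3, 4]], [[1, 2], [3, 6]], 10, 15))  # 24.0
```

Because λ_mode doubles every 3 QP steps, a higher QP weights the bit count more heavily relative to distortion.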
In the CU mode decision process, bestPUmode, the combination of pred_mode_flag and part_mode syntax values that minimizes the SSE coding cost JSSE(mode), is selected from InterCandidate and IntraCandidate. Formally, the CU mode decision process can be defined as follows:
$$\mathrm{bestPUmode} = \arg\min_{\mathrm{PUmode} \in \mathrm{PUCandidate}} \{ J_{SSE}(\mathrm{PUmode}) \}$$

$$\mathrm{PUCandidate} = \mathrm{InterCandidate} \cup \mathrm{IntraCandidate}$$
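The arg-min selection of bestPUmode can be sketched as follows. The candidate lists mirror InterCandidate and IntraCandidate with hypothetical "x"-spelled labels, and `cost_of` is a hypothetical callback standing in for a trial encode that returns JSSE(mode) for a given mode.

```python
INTER_CANDIDATE = ["INTER_2Nx2N", "INTER_2NxN", "INTER_Nx2N", "INTER_2NxnU",
                   "INTER_2NxnD", "INTER_nLx2N", "INTER_nRx2N", "INTER_NxN"]
INTRA_CANDIDATE = ["INTRA_2Nx2N", "INTRA_NxN"]


def best_pu_mode(cost_of):
    """Return (bestPUmode, J_SSE) over PUCandidate = InterCandidate + IntraCandidate.

    cost_of is a hypothetical callback that trial-encodes the CU with the
    given mode and returns J_SSE(mode). Ties go to the earlier candidate.
    """
    pu_candidate = INTER_CANDIDATE + INTRA_CANDIDATE
    costs = {mode: cost_of(mode) for mode in pu_candidate}  # evaluate each mode once
    best = min(pu_candidate, key=costs.__getitem__)
    return best, costs[best]


toy_costs = dict.fromkeys(INTER_CANDIDATE + INTRA_CANDIDATE, 100.0)
toy_costs["INTRA_2Nx2N"] = 42.0
print(best_pu_mode(toy_costs.get))   # ('INTRA_2Nx2N', 42.0)
```

The returned cost is exactly JSSE(node, CUDepth) used in the split decision that follows.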
Next, the decision process for a CU partitioning shape will be described.
As shown in FIG. 15, the SSE coding cost of a CU (hereinafter called a node) having a certain CUDepth is the SSE coding cost of the bestPUmode of the node. In other words, the SSE coding cost JSSE(node, CUDepth) of a node can be defined as follows:
$$J_{SSE}(\mathrm{node}, \mathrm{CUDepth}) = \min_{\mathrm{PUmode} \in \mathrm{PUCandidate}} \{ J_{SSE}(\mathrm{PUmode}) \}$$
The SSE coding cost of the i-th child CU (hereinafter called a child node or a leaf) of the CU having CUDepth, where 1 ≤ i ≤ 4, is the SSE coding cost of the i-th CU having CUDepth+1. In other words, the SSE coding cost JSSE(leaf(i), CUDepth) of the i-th leaf can be defined as follows:

$$J_{SSE}(\mathrm{leaf}(i), \mathrm{CUDepth}) = J_{SSE}(\mathrm{node}, \mathrm{CUDepth}+1)$$
Whether to divide the CU into four child CUs is determined by comparing the SSE coding cost of the node with the sum of the SSE coding costs of its leaves. When JSSE(node, CUDepth) is larger than the value of the following expression (1), the CU is divided into four child CUs (split_cu_flag=1 is determined). When JSSE(node, CUDepth) is smaller than or equal to the value of expression (1), the CU is not divided into four child CUs (split_cu_flag=0 is determined).
$$\sum_{i=1}^{4} J_{SSE}(\mathrm{leaf}(i), \mathrm{CUDepth}) \qquad (1)$$
A decision process for a CU quadtree structure recursively makes the above-mentioned comparison for each CUDepth to determine the quadtree structure of a CTU. In other words, split_cu_flag of a leaf is determined for each CUDepth.
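The recursive comparison against expression (1) can be sketched in Python. This is a simplified model, not the NPL 3 implementation: `node_cost` is a hypothetical callback returning JSSE(node, CUDepth) for the CU at a given position and size (for example, the cost of its bestPUmode), and the cost of signaling split_cu_flag itself is not modeled separately.

```python
def decide_cu_tree(x, y, size, min_size, node_cost):
    """Recursive CU split decision based on expression (1).

    node_cost(x, y, size) is a hypothetical callback returning
    J_SSE(node, CUDepth) for the CU at (x, y) with the given size.
    Returns (cost, tree): tree is None for an unsplit CU, or a list of four
    child trees in z-scan order (top-left, top-right, bottom-left, bottom-right).
    """
    j_node = node_cost(x, y, size)
    if size == min_size:
        return j_node, None                    # SCU: cannot be split
    half = size // 2
    children = [decide_cu_tree(x + dx, y + dy, half, min_size, node_cost)
                for dy in (0, half) for dx in (0, half)]
    j_leaves = sum(cost for cost, _ in children)   # expression (1)
    if j_node > j_leaves:
        return j_leaves, [t for _, t in children]  # split_cu_flag = 1
    return j_node, None                            # split_cu_flag = 0


# Toy cost model where J_SSE grows faster than CU area, so splitting pays off.
print(decide_cu_tree(0, 0, 64, 32, lambda x, y, s: s ** 3))
# (131072, [None, None, None, None])
```

Because the children are decided before their parent, the procedure evaluates every CUDepth bottom-up, matching the recursive per-depth comparison described above.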
The estimator 1025 minimizes the coding cost J based on the Lagrange multiplier λ to determine split_tu_flag, the intra prediction direction, the motion vector, and the like in the same manner.
Based on the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the intra prediction direction, the motion vector, and the like determined by the estimator 1025, the predictor 1024 generates a prediction signal for an input image signal of each CU. The prediction signal is generated based on intra prediction or inter-frame prediction mentioned above.
The transformer/quantizer 1021 frequency-transforms a prediction error image obtained by subtracting the prediction signal from the input image signal, based on the TU partitioning shape determined by the estimator 1025.
The transformer/quantizer 1021 further quantizes the frequency-transformed prediction error image (frequency transform coefficient). The quantized frequency transform coefficient is hereafter referred to as “transform quantization value”.
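The forward path of the transformer/quantizer can be sketched in plain Python. This sketch makes several stated assumptions: it uses a floating-point orthonormal 2-D DCT-II in place of the integer transforms of NPL 2, a quantization step size Qstep = 2^((QP−4)/6) (the H.264/HEVC-style relationship in which the step doubles every 6 QP), and plain rounding without the dead-zone offsets real encoders apply. The function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II basis matrix C (row k = k-th basis function)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def transform_quantize(block, pred, qp):
    """Prediction error -> 2-D DCT (Y = C X C^T) -> transform quantization values."""
    n = len(block)
    resid = [[b - p for b, p in zip(rb, rp)] for rb, rp in zip(block, pred)]
    c = dct_matrix(n)
    coeff = matmul(matmul(c, resid), transpose(c))
    qstep = 2.0 ** ((qp - 4) / 6)   # assumed step size: doubles every 6 QP
    # round to nearest (Python's round sends exact ties to the even integer)
    return [[round(v / qstep) for v in row] for row in coeff]


block = [[110] * 4 for _ in range(4)]   # flat 4x4 input block
pred = [[100] * 4 for _ in range(4)]    # flat prediction, residual = 10
print(transform_quantize(block, pred, 4)[0])   # [40, 0, 0, 0]
```

A flat residual concentrates all its energy in the DC coefficient, so only one nonzero transform quantization value remains.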
The entropy encoder 1056 entropy-encodes difference information on the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, and the intra prediction direction determined by the estimator 1025, difference information on the motion vector, and the transform quantization value.
The inverse quantizer/inverse transformer 1022 inverse-quantizes the transform quantization value. The inverse quantizer/inverse transformer 1022 further inverse-frequency-transforms the frequency transform coefficient obtained by the inverse quantization. The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the buffer 1023. The buffer 1023 stores the reconstructed image.
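The local decoding loop can be sketched in the same simplified setting. The sketch assumes the same floating-point orthonormal DCT and the assumed step size Qstep = 2^((QP−4)/6), an 8-bit pixel range by default, and hypothetical function names; storing the result in the buffer 1023 is left to the caller.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II basis matrix C (row k = k-th basis function)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def reconstruct(levels, pred, qp, bit_depth=8):
    """Transform quantization values -> reconstructed block.

    Inverse quantization (level * Qstep), inverse 2-D DCT (X = C^T Y C),
    then the prediction signal is added and the result is clipped to the
    valid pixel range.
    """
    qstep = 2.0 ** ((qp - 4) / 6)                   # assumed step size
    coeff = [[lv * qstep for lv in row] for row in levels]
    c = dct_matrix(len(levels))
    resid = matmul(matmul(transpose(c), coeff), c)  # reconstructed prediction error
    max_val = (1 << bit_depth) - 1
    return [[min(max(round(p + r), 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, resid)]


levels = [[40, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
pred = [[100] * 4 for _ in range(4)]
print(reconstruct(levels, pred, 4)[0])   # [110, 110, 110, 110]
```

Because the encoder reconstructs exactly what a decoder would, later predictions reference the same pixels on both sides.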
The typical video encoding device generates a bitstream based on the operation described above.
In the video encoding device depicted in FIG. 13, the entire processing load of the video encoding process for determining the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the intra prediction direction, the motion vector, etc., is concentrated in a single specific estimator, namely the estimator 1025.