In the video coding scheme based on Non Patent Literature (NPL) 1, each frame of digitized video is split into coding tree units (CTUs), and each CTU is encoded in raster scan order. Each CTU is split into coding units (CUs) and encoded, in a quadtree structure. Each CU is split into prediction units (PUs) and predicted. The prediction error of each CU is split into transform units (TUs) and frequency-transformed, in a quadtree structure. Hereafter, a CU of the largest size is referred to as “largest CU” (largest coding unit: LCU), and a CU of the smallest size is referred to as “smallest CU” (smallest coding unit: SCU). The LCU size and the CTU size are the same.
Each CU is prediction-encoded by intra prediction or inter-frame prediction. The following describes intra prediction and inter-frame prediction.
Intra prediction is prediction for generating a prediction image from a reconstructed image of a frame to be encoded. NPL 1 defines 33 types of angular intra prediction depicted in FIG. 15. In angular intra prediction, a reconstructed pixel near a block to be encoded is used for extrapolation in any of the 33 directions depicted in FIG. 15, to generate an intra prediction signal. In addition to the 33 types of angular intra prediction, NPL 1 defines DC intra prediction for averaging reconstructed pixels near the block to be encoded, and planar intra prediction for linearly interpolating reconstructed pixels near the block to be encoded. A CU encoded based on intra prediction is hereafter referred to as “intra CU”.
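As a minimal sketch of the averaging performed in DC intra prediction, the following Python fragment fills a square block with the rounded mean of the reconstructed pixels directly above and directly to the left of the block. The function name dc_intra_prediction and the sample values are hypothetical, and the sketch assumes that all neighboring pixels are available; any boundary smoothing or substitution of unavailable neighbors defined in NPL 1 is omitted.

```python
def dc_intra_prediction(top, left):
    """Return an N x N block filled with the mean of the neighboring pixels.

    top  : N reconstructed pixels in the row directly above the block
    left : N reconstructed pixels in the column directly left of the block
    (Assumption: all 2N neighbors are available.)
    """
    n = len(top)
    assert len(left) == n, "this sketch assumes a square block"
    total = sum(top) + sum(left)
    # Rounded integer mean over the 2N neighboring pixels.
    dc = (total + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


# Hypothetical neighbor values for a 4 x 4 block.
block = dc_intra_prediction(top=[100, 102, 104, 106], left=[98, 99, 101, 102])
```

Every pixel of the resulting prediction block has the same value, which is why DC prediction suits flat image regions.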
Inter-frame prediction is prediction based on an image of a reconstructed frame (reference picture) different in display time from a frame to be encoded. Inter-frame prediction is hereafter also referred to as “inter prediction”. FIG. 16 is an explanatory diagram depicting an example of inter-frame prediction. A motion vector MV=(mvx, mvy) indicates the amount of translation of a reconstructed image block of a reference picture relative to a block to be encoded. In inter prediction, an inter prediction signal is generated based on a reconstructed image block of a reference picture (using pixel interpolation if necessary). A CU encoded based on inter-frame prediction is hereafter referred to as “inter CU”.
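The role of the motion vector can be illustrated by the following minimal Python sketch, which assumes a motion vector in whole-pixel units so that no pixel interpolation is needed. The function name inter_predict and the test data are hypothetical. Clamping the coordinates at the picture border is an assumption of this sketch and is not taken from NPL 1.

```python
def inter_predict(reference, x0, y0, width, height, mv):
    """Copy the block that the motion vector points to in the reference picture.

    reference : 2-D list of reconstructed pixels (rows of equal length)
    (x0, y0)  : top-left corner of the block to be encoded
    mv        : (mvx, mvy) in whole pixels; fractional vectors would
                additionally require pixel interpolation
    """
    mvx, mvy = mv
    ref_h, ref_w = len(reference), len(reference[0])
    pred = []
    for y in range(height):
        # Assumption: coordinates outside the picture are clamped to the edge.
        ry = min(max(y0 + y + mvy, 0), ref_h - 1)
        row = []
        for x in range(width):
            rx = min(max(x0 + x + mvx, 0), ref_w - 1)
            row.append(reference[ry][rx])
        pred.append(row)
    return pred


# Hypothetical 8 x 8 reference picture whose pixel value encodes its position.
reference = [[10 * r + c for c in range(8)] for r in range(8)]
pred = inter_predict(reference, x0=2, y0=2, width=2, height=2, mv=(1, -1))
```

In this example the 2×2 block at (2, 2) is predicted from the reference block one pixel to the right and one pixel up.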
Whether a CU is an intra CU or an inter CU is signaled by pred_mode_flag syntax described in NPL 1.
A frame encoded using only intra CUs is called an “I frame” (or “I picture”). A frame encoded using inter CUs as well as intra CUs is called a “P frame” (or “P picture”). A frame encoded including inter CUs that each use two reference pictures simultaneously, rather than only one, for the inter prediction of the block is called a “B frame” (or “B picture”).
The following describes the structure and operation of a typical video encoding device that receives each CU of each frame of digitized video as an input image and outputs a bitstream, with reference to FIG. 17.
A video encoding device depicted in FIG. 17 includes a transformer/quantizer 1021, an entropy encoder 1056, an inverse quantizer/inverse transformer 1022, a buffer 1023, a predictor 1024, and an estimator 1025.
FIG. 18 is an explanatory diagram depicting an example of CTU partitioning of a frame t and an example of CU partitioning of the eighth CTU (CTU8) included in the frame t, in the case where the spatial resolution of the frame is the common intermediate format (CIF) and the CTU size is 64. FIG. 19 is an explanatory diagram depicting a quadtree structure corresponding to the example of CU partitioning of CTU8. The quadtree structure, i.e. the CU partitioning shape, of each CTU is signaled by split_cu_flag syntax described in NPL 1.
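The number of CTUs in such a frame can be checked with the following short Python sketch. It assumes that CIF denotes a luma resolution of 352×288 pixels and that CTUs at the right and bottom frame edges may be only partly covered by the picture. The function names and the 0-based raster-scan index are conventions of this sketch and are not necessarily the numbering used in FIG. 18.

```python
import math


def ctu_grid(frame_width, frame_height, ctu_size):
    """Return (columns, rows, total) of CTUs needed to cover the frame."""
    cols = math.ceil(frame_width / ctu_size)
    rows = math.ceil(frame_height / ctu_size)
    return cols, rows, cols * rows


def ctu_origin(index, frame_width, ctu_size):
    """Top-left luma sample of the CTU at a 0-based raster-scan index."""
    cols = math.ceil(frame_width / ctu_size)
    return (index % cols) * ctu_size, (index // cols) * ctu_size


# CIF is assumed to be 352 x 288 luma samples.
cols, rows, total = ctu_grid(352, 288, 64)
```

Under these assumptions the frame is covered by 6 columns and 5 rows of CTUs, i.e. 30 CTUs in total, and the CTUs are visited row by row in raster scan order.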
FIG. 20 is an explanatory diagram depicting PU partitioning shapes of a CU. In the case where the CU is an intra CU, square PU partitioning is selectable. In the case where the CU is an inter CU, not only square but also rectangular PU partitioning is selectable. The PU partitioning shape of each CU is signaled by part_mode syntax described in NPL 1.
FIG. 21 is an explanatory diagram depicting examples of TU partitioning of a CU. An example of TU partitioning of an intra CU having a 2N×2N PU partitioning shape is depicted in the upper part of the drawing. In the case where the CU is an intra CU, the root of the quadtree is located in the PU, and the prediction error of each PU is expressed by the quadtree structure. An example of TU partitioning of an inter CU having a 2N×N PU partitioning shape is depicted in the lower part of the drawing. In the case where the CU is an inter CU, the root of the quadtree is located in the CU, and the prediction error of the CU is expressed by the quadtree structure. The quadtree structure of the prediction error, i.e. the TU partitioning shape of each CU, is signaled by split_tu_flag syntax described in NPL 1.
The estimator 1025 determines, for each CTU, a split_cu_flag syntax value for determining a CU partitioning shape that minimizes the coding cost. The estimator 1025 determines, for each CU, a pred_mode_flag syntax value for determining intra prediction/inter prediction, a part_mode syntax value for determining a PU partitioning shape, and a split_tu_flag syntax value for determining a TU partitioning shape that minimize the coding cost. The estimator 1025 determines, for each PU, an intra prediction direction, a motion vector, etc. that minimize the coding cost.
NPL 2 discloses a method of determining the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the intra prediction direction, the motion vector, etc. that minimize coding cost J based on a Lagrange multiplier λ.
The following briefly describes a decision process for the split_cu_flag syntax value, the pred_mode_flag syntax value, and the part_mode syntax value, with reference to section 4.8.3 “Intra/Inter/PCM mode decision” in NPL 2.
The section discloses a CU mode decision process of determining the pred_mode_flag syntax value and the part_mode syntax value of a CU. The section also discloses a CU partitioning shape decision process of determining the split_cu_flag syntax value by recursively executing the CU mode decision process.
The CU mode decision process is described first. InterCandidate, which is a set of PU partitioning shape candidates of inter prediction, IntraCandidate, which is a set of PU partitioning shape candidates of intra prediction, and JSSE(mode), which is a sum of square error (SSE) coding cost for a coding mode (mode), are defined as follows.
    InterCandidate={INTER_2N×2N, INTER_2N×N, INTER_N×2N, INTER_2N×nU, INTER_2N×nD, INTER_nL×2N, INTER_nR×2N, INTER_N×N}
    IntraCandidate={INTRA_2N×2N, INTRA_N×N}
    JSSE(mode)=DSSE(mode)+λmode·Rmode(mode)
[Math. 1]
\[ \lambda_{mode} = 2^{\frac{QP - 12}{3}} \]
Here, DSSE(mode) denotes the SSE of the input image signal of the CU and the reconstructed image signal obtained in the encoding using mode, Rmode(mode) denotes the number of bits of the CU generated in the encoding using mode (including the number of bits of the below-mentioned transform quantization value), and QP denotes a quantization parameter.
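The SSE coding cost defined above can be evaluated as in the following minimal Python sketch. The function names and the sample values are hypothetical. In an actual encoder the reconstructed signal and the number of bits would be obtained by actually encoding the CU in the given mode, which is not modeled here.

```python
def lambda_mode(qp):
    """Lagrange multiplier: 2 raised to the power (QP - 12) / 3."""
    return 2.0 ** ((qp - 12) / 3.0)


def sse(original, reconstructed):
    """Sum of square error between two equally long pixel sequences."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def j_sse(original, reconstructed, bits, qp):
    """SSE coding cost: distortion plus lambda-weighted number of bits."""
    return sse(original, reconstructed) + lambda_mode(qp) * bits


# Hypothetical values: four pixels, 5 bits spent on the CU, QP = 24.
cost = j_sse([10, 20, 30, 40], [11, 19, 30, 42], bits=5, qp=24)
```

With QP = 24 the multiplier equals 16, so each additional bit is weighted the same as 16 units of squared error. A larger QP therefore shifts the decision toward modes that generate fewer bits.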
In the CU mode decision process, bestPUmode, which is the combination of the pred_mode_flag syntax value and the part_mode syntax value that minimizes the SSE coding cost JSSE(mode), is selected from InterCandidate and IntraCandidate. The CU mode decision process can be formulated as follows.
[Math. 2]
\[ \mathrm{bestPUmode} = \arg\min_{\mathrm{PUmode} \in \mathrm{PUCandidate}} \left\{ J_{SSE}(\mathrm{PUmode}) \right\} \]
    PUCandidate={InterCandidate, IntraCandidate}
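The selection of bestPUmode amounts to an exhaustive search over the candidate modes, as in the following minimal Python sketch. The function name best_pu_mode is hypothetical, and the cost values are invented purely for illustration; in an actual encoder each cost would be obtained by trial encoding of the CU in the corresponding mode.

```python
# Candidate sets as listed in the text.
INTER_CANDIDATE = ["INTER_2N×2N", "INTER_2N×N", "INTER_N×2N", "INTER_2N×nU",
                   "INTER_2N×nD", "INTER_nL×2N", "INTER_nR×2N", "INTER_N×N"]
INTRA_CANDIDATE = ["INTRA_2N×2N", "INTRA_N×N"]
PU_CANDIDATE = INTER_CANDIDATE + INTRA_CANDIDATE


def best_pu_mode(j_sse, candidates=PU_CANDIDATE):
    """Return (mode, cost) with the smallest SSE coding cost.

    j_sse is a callable mapping a mode name to its SSE coding cost.
    """
    best = min(candidates, key=j_sse)
    return best, j_sse(best)


# Hypothetical costs for one CU, invented purely for illustration.
example_costs = {
    "INTER_2N×2N": 410.0, "INTER_2N×N": 395.5, "INTER_N×2N": 402.0,
    "INTER_2N×nU": 420.0, "INTER_2N×nD": 431.0, "INTER_nL×2N": 415.0,
    "INTER_nR×2N": 418.0, "INTER_N×N": 450.0,
    "INTRA_2N×2N": 520.0, "INTRA_N×N": 560.0,
}
mode, cost = best_pu_mode(example_costs.get)
```

The selected mode determines both syntax values at once: the INTER or INTRA prefix corresponds to pred_mode_flag, and the partitioning shape corresponds to part_mode.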
The CU partitioning shape decision process is described next.
The SSE coding cost of a CU (hereafter referred to as “node”) at CUDepth is the SSE coding cost of bestPUmode of the node, as depicted in FIG. 19. The SSE coding cost JSSE(node, CUDepth) of the node can thus be defined as follows.
[Math. 3]
\[ J_{SSE}(\mathrm{node}, \mathrm{CUDepth}) = \min_{\mathrm{PUmode} \in \mathrm{PUCandidate}} \left\{ J_{SSE}(\mathrm{PUmode}) \right\} \]
The SSE coding cost of the i-th (1≤i≤4) child CU (hereafter referred to as “child node”, “leaf”, or the like) of the CU at CUDepth is the SSE coding cost of the i-th CU at CUDepth+1. The SSE coding cost JSSE(leaf(i), CUDepth) of the i-th leaf can thus be defined as follows.
    JSSE(leaf(i), CUDepth)=JSSE(node, CUDepth+1)
Whether or not to split the CU into four child CUs can be determined by checking whether the SSE coding cost of the node is greater than the sum of the SSE coding costs of its leaves. In the case where JSSE(node, CUDepth) is greater than the value of Expression (1) given below, the CU is split into four child CUs (split_cu_flag=1). In the case where JSSE(node, CUDepth) is not greater than the value of Expression (1), the CU is not split into four child CUs (split_cu_flag=0).
[Math. 4]
\[ \sum_{i=1}^{4} J_{SSE}(\mathrm{leaf}(i), \mathrm{CUDepth}) \qquad (1) \]
In the CU quadtree structure decision process, the above-mentioned comparison is recursively executed for each CUDepth, to determine the quadtree structure of the CTU. In other words, split_cu_flag of each leaf is determined for each CUDepth.
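The recursive comparison can be sketched in Python as follows. The function names, the tree representation, and the toy cost function are hypothetical. The sketch also assumes that the cost of a leaf is the cost remaining after that leaf's own split decision, which is one possible reading of the recursion, and it ignores the bits needed to signal split_cu_flag itself.

```python
def decide_split(x, y, size, cu_depth, scu_size, node_cost_fn):
    """Return (cost, tree) for the CU at (x, y) with the given size.

    node_cost_fn(x, y, size, cu_depth) is assumed to return the SSE coding
    cost of the best PU mode of that CU.
    """
    node_cost = node_cost_fn(x, y, size, cu_depth)
    if size <= scu_size:
        # The smallest CU cannot be split any further.
        return node_cost, {"split_cu_flag": 0, "size": size}

    half = size // 2
    # Visit the four child CUs: top-left, top-right, bottom-left, bottom-right.
    results = [decide_split(x + dx, y + dy, half, cu_depth + 1,
                            scu_size, node_cost_fn)
               for dy in (0, half) for dx in (0, half)]
    leaves_cost = sum(cost for cost, _ in results)

    if node_cost > leaves_cost:
        children = [tree for _, tree in results]
        return leaves_cost, {"split_cu_flag": 1, "size": size,
                             "children": children}
    return node_cost, {"split_cu_flag": 0, "size": size}


def toy_cost(x, y, size, cu_depth):
    """Hypothetical cost: large CUs covering pixel (3, 3) are expensive."""
    has_detail = x <= 3 < x + size and y <= 3 < y + size
    penalty = 4.0 if has_detail and size > 8 else 1.0
    return size * size * penalty


# Hypothetical 32 x 32 CTU with an 8 x 8 smallest CU.
cost, tree = decide_split(0, 0, 32, 0, 8, toy_cost)
```

With this toy cost, only the CUs that cover the detailed region are split down to the smallest CU size, while the remaining CUs of the CTU stay unsplit.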
The estimator 1025 similarly determines split_tu_flag, the intra prediction direction, the motion vector, etc., by minimizing the coding cost J based on the Lagrange multiplier λ.
The predictor 1024 generates a prediction signal corresponding to the input image signal of each CU, based on the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the intra prediction direction, the motion vector, etc. determined by the estimator 1025. The prediction signal is generated based on the above-mentioned intra prediction or inter-frame prediction.
The transformer/quantizer 1021 frequency-transforms a prediction error image obtained by subtracting the prediction signal from the input image signal, based on the TU partitioning shape determined by the estimator 1025.
The transformer/quantizer 1021 further quantizes the frequency-transformed prediction error image (frequency transform coefficient). The quantized frequency transform coefficient is hereafter referred to as “transform quantization value”.
The entropy encoder 1056 entropy-encodes the transform quantization value, together with the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the difference information of the intra prediction direction, and the difference information of the motion vector determined by the estimator 1025.
The inverse quantizer/inverse transformer 1022 inverse-quantizes the transform quantization value. The inverse quantizer/inverse transformer 1022 further inverse-frequency-transforms the frequency transform coefficient obtained by the inverse quantization. The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the buffer 1023. The buffer 1023 stores the reconstructed image.
The typical video encoding device generates a bitstream based on the operation described above.
In the video encoding device depicted in FIG. 17, the entire processing load of determining the split_cu_flag syntax value, the pred_mode_flag syntax value, the part_mode syntax value, the split_tu_flag syntax value, the intra prediction direction, the motion vector, etc. is concentrated in the estimator 1025.