In a video coding system described in Non Patent Literature 1, each frame of digitized video is split into coding tree units (CTUs), and each CTU is encoded in raster scan order.
Each CTU is split into coding units (CUs) in a quadtree structure, and each CU is encoded. Each CU is split into prediction units (PUs) and prediction-encoded. Prediction encoding includes intra prediction and inter-frame prediction.
The prediction error of each CU is split into transform units (TUs) in a quadtree structure and transform-encoded based on a frequency transform.
A CU of the largest size is referred to as a largest CU (largest coding unit: LCU), and a CU of the smallest size is referred to as a smallest CU (smallest coding unit: SCU). The LCU size and the CTU size are the same.
The following describes intra prediction and inter-frame prediction, and signaling of CTU, CU, PU, and TU.
Intra prediction generates a prediction image from a reconstructed image having the same display time as the frame to be encoded. Non Patent Literature 1 defines the 33 types of angular intra prediction depicted in FIG. 9. In angular intra prediction, reconstructed pixels near the block to be encoded are extrapolated in one of the 33 directions to generate an intra prediction signal. In addition to the 33 types of angular intra prediction, Non Patent Literature 1 defines DC intra prediction, which averages the reconstructed pixels near the block to be encoded, and planar intra prediction, which linearly interpolates the reconstructed pixels near the block to be encoded. A CU encoded based on intra prediction is hereafter referred to as an “intra CU”.
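As an illustration of the two non-angular modes just described, the following Python sketch computes DC and planar predictions for an n×n block from its neighbouring reconstructed pixels. It assumes power-of-two square blocks and HEVC-style integer rounding, and it omits the reference-pixel smoothing and boundary filtering of an actual codec; the function and parameter names are hypothetical.

```python
from typing import List


def dc_predict(top: List[int], left: List[int]) -> List[List[int]]:
    """DC intra prediction (hypothetical helper): fill an n x n block with the
    rounded mean of the n pixels above and the n pixels to the left."""
    n = len(top)
    assert len(left) == n and n > 0 and (n & (n - 1)) == 0  # power-of-two size
    log2n = n.bit_length() - 1
    dc = (sum(top) + sum(left) + n) >> (log2n + 1)  # rounded average of 2n pixels
    return [[dc] * n for _ in range(n)]


def planar_predict(top: List[int], left: List[int],
                   top_right: int, bottom_left: int) -> List[List[int]]:
    """Planar intra prediction (hypothetical helper): rounded average of a
    horizontal and a vertical linear interpolation of the neighbours."""
    n = len(top)
    assert len(left) == n and n > 0 and (n & (n - 1)) == 0
    log2n = n.bit_length() - 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right     # left -> top-right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left     # top -> bottom-left
            pred[y][x] = (horiz + vert + n) >> (log2n + 1)
    return pred
```

For a flat neighbourhood both modes reproduce the neighbouring value, while a bright top-right pixel makes the planar prediction ramp up from left to right.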
Inter-frame prediction generates a prediction image from a reconstructed image (reference picture) whose display time differs from that of the frame to be encoded. Inter-frame prediction is hereafter also referred to as inter prediction. FIG. 10 is an explanatory diagram depicting an example of inter-frame prediction. A motion vector MV=(mvx, mvy) indicates the amount of translation of a reconstructed image block of a reference picture relative to the block to be encoded. In inter prediction, an inter prediction signal is generated from the reconstructed image block of the reference picture indicated by the motion vector (using pixel interpolation if necessary). A CU encoded based on inter-frame prediction is hereafter referred to as an “inter CU”.
A frame encoded using only intra CUs is called an “I frame” (or “I picture”). A frame encoded using inter CUs in addition to intra CUs is called a “P frame” (or “P picture”). A frame encoded using inter CUs that can each use two reference pictures simultaneously, rather than only one, for the inter prediction of the block is called a “B frame” (or “B picture”).
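The generation of an inter prediction signal for the integer-pixel case can be sketched as follows. The sketch assumes an integer motion vector (the sub-pixel interpolation mentioned above is omitted) and clamps reference coordinates to the picture edge as a simple form of padding; motion_compensate and its parameters are hypothetical names.

```python
from typing import List, Tuple


def motion_compensate(ref: List[List[int]], x0: int, y0: int, w: int, h: int,
                      mv: Tuple[int, int]) -> List[List[int]]:
    """Return the w x h prediction for the block at (x0, y0) by copying the
    reference-picture block displaced by the integer motion vector mv.
    Out-of-picture coordinates are clamped to the nearest edge pixel."""
    mvx, mvy = mv
    ref_h, ref_w = len(ref), len(ref[0])
    pred = []
    for y in range(h):
        ry = min(max(y0 + y + mvy, 0), ref_h - 1)      # clamp row to the picture
        row = []
        for x in range(w):
            rx = min(max(x0 + x + mvx, 0), ref_w - 1)  # clamp column to the picture
            row.append(ref[ry][rx])
        pred.append(row)
    return pred
```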
Skip mode indicates that the CU to be processed is prediction-encoded by inter-frame prediction with the 2N×2N PU partitioning shape described below, and that no transform quantization value (described below) is present. Whether or not each CU is in skip mode is signaled by the skip_flag syntax described in Non Patent Literature 1.
Whether each CU that is not in skip mode is an intra CU or an inter CU is signaled by the pred_mode_flag syntax described in Non Patent Literature 1.
FIG. 11 is an explanatory diagram depicting an example of CTU partitioning of a frame t and an example of CU partitioning of the eighth CTU (CTU8) included in the frame t, in the case where the spatial resolution of the frame is the common intermediate format (CIF) and the CTU size is 64.
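The CTU layout of FIG. 11 follows from simple arithmetic: a CIF frame is 352×288 pixels, so 64×64 CTUs form a grid of 6 columns and 5 rows, and the CTUs of the right column and bottom row cover only 32 pixels of the picture in the corresponding dimension. The hypothetical helper below lists the CTUs in raster scan order; it does not model how Non Patent Literature 1 handles the partial CTUs at the picture boundary.

```python
from typing import List, Tuple


def ctu_grid(width: int, height: int,
             ctu_size: int) -> Tuple[int, int, List[Tuple[int, int, int, int]]]:
    """Return (columns, rows, CTUs in raster scan order) for a picture.
    Each CTU is (x, y, visible_width, visible_height) in pixels."""
    cols = -(-width // ctu_size)    # ceiling division
    rows = -(-height // ctu_size)
    order = []
    for r in range(rows):           # top to bottom
        for c in range(cols):       # left to right within a row
            x, y = c * ctu_size, r * ctu_size
            order.append((x, y, min(ctu_size, width - x), min(ctu_size, height - y)))
    return cols, rows, order
```

For example, ctu_grid(352, 288, 64) yields 6 columns, 5 rows, and 30 CTUs.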
FIG. 12 is an explanatory diagram depicting a quadtree structure corresponding to the example of CU partitioning of CTU8. The quadtree structure, i.e. the CU partitioning shape, of each CTU is signaled by cu_split_flag syntax described in Non Patent Literature 1.
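The following sketch shows how a decoder could turn a sequence of cu_split_flag values into the CU partitioning of one CTU, visiting the four sub-blocks of each split in z-scan order (top-left, top-right, bottom-left, bottom-right). It assumes that no flag is coded for a block already at the SCU size and ignores picture-boundary cases; parse_cu_quadtree is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def parse_cu_quadtree(flags: Iterator[int], x: int, y: int, size: int,
                      min_cu: int, out: List[Tuple[int, int, int]]) -> None:
    """Consume cu_split_flag values depth-first and append each resulting CU
    as (x, y, size) to out. A block of SCU size is always a leaf."""
    split = next(flags) if size > min_cu else 0
    if split:
        half = size // 2
        for dy in (0, half):          # z-scan: upper pair first, then lower pair
            for dx in (0, half):
                parse_cu_quadtree(flags, x + dx, y + dy, half, min_cu, out)
    else:
        out.append((x, y, size))
```

With a 64×64 CTU and an 8×8 SCU, the flag sequence 1, 0, 1, 0, 0, 0, 0, 0, 0 yields one 32×32 CU, four 16×16 CUs in the top-right quadrant, and two further 32×32 CUs.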
FIG. 13 is an explanatory diagram depicting PU partitioning shapes of a CU. In the case where the CU is an intra CU, square PU partitioning is selectable. In the case where the CU is an inter CU, not only square but also rectangular PU partitioning is selectable. The PU partitioning shape of each CU is signaled by part_mode syntax described in Non Patent Literature 1.
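Since FIG. 13 is not reproduced here, the sketch below assumes the eight PU partitioning shapes found in HEVC: four symmetric shapes (2N×2N, 2N×N, N×2N, N×N) and four asymmetric ones (2N×nU, 2N×nD, nL×2N, nR×2N) that split the CU at one quarter of its size. The string labels and the function name are hypothetical, and the intra check reflects only the square-shape rule stated above.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) relative to the CU

PART_MODES = ("2Nx2N", "2NxN", "Nx2N", "NxN", "2NxnU", "2NxnD", "nLx2N", "nRx2N")
INTRA_PART_MODES = ("2Nx2N", "NxN")  # square shapes only


def pu_partitions(part_mode: str, cu_size: int, is_intra: bool = False) -> List[Rect]:
    """Return the PU rectangles of a CU for the given (assumed) part_mode label."""
    if is_intra and part_mode not in INTRA_PART_MODES:
        raise ValueError("intra CUs allow only square PU partitioning")
    s, h, q = cu_size, cu_size // 2, cu_size // 4
    shapes = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN": [(0, 0, s, h), (0, h, s, h)],
        "Nx2N": [(0, 0, h, s), (h, 0, h, s)],
        "NxN": [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # upper quarter
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # lower quarter
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # left quarter
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # right quarter
    }
    return shapes[part_mode]
```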
FIG. 14 is an explanatory diagram depicting examples of TU partitioning of a CU. An example of TU partitioning of an intra CU having a 2N×2N PU partitioning shape is depicted in the upper part of the drawing. In the case where the CU is an intra CU, the root of the quadtree is located in the PU, and the prediction error of each PU is expressed by the quadtree structure. An example of TU partitioning of an inter CU having a 2N×N PU partitioning shape is depicted in the lower part of the drawing. In the case where the CU is an inter CU, the root of the quadtree is located in the CU, and the prediction error of the CU is expressed by the quadtree structure. The quadtree structure of the prediction error, i.e. the TU partitioning shape of each CU, is signaled by split_tu_flag syntax described in Non Patent Literature 1.
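The difference in where the TU quadtree is rooted can be made concrete with a short sketch: for an intra CU one quadtree is started per PU, and for an inter CU a single quadtree covers the whole CU, after which split_tu_flag values are consumed depth-first. The helper names, the minimum TU size parameter, and the assumption that the flags arrive in z-scan order are illustrative only.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) relative to the CU


def tu_roots(cu_size: int, is_intra: bool, pus: List[Rect]) -> List[Rect]:
    """Intra CU: one TU quadtree per PU. Inter CU: one TU quadtree for the CU."""
    return list(pus) if is_intra else [(0, 0, cu_size, cu_size)]


def parse_tu_tree(flags: Iterator[int], root: Rect, min_tu: int) -> List[Rect]:
    """Consume split_tu_flag values for one quadtree root and return the TUs."""
    x0, y0, w, h = root
    assert w == h, "this sketch only handles square quadtree roots"
    leaves: List[Rect] = []

    def visit(x: int, y: int, size: int) -> None:
        split = next(flags) if size > min_tu else 0  # no flag at minimum TU size
        if split:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half)
        else:
            leaves.append((x, y, size, size))

    visit(x0, y0, w)
    return leaves
```

For the inter CU with 2N×N partitioning in the lower part of FIG. 14, the single root is the whole CU, so a TU may in principle straddle the boundary between the two PUs.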
This completes the description of intra prediction and inter-frame prediction, and signaling of CTU, CU, PU, and TU.
The following describes the structure and operation of a typical video encoding device that receives each CU of each frame of digitized video as an input image and outputs a bitstream, with reference to the block diagram in FIG. 15.
A video encoding device depicted in FIG. 15 includes a transformer/quantizer 101, an entropy encoder 102, an inverse quantizer/inverse transformer 103, a buffer 104, a predictor 105, and a multiplexer 106.
The predictor 105 determines, for each CTU, the cu_split_flag syntax values that define the CU partitioning shape minimizing the coding cost.
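The text does not specify how the coding cost is measured. A common choice, assumed here, is a rate-distortion cost J = D + λR, and the CU split decision can then be sketched as an exhaustive recursive comparison between coding a block as one CU and coding its four sub-blocks. leaf_cost is a hypothetical callback standing in for the full mode decision of a CU, and the bits of the cu_split_flag itself are ignored.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: (x, y, size) -> cost of coding the block as one CU.
LeafCost = Callable[[int, int, int], float]


def choose_cu_split(x: int, y: int, size: int, min_cu: int,
                    leaf_cost: LeafCost) -> Tuple[float, List[int]]:
    """Return (best cost, cu_split_flag values in z-scan order) for a block."""
    no_split = leaf_cost(x, y, size)
    if size <= min_cu:
        return no_split, []                  # no flag is coded at SCU size
    half = size // 2
    split_cost, split_flags = 0.0, [1]
    for dy in (0, half):
        for dx in (0, half):
            cost, flags = choose_cu_split(x + dx, y + dy, half, min_cu, leaf_cost)
            split_cost += cost
            split_flags += flags             # children's flags follow the parent's 1
    if split_cost < no_split:
        return split_cost, split_flags
    return no_split, [0]
```

The returned flag list is in the same depth-first order in which a decoder would read cu_split_flag values.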
The predictor 105 then determines, for each CU, a pred_mode_flag syntax value for determining intra prediction/inter prediction, a part_mode syntax value for determining a PU partitioning shape, a split_tu_flag syntax value for determining a TU partitioning shape, an intra prediction direction, and a motion vector that minimize the coding cost.
The predictor 105 further determines a skip_flag syntax value for determining skip mode.
Specifically, if, for the CU to be processed, the determined pred_mode_flag indicates inter prediction, the determined part_mode indicates 2N×2N, and no transform quantization value (described below) is present, the predictor 105 sets skip_flag to 1 (i.e. skip mode is set). Otherwise, the predictor 105 sets skip_flag to 0 (i.e. skip mode is not set).
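The skip_flag rule above reduces to a single condition. In this sketch, the absence of a transform quantization value is modeled as all quantized coefficients being zero, and the argument names are hypothetical.

```python
from typing import Sequence


def decide_skip_flag(is_inter: bool, part_mode: str, levels: Sequence[int]) -> int:
    """Return skip_flag: 1 only for an inter CU with 2Nx2N partitioning whose
    quantized coefficients are all zero; 0 otherwise."""
    no_residual = all(v == 0 for v in levels)
    return 1 if (is_inter and part_mode == "2Nx2N" and no_residual) else 0
```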
The predictor 105 generates a prediction signal corresponding to the input image signal of each CU, based on the determined cu_split_flag syntax value, pred_mode_flag syntax value, part_mode syntax value, split_tu_flag syntax value, intra prediction direction, motion vector, etc. The prediction signal is generated based on the above-mentioned intra prediction or inter-frame prediction.
The transformer/quantizer 101 frequency-transforms a prediction error image obtained by subtracting the prediction signal from the input image signal, based on the TU partitioning shape determined by the predictor 105.
The transformer/quantizer 101 further quantizes the frequency-transformed prediction error image (frequency transform coefficient). The quantized frequency transform coefficient is hereafter referred to as “transform quantization value”.
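The transform and quantization steps can be sketched in floating point as a separable orthonormal two-dimensional DCT-II followed by uniform scalar quantization. The quantization parameter QP and the step-size relation Qstep = 2^((QP-4)/6), under which the step doubles every 6 QP values, are assumptions borrowed from common H.264/HEVC practice; the integer transforms and scaling of an actual codec are not modeled, and the function names are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an n x n block, computed as C * X * C^T."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    basis = [[scale(k) * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i in range(n)] for k in range(n)]
    # Transform columns (C * X), then rows ((C * X) * C^T).
    tmp = [[sum(basis[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * basis[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


def quantize(coeffs: Block, qp: int) -> List[List[int]]:
    """Uniform scalar quantization with the assumed step 2^((qp - 4) / 6).
    Python's round() resolves exact ties to the nearest even integer."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return [[int(round(v / qstep)) for v in row] for row in coeffs]
```

A flat 4×4 prediction error of value 10 concentrates all its energy in the DC coefficient (40), which QP 22 (step 8) quantizes to 5.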
The entropy encoder 102 entropy-encodes the transform quantization value together with the following information determined by the predictor 105: the cu_split_flag, skip_flag, pred_mode_flag, part_mode, and split_tu_flag syntax values, the difference information of the intra prediction direction, and the difference information of the motion vector. This prediction-related information is hereafter also referred to as “prediction parameters”.
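The difference information of the motion vector can be illustrated as the difference between the motion vector and a predictor. In this sketch the encoder picks, from a list of predictor candidates, the one giving the smallest difference and signals its index together with the difference; the decoder, which must build the same candidate list, adds them back. How the candidate list is derived from neighbouring blocks is not modeled, and all names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def encode_mvd(mv: MV, candidates: List[MV]) -> Tuple[int, MV]:
    """Return (predictor index, motion vector difference) using the candidate
    closest to mv in L1 distance (ties go to the lower index)."""
    def cost(c: MV) -> int:
        return abs(mv[0] - c[0]) + abs(mv[1] - c[1])
    idx = min(range(len(candidates)), key=lambda i: cost(candidates[i]))
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(idx: int, mvd: MV, candidates: List[MV]) -> MV:
    """Rebuild the motion vector from the signaled index and difference."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```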
The inverse quantizer/inverse transformer 103 inverse-quantizes the transform quantization value. The inverse quantizer/inverse transformer 103 further inverse-frequency-transforms the frequency transform coefficient obtained by the inverse quantization.
The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the buffer 104. The buffer 104 stores the reconstructed image.
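The local decoding performed here can be sketched as inverse quantization, an inverse transform, addition of the prediction signal, and clipping. The sketch assumes a floating-point orthonormal DCT, the step-size relation Qstep = 2^((QP-4)/6), and an 8-bit sample range, none of which is specified by the text above; the names are hypothetical. Because the video decoding device described below performs the same reconstruction, the encoder and decoder keep identical reference pictures.

```python
import math
from typing import List

Block = List[List[float]]


def _dct_basis(n: int) -> Block:
    """Rows of the orthonormal DCT-II matrix C."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def idct2(coeffs: Block) -> Block:
    """Inverse of the orthonormal 2-D DCT-II: X = C^T * Y * C."""
    n = len(coeffs)
    c = _dct_basis(n)
    tmp = [[sum(c[k][i] * coeffs[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return [[sum(tmp[i][l] * c[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def reconstruct(levels: List[List[int]], qp: int, pred: List[List[int]],
                bit_depth: int = 8) -> List[List[int]]:
    """Rebuild a block from quantized levels and its prediction signal."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    coeffs = [[lv * qstep for lv in row] for row in levels]  # inverse quantization
    resid = idct2(coeffs)                                     # inverse transform
    max_val = (1 << bit_depth) - 1
    # Add the prediction, round to an integer, and clip to the sample range.
    return [[min(max(int(round(p + r)), 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```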
The multiplexer 106 multiplexes the entropy-encoded data supplied from the entropy encoder 102 and outputs the result as a bitstream.
The typical video encoding device generates a bitstream by the operation described above.
The following describes the structure and operation of a typical video decoding device that receives a bitstream as input and outputs a decoded video frame, with reference to FIG. 16.
A video decoding device depicted in FIG. 16 includes a de-multiplexer 201, an entropy decoder 202, an inverse quantizer/inverse transformer 203, a predictor 204, and a buffer 205.
The de-multiplexer 201 de-multiplexes an input bitstream to extract an entropy-encoded video bitstream.
The entropy decoder 202 entropy-decodes the video bitstream to obtain the prediction parameters and the transform quantization value, and supplies them to the inverse quantizer/inverse transformer 203 and the predictor 204.
The inverse quantizer/inverse transformer 203 inverse-quantizes the transform quantization value. The inverse quantizer/inverse transformer 203 further inverse-frequency-transforms the frequency transform coefficient obtained by the inverse quantization.
After the inverse frequency transform, the predictor 204 generates a prediction signal using a reconstructed image stored in the buffer 205, based on the entropy-decoded prediction parameters.
After the prediction signal is generated, the prediction signal supplied from the predictor 204 is added to the reconstructed prediction error image obtained by the inverse frequency transform by the inverse quantizer/inverse transformer 203, and the result is supplied to the buffer 205 as a reconstructed image.
The reconstructed image stored in the buffer 205 is then output as a decoded image (decoded video).
The typical video decoding device generates a decoded image by the operation described above.
Non Patent Literature 2 discloses a video coding technique using a block partitioning structure based on a quadtree and a binary tree (BT), which is called QuadTree plus Binary Tree (QTBT) and is an extension to the above-mentioned system described in Non Patent Literature 1.
In the QTBT structure, a coding tree unit (CTU) is recursively split into square coding units (CUs) based on a quadtree structure. Each quadtree leaf CU is then further recursively split into rectangular or square blocks based on a binary tree structure, and these blocks are used for the prediction process or the transform process. The part_mode syntax is therefore not used in the QTBT structure.
FIG. 17 is an explanatory diagram depicting the QTBT structure described in Non Patent Literature 2. An example of block partitioning of a CTU is shown in (a) of FIG. 17, and its tree structure is shown in (b) of FIG. 17. In FIG. 17, each solid line indicates partitioning based on the quadtree structure, and each dashed line indicates partitioning based on the binary tree structure. Because partitioning based on the binary tree structure allows rectangular blocks, information indicating the splitting direction (the direction in which the splitting line extends) is necessary. In (b) of FIG. 17, 0 indicates splitting in the horizontal direction, and 1 indicates splitting in the vertical direction. Because the QTBT structure can express rectangular partitioning shapes more flexibly, it can enhance compression efficiency compared with a video system based on the block partitioning structure described in Non Patent Literature 1.
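The QTBT signaling can be sketched as a recursive parser. It assumes a hypothetical flag order: a quadtree split flag per node down to a minimum quadtree size, then at each quadtree leaf a binary split flag followed by a direction flag using the convention of FIG. 17(b) (0 for a horizontal splitting line, 1 for a vertical one). The direction is inferred when only one split is still allowed. The minimum sizes and this bit order are illustrative assumptions, not the syntax of Non Patent Literature 2.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def parse_qtbt(flags: Iterator[int], x: int, y: int, size: int,
               min_qt: int, min_bt: int) -> List[Rect]:
    """Return the leaf blocks of one CTU under the assumed QTBT flag order."""
    leaves: List[Rect] = []

    def bt(x: int, y: int, w: int, h: int) -> None:
        can_h = h > min_bt   # horizontal splitting line halves the height
        can_v = w > min_bt   # vertical splitting line halves the width
        if (can_h or can_v) and next(flags):
            direction = next(flags) if (can_h and can_v) else (0 if can_h else 1)
            if direction == 0:                 # top and bottom halves
                bt(x, y, w, h // 2)
                bt(x, y + h // 2, w, h // 2)
            else:                              # left and right halves
                bt(x, y, w // 2, h)
                bt(x + w // 2, y, w // 2, h)
        else:
            leaves.append((x, y, w, h))

    def qt(x: int, y: int, size: int) -> None:
        if size > min_qt and next(flags):
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    qt(x + dx, y + dy, half)
        else:
            bt(x, y, size, size)               # quadtree leaf is the binary-tree root

    qt(x, y, size)
    return leaves
```

For a 32×32 CTU with a minimum quadtree size of 16, one quadtree split followed by a vertical binary split of the first 16×16 block and a horizontal binary split of the third yields two 8×16 blocks, two 16×8 blocks, and two unsplit 16×16 blocks.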