Non-Patent Literature (NPL) 1 describes High Efficiency Video Coding (HEVC), a video coding scheme based on the ITU-T Recommendation H.265 standard.
In HEVC, each frame of digitized video is divided into coding tree units (CTUs), and the CTUs are encoded in raster-scan order. Each CTU is divided into coding units (CUs) in a quad-tree structure and encoded. Each CU is divided into prediction units (PUs) for prediction. Moreover, the prediction error of each CU is divided into transform units (TUs) in a quad-tree structure and frequency-transformed. The largest and smallest CU sizes are referred to as the largest coding unit (LCU) and the smallest coding unit (SCU), respectively.
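As an illustration of the raster-scan order of CTUs, the following minimal sketch lists the top-left coordinates of each CTU in a frame. The function name `ctu_raster_order` and the 64×64 default CTU size are assumptions made for this example; CTUs at the right and bottom edges are counted even when they only partly cover the frame.

```python
import math


def ctu_raster_order(frame_width, frame_height, ctu_size=64):
    """Return the top-left (x, y) of every CTU in raster-scan order.

    Hypothetical helper: rows are visited top to bottom and, within a row,
    CTUs left to right. Edge CTUs may extend past the frame boundary when
    the frame size is not a multiple of ctu_size.
    """
    cols = math.ceil(frame_width / ctu_size)
    rows = math.ceil(frame_height / ctu_size)
    return [(c * ctu_size, r * ctu_size) for r in range(rows) for c in range(cols)]


# A 1920x1080 frame with 64x64 CTUs gives 30 columns x 17 rows.
order = ctu_raster_order(1920, 1080)
print(len(order))   # 510
print(order[:3])    # [(0, 0), (64, 0), (128, 0)]
```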
The CU is predictively encoded by intra prediction or inter-frame prediction (inter prediction).
FIG. 7 is an explanatory diagram illustrating a CU division example in the case where the CTU size is 64×64 (64 pixels×64 pixels). (A) of FIG. 7 illustrates an example of a partitioning shape (hereinafter, also referred to as “block structure”) and (B) of FIG. 7 illustrates a CU quad-tree structure corresponding to the partitioning shape illustrated in (A) of FIG. 7.
Moreover, the CU is divided into TUs in a quad-tree structure, in the same manner as the CU division illustrated in (A) of FIG. 7. When the focus is on the TU division, the layers (depths) are derived in the same way as those illustrated in (B) of FIG. 7.
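The relation between a quad-tree and its leaf blocks can be sketched as follows, assuming a 64×64 root block. The nested-list representation (`None` for an unsplit block, a list of four subtrees for a split block) and the function name `leaf_blocks` are hypothetical choices for this example, and the sample tree is not the division shown in FIG. 7. Because each split halves the block side, a leaf at depth d has size 64 >> d; the same enumeration applies to the CU quad-tree and to the TU quad-tree.

```python
def leaf_blocks(tree, x=0, y=0, size=64, depth=0):
    """Enumerate the leaf blocks of a quad-tree in z-scan order.

    `tree` is None for a leaf (no further split) or a list of four subtrees
    ordered top-left, top-right, bottom-left, bottom-right.
    Returns a list of (x, y, size, depth) tuples.
    """
    if tree is None:
        return [(x, y, size, depth)]
    half = size // 2
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    leaves = []
    for subtree, (dx, dy) in zip(tree, offsets):
        leaves.extend(leaf_blocks(subtree, x + dx, y + dy, half, depth + 1))
    return leaves


# Hypothetical division: split the 64x64 root once, then split only its
# top-right 32x32 block once more.
tree = [None, [None, None, None, None], None, None]
for leaf in leaf_blocks(tree):
    print(leaf)
# (0, 0, 32, 1)
# (32, 0, 16, 2)
# (48, 0, 16, 2)
# (32, 16, 16, 2)
# (48, 16, 16, 2)
# (0, 32, 32, 1)
# (32, 32, 32, 1)
```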
In the case of coding by intra prediction, the TU division starts from the PUs, which are the blocks obtained by dividing the CU into four parts, and the TUs are divided successively from each of them. In the case of coding by inter prediction, the TU division starts from the CU itself.
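The choice of starting point for the TU division can be written as a small helper that follows the rule stated above. The function name `tu_roots` and the (x, y, size) block representation are assumptions for this sketch; the returned blocks would each serve as the root of a TU quad-tree.

```python
def tu_roots(cu_x, cu_y, cu_size, is_intra):
    """Return the block(s) from which TU quad-tree division starts.

    Per the rule above: for intra coding, the starting points are the PUs
    obtained by dividing the CU into four parts; for inter coding, the
    starting point is the CU itself. Each block is (x, y, size).
    """
    if not is_intra:
        return [(cu_x, cu_y, cu_size)]
    half = cu_size // 2
    # Four quarter blocks in z-scan order: top-left, top-right,
    # bottom-left, bottom-right.
    return [(cu_x + dx, cu_y + dy, half) for dy in (0, half) for dx in (0, half)]


print(tu_roots(0, 0, 16, is_intra=True))
# [(0, 0, 8), (8, 0, 8), (0, 8, 8), (8, 8, 8)]
print(tu_roots(0, 0, 16, is_intra=False))
# [(0, 0, 16)]
```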
Referring to FIG. 8, the following describes the configuration and the operation of a general video coding device which outputs a bit stream with each CU of each frame of the digitized video as an input image.
FIG. 8 is a block diagram illustrating an example of a general video coding device. The video coding device illustrated in FIG. 8 includes a transformer 301, a quantizer 302, an entropy encoder 303, an inverse quantizer/inverse transformer 304, a buffer 305, a prediction unit 306, and an optimal prediction mode decision unit 307.
The optimal prediction mode decision unit 307 decides a CU quad-tree structure, a PU partitioning shape, and a TU quad-tree structure so as to obtain high coding efficiency in accordance with the features of the image for each CTU.
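The text does not specify how "high coding efficiency" is measured. A common approach in practical encoders, assumed here, is to minimize a rate-distortion cost J = D + λ·R and decide each quad-tree split recursively by comparing the cost of coding a block whole against the summed best costs of its four sub-blocks. In this sketch, `decide_quadtree` and the `block_cost` callback are hypothetical, and `toy_cost` uses made-up numbers instead of real distortion and rate measurements.

```python
def decide_quadtree(block_cost, x, y, size, min_size):
    """Recursively choose between coding a block whole or splitting it in four.

    block_cost(x, y, size) is a hypothetical callback returning the cost
    J = D + lambda * R of coding the block without further splitting.
    Returns (best_cost, tree), where tree is None for an unsplit block or a
    list of four subtrees (top-left, top-right, bottom-left, bottom-right).
    """
    whole = block_cost(x, y, size)
    if size <= min_size:
        return whole, None
    half = size // 2
    split_cost, children = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            cost, sub = decide_quadtree(block_cost, x + dx, y + dy, half, min_size)
            split_cost += cost
            children.append(sub)
    if split_cost < whole:
        return split_cost, children
    return whole, None


def toy_cost(x, y, size):
    # Made-up costs: the top-left 32x32 region is "detailed", so coding it
    # as a single block is expensive.
    if size == 64:
        return 1000.0
    if size == 32:
        return 200.0 if (x, y) == (0, 0) else 100.0
    return 30.0


print(decide_quadtree(toy_cost, 0, 0, 64, 16))
# (420.0, [[None, None, None, None], None, None, None])
```

The same recursive comparison can be applied to the TU quad-tree; an actual encoder would additionally compare PU partitioning shapes and prediction modes at each node.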
The prediction unit 306 generates a prediction signal for the input image signal of the CU on the basis of the CU quad-tree structure and the PU partitioning shape decided by the optimal prediction mode decision unit 307. The prediction signal is generated on the basis of the intra prediction or the inter prediction.
The transformer 301 frequency-transforms the prediction error image (prediction error signal), obtained by subtracting the prediction signal from the input image signal, on the basis of the TU quad-tree structure decided by the optimal prediction mode decision unit 307. For the transform coding of the prediction error signal, the transformer 301 uses frequency-based orthogonal transforms with block sizes of 4×4, 8×8, 16×16, and 32×32. Specifically, a discrete sine transform (DST) approximated in integer arithmetic (of integer precision) is used for 4×4 TUs of the luminance component of an intra-encoded CU. For all other TUs, a discrete cosine transform (DCT) approximated in integer arithmetic (of integer precision) of the corresponding block size is used.
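The following sketch shows the residual computation and the 4×4 kernel selection, using the integer 4×4 DCT and DST matrices defined in H.265 (in the standard, the DST is applied only to 4×4 luma TUs of intra-predicted CUs). As a simplifying assumption, the two-dimensional transform is computed as M·X·Mᵀ without the intermediate right shifts the standard applies, so the coefficient scale differs from a conforming encoder; the function names are hypothetical and the larger DCT kernels are omitted.

```python
# Integer 4x4 transform matrices as specified in H.265.
DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]
DST4 = [
    [29, 55, 74, 84],
    [74, 74, 0, -74],
    [84, -29, -74, 55],
    [55, -84, 74, -29],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def select_kernel(size, is_luma, is_intra):
    """DST for 4x4 intra luma TUs, DCT otherwise (only 4x4 kernels included)."""
    if size != 4:
        raise NotImplementedError("only 4x4 kernels are included in this sketch")
    return DST4 if (is_luma and is_intra) else DCT4


def forward_transform(input_block, prediction_block, is_luma, is_intra):
    """Subtract the prediction, then apply M * residual * M^T (no scaling shifts)."""
    residual = [[s - p for s, p in zip(row_s, row_p)]
                for row_s, row_p in zip(input_block, prediction_block)]
    m = select_kernel(len(residual), is_luma, is_intra)
    return matmul(matmul(m, residual), transpose(m))


# A flat residual of 1 in an inter-coded luma TU uses the DCT, so all the
# energy lands in the DC coefficient.
coeffs = forward_transform([[11] * 4] * 4, [[10] * 4] * 4, is_luma=True, is_intra=False)
print(coeffs[0])  # [65536, 0, 0, 0]
```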
Hereinafter, the discrete cosine transform and the discrete sine transform performed by the transformer 301 are collectively referred to as “orthogonal transformation.”
The quantizer 302 quantizes the transform coefficients (orthogonal transform coefficients) supplied from the transformer 301. The inverse quantizer/inverse transformer 304 inversely quantizes the quantized transform coefficients and then inversely transforms the inversely quantized coefficients. The prediction signal is added to the inversely transformed prediction error image, and the result is supplied to the buffer 305, which stores it as a reference image.
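The quantization, inverse quantization, and reconstruction steps can be sketched with a simplified floating-point scalar quantizer whose step size doubles every 6 QP (Qstep = 2^((QP − 4)/6)). This is an assumption for illustration only: the actual H.265 scaling process uses integer scaling factors and shifts, and the inverse transform is omitted here, so `reconstruct` assumes its `residual` argument has already been inversely transformed. All function names are hypothetical.

```python
import math


def q_step(qp):
    """Approximate quantization step size: 1.0 at QP 4, doubling every 6 QP."""
    return 2.0 ** ((qp - 4) / 6)


def quantize(coeffs, qp):
    """Round each coefficient magnitude to the nearest level, keeping its sign."""
    step = q_step(qp)
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(levels, qp):
    """Scale the levels back by the step size (inverse quantization)."""
    step = q_step(qp)
    return [level * step for level in levels]


def reconstruct(prediction, residual, bit_depth=8):
    """Add the decoded prediction error to the prediction and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [min(max(round(p + r), 0), max_val) for p, r in zip(prediction, residual)]


levels = quantize([10, -7, 1], qp=10)        # step size 2.0
print(levels)                                 # [5, -4, 1]
decoded = dequantize(levels, qp=10)
print(decoded)                                # [10.0, -8.0, 2.0]
print(reconstruct([250, 3, 100], decoded))    # [255, 0, 102]
```

The reconstructed samples, rather than the original input, are what the buffer 305 would hold, so the encoder predicts from the same reference images the decoder will have.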