NPL 1 describes High Efficiency Video Coding (HEVC), a video coding method based on the ITU-T Recommendation H.265 standard.
In HEVC, each frame of a digitized video is divided into coding tree units (CTUs), and the CTUs are encoded in raster scanning order. Each CTU has a quadtree structure and is encoded by being recursively divided into coding units (CUs). Each CU is predicted by being divided into prediction units (PUs). A prediction error (residual) of each CU is divided into transform units (TUs) in a quadtree structure, and each of the divided transform units is subjected to frequency transform.
FIG. 10A and FIG. 10B are illustrative diagrams of a CU division example of a CTU and the corresponding quadtree structure. In the example illustrated in FIG. 10A, the quadtree structure of the CTU can be expressed, as shown in the hierarchical structure of FIG. 10B, by the following flags: cu_split_flag=1 at CUDepth=0, indicating that the 64×64 block is divided; cu_split_flag=0 for three nodes at CUDepth=1, indicating that the first three 32×32 CUs (CU0, CU1, and CU2) are not divided; cu_split_flag=1 for one node at CUDepth=1, indicating that the last 32×32 CU is divided; cu_split_flag=0 for three nodes at CUDepth=2, indicating that the first three 16×16 CUs (CU3, CU4, and CU5) are not divided; cu_split_flag=1 for one node at CUDepth=2, indicating that the last 16×16 CU is divided; and cu_split_flag=0 for four nodes at CUDepth=3, indicating that none of the 8×8 CUs (CU6, CU7, CU8, and CU9) is divided.
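As a minimal sketch of how the flag sequence described for FIG. 10B maps to the ten CUs, the following Python example walks the quadtree depth-first and collects the undivided blocks. The function name parse_cu_quadtree, the flag list, and the assumption that the four children are visited in the order upper-left, upper-right, lower-left, lower-right are introduced here for illustration only. Following the description above, a flag is read for every node, including the 8×8 nodes at CUDepth=3.

```python
def parse_cu_quadtree(flags, ctu_size=64, min_cu_size=8):
    """Return the leaf CUs as (x, y, size, depth) tuples in visiting order.

    flags: cu_split_flag values in depth-first order (hypothetical input).
    """
    flag_iter = iter(flags)
    leaves = []

    def visit(x, y, size, depth):
        split = next(flag_iter)
        if split and size > min_cu_size:
            half = size // 2
            # Assumed child order: upper-left, upper-right, lower-left, lower-right.
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half, depth + 1)
        else:
            leaves.append((x, y, size, depth))

    visit(0, 0, ctu_size, 0)
    return leaves


# Flag sequence of the FIG. 10B example, written depth-first.
example_flags = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
for index, (x, y, size, depth) in enumerate(parse_cu_quadtree(example_flags)):
    print(f"CU{index}: {size}x{size} at ({x}, {y}), CUDepth={depth}")
```

Under these assumptions the walk yields CU0 to CU2 at 32×32, CU3 to CU5 at 16×16, and CU6 to CU9 at 8×8, which together cover the 64×64 CTU exactly once.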
FIG. 11 is an illustrative diagram illustrating a PU division shape of a CU. In the case of intra-prediction, PU division (2N×2N or N×N) of a square can be selected (however, when a CU is larger than the minimum size, only 2N×2N can be selected).
A CU is encoded by predictive coding of intra-prediction or inter-frame prediction for each PU. Hereinafter, intra-prediction is described.
Intra-prediction is prediction for generating a prediction image from a reference pixel in an encoding target frame. In NPL 1, 33 types of angle intra-prediction illustrated in FIG. 12 are defined. Angle intra-prediction extrapolates reference pixels located in the periphery of an encoding target block to any one of 33 types of directions illustrated in FIG. 12 and generates an intra-prediction signal (prediction pixel). In NPL 1, in addition to 33 types of angle intra-prediction, DC prediction for averaging the reference pixels located in the periphery of an encoding target block and planar prediction for linearly interpolating reference pixels located in the periphery of an encoding target block are defined.
In FIG. 12, each rectangle in the uppermost row and each rectangle in the leftmost column indicates a reference pixel. A number in a rectangle indicates a coordinate. An arrow indicates a prediction direction. A number near an arrow indicates a prediction mode.
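The following Python sketch illustrates two of the simplest cases of intra-prediction from peripheral reference pixels: DC prediction, which fills the block with the rounded average of the reference pixels, and purely vertical extrapolation, which copies the upper reference row downward. The function names and sample values are assumptions made for illustration. The sketch omits the reference-sample smoothing and boundary filtering that an actual HEVC encoder may apply, and it does not cover the fractional-angle directions.

```python
def dc_prediction(top, left):
    """Fill an n x n block with the rounded mean of the 2n reference pixels."""
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must have the same length")
    dc = (sum(top) + sum(left) + n) // (2 * n)  # +n rounds to nearest
    return [[dc] * n for _ in range(n)]


def vertical_prediction(top):
    """Extrapolate the upper reference row straight down (vertical direction)."""
    n = len(top)
    return [list(top) for _ in range(n)]


# Hypothetical 4x4 example.
top = [100, 102, 104, 106]
left = [98, 96, 94, 92]
print(dc_prediction(top, left))
print(vertical_prediction(top))
```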
With reference to FIG. 13, a configuration and an operation of a general video encoding device that receives each CTU of each frame of a digitized video as an input image and outputs a bitstream are described.
FIG. 13 is a block diagram illustrating one example of a general video encoding device. The video encoding device illustrated in FIG. 13 includes a transform unit 301, a quantization unit 302, an entropy encoding unit 303, an inverse-quantization/inverse-transform unit 304, a buffer 305, a prediction unit 306, and a prediction mode/block size determination unit 307.
The prediction mode/block size determination unit 307 determines, for each CTU, a combination of a prediction mode and a block size that minimizes an encoding cost. The prediction mode/block size determination unit 307 determines a TU quadtree structure, in addition to a CU quadtree structure and a PU division shape.
The prediction unit 306 generates a prediction signal for an input image signal of a CU, based on a prediction mode and a block size determined by the prediction mode/block size determination unit 307. A prediction signal is generated based on intra-prediction or inter-prediction.
The transform unit 301 frequency-transforms a residual image (a residual signal: a prediction error signal) acquired by subtracting a prediction signal from an input image signal, based on a TU quadtree structure determined by the prediction mode/block size determination unit 307. In transform coding of the residual signal, the transform unit 301 uses a frequency-transform-based orthogonal transform with a block size of 4×4, 8×8, 16×16, or 32×32. An n×n block size indicates a size of n pixels vertically and n pixels horizontally.
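As a rough illustration of an orthogonal transform applied to an n×n residual block, the following Python sketch uses a floating-point orthonormal DCT-II. This is a stand-in chosen for clarity: HEVC itself specifies integer approximations of such transforms, and the function names below are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th frequency basis vector."""
    return [
        [
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward_transform(residual):
    """Y = C * X * C^T for an n x n residual block X."""
    c = dct_matrix(len(residual))
    return matmul(matmul(c, residual), transpose(c))


def inverse_transform(coeffs):
    """X = C^T * Y * C, valid because C is orthogonal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


# A flat residual concentrates its energy in the single DC coefficient.
flat = [[4] * 4 for _ in range(4)]
print(forward_transform(flat)[0][0])
```

Because the basis is orthogonal, the inverse transform is simply the transposed operation, which is why the same matrix serves both directions in this sketch.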
The quantization unit 302 quantizes an orthogonal transform coefficient supplied from the transform unit 301. Hereinafter, a quantized orthogonal transform coefficient may be referred to as a transform quantization value. The inverse-quantization/inverse-transform unit 304 inversely quantizes a transform quantization value. The inverse-quantization/inverse-transform unit 304 then inversely transforms the inversely-quantized orthogonal transform coefficient. A prediction signal (prediction image) is added to the inversely-transformed residual image, and the result is stored in the buffer 305. The buffer 305 stores the image as a reference image.
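The quantization and reconstruction steps can be sketched as follows, assuming a uniform scalar quantizer with a single step size and 8-bit samples. Both assumptions, as well as the names quantize, dequantize, and reconstruct, are introduced for illustration only. The inverse transform between inverse quantization and reconstruction is left out to keep the example short.

```python
def quantize(coeffs, qstep):
    """Map each coefficient to an integer level (round half away from zero)."""
    def level(c):
        magnitude = int(abs(c) / qstep + 0.5)
        return magnitude if c >= 0 else -magnitude
    return [[level(c) for c in row] for row in coeffs]


def dequantize(levels, qstep):
    """Scale levels back; the rounding error from quantize() is not recovered."""
    return [[lv * qstep for lv in row] for row in levels]


def reconstruct(prediction, residual, max_value=255):
    """Add the decoded residual to the prediction and clip to the sample range."""
    return [
        [min(max_value, max(0, p + r)) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]


# Hypothetical coefficients and step size.
coeffs = [[100, -37], [12, -4]]
levels = quantize(coeffs, qstep=10)
print(levels)
print(dequantize(levels, qstep=10))
```

The difference between the original and the inversely-quantized coefficients is the encoding distortion introduced by this step; a larger step size lowers the encode volume at the price of more distortion.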
The prediction mode/block size determination unit 307 may be configured, when determining a prediction mode and a PU division shape, to perform prediction for, for example, all combinations of usable prediction modes and block sizes (in the case of intra-prediction, 4×4, 8×8, 16×16, 32×32, and 64×64), calculate a residual for each combination, and then determine an optimum combination based on the prediction results and the like (see, for example, paragraphs 0052 to 0054 of PTL 1). An optimum combination is, for example, a combination that minimizes an encoding cost. A video encoding device including the prediction mode/block size determination unit 307 that evaluates combinations of all prediction modes and all block sizes is hereinafter referred to as a first video encoding device. PTL 2 describes an image processing device that evaluates all usable prediction modes and determines an optimum prediction mode (see, for example, paragraph 0329).
There is also a method of determining a block size in advance, i.e., before determining a prediction mode, based on an encoding cost and the like, and then determining a prediction mode by using the determined block size. A video encoding device based on such a method is hereinafter referred to as a second video encoding device.
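The difference between the first and second video encoding devices can be sketched as two search strategies. The following Python example is only an illustration: the cost functions are hypothetical callables supplied by the caller, and how the second device estimates a cost per block size before a prediction mode is known is not specified in the description, so size_cost is a placeholder.

```python
from itertools import product


def exhaustive_search(block_sizes, modes, cost):
    """First device: evaluate every (block size, prediction mode) pair."""
    return min(product(block_sizes, modes), key=lambda pair: cost(*pair))


def two_stage_search(block_sizes, modes, size_cost, cost):
    """Second device: fix the block size first, then search modes only."""
    size = min(block_sizes, key=size_cost)
    mode = min(modes, key=lambda m: cost(size, m))
    return size, mode


# Toy cost with a single minimum at size 16 and mode 10 (illustrative only).
def toy_cost(size, mode):
    return abs(size - 16) + abs(mode - 10)


sizes = [4, 8, 16, 32, 64]
modes = range(35)  # planar, DC, and 33 angular modes
print(exhaustive_search(sizes, modes, toy_cost))
print(two_stage_search(sizes, modes, lambda s: abs(s - 16), toy_cost))
```

With five block sizes and 35 modes, the exhaustive search calls the full cost function 175 times, whereas the two-stage search calls it 35 times after five cheaper size evaluations. The two-stage result matches the exhaustive one only when the size estimate is reliable.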
In encoding that allows a large block size as in HEVC, a large PU size (a block size of a PU) and a large TU size (a block size of a TU) are applied in an area where features of the video signal are similar, and encoding efficiency is thereby improved. Specifically, the bit volume of the encoded data (the number of bins, hereinafter referred to as a bin number) decreases.
There is a video encoding device including a function of expanding a block when a predetermined condition is satisfied in order to make a block size as large as possible (see, for example, PTL 3). Expansion of a block indicates that a plurality of blocks are integrated into one block. A video encoding device including such a function is hereinafter referred to as a third video encoding device.
FIG. 14 is a block diagram illustrating a configuration of a third video encoding device described in PTL 3. The video encoding device illustrated in FIG. 14 includes an encoding parameter determination unit 110 that generates and outputs an encoding parameter by using an input video as input, an encoding unit 120, and a block expansion unit 370. The encoding unit 120 includes a configuration equal to a configuration in which the prediction mode/block size determination unit 307 is excluded from the video encoding device illustrated in FIG. 13.
When encoding of an input image starts, the encoding parameter determination unit 110 executes block division, searches for an encoding mode (intra-prediction, inter-prediction, a skip mode, or the like) and a prediction mode for each divided block, and determines an encoding parameter #1. The encoding parameter determination unit 110 calculates an encoding cost and determines an encoding parameter, based on the encoding cost. An encoding cost reflects a value (the bin number) concerning an encode volume and encoding distortion (which is correlated with image quality). The encoding parameter determination unit 110 uses, as one example, the following rate distortion (RD) cost.
cost = D + λ·R  (1)
In equation (1), D is encoding distortion, R is an encode volume that also takes transform coefficients into account, and λ is a Lagrange multiplier.
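As a worked illustration of equation (1), the following Python sketch compares two candidate encoding parameters under different values of λ. The candidate names and the distortion and rate figures are invented for the example and do not come from NPL 1 or PTL 3.

```python
def rd_cost(distortion, rate, lam):
    """Equation (1): cost = D + lambda * R."""
    return distortion + lam * rate


def select_best(candidates, lam):
    """Return the name of the candidate with the lowest RD cost."""
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


# Hypothetical (D, R) pairs: the finer partition lowers distortion but costs more bits.
candidates = {
    "intra_2Nx2N": (1200.0, 96),
    "intra_NxN": (900.0, 180),
}

for lam in (1.0, 4.0):
    print(lam, select_best(candidates, lam))
```

With λ = 1.0 the costs are 1296 and 1080, so the finer partition is selected. With λ = 4.0 they are 1584 and 1620, so the coarser one is selected. A larger λ therefore weights the encode volume more heavily against distortion.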
The block expansion unit 370 receives an encoding parameter #1, modifies a block size and motion vector information in the encoding parameter, and outputs the modified encoding parameter. The output encoding parameter is input to the encoding unit 120.
The block expansion unit 370 expands a block when, for example, all four blocks adjacent to each other have the same size and at least a predetermined number m (m: an integer) of these four blocks are intra-prediction blocks. In other words, the four blocks are integrated into one block. When inter-prediction is used, a motion vector of the integrated block is further determined based on the motion vectors of the four blocks before integration. The block expansion unit 370 sets, for example, the motion vector of any one of the four blocks before integration as the motion vector of the integrated block, or sets the average of the motion vectors of the four blocks as the motion vector of the integrated block.
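The expansion condition and the two motion vector options can be sketched in Python as follows. The Block type, the function names, and the interpretation of "adjacent to each other" as four blocks forming a 2×2 square are assumptions made for this illustration and are not taken from PTL 3. Rounding of the averaged vector to an actual motion vector precision is also left out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    x: int
    y: int
    size: int
    intra: bool
    mv: tuple = (0, 0)  # motion vector; meaningful for inter blocks only


def forms_square(blocks):
    """True if the four equal-size blocks tile a 2x2 square (assumed adjacency)."""
    s = blocks[0].size
    x0 = min(b.x for b in blocks)
    y0 = min(b.y for b in blocks)
    expected = {(x0, y0), (x0 + s, y0), (x0, y0 + s), (x0 + s, y0 + s)}
    return {(b.x, b.y) for b in blocks} == expected


def should_expand(blocks, m):
    """Same size, mutually adjacent, and at least m intra-prediction blocks."""
    if len(blocks) != 4 or len({b.size for b in blocks}) != 1:
        return False
    return forms_square(blocks) and sum(b.intra for b in blocks) >= m


def merged_motion_vector(blocks, how="average"):
    """Either take the first block's vector or the component-wise mean."""
    if how == "first":
        return blocks[0].mv
    return (
        sum(b.mv[0] for b in blocks) / len(blocks),
        sum(b.mv[1] for b in blocks) / len(blocks),
    )


blocks = [
    Block(0, 0, 8, True), Block(8, 0, 8, True),
    Block(0, 8, 8, False, (4, 2)), Block(8, 8, 8, True),
]
print(should_expand(blocks, m=3), should_expand(blocks, m=4))
```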
PTL 4 describes that a plurality of basic CUs included in an integration area, which is an area of N×N pixels including a plurality of basic blocks, are integrated into one new CU, and an encode string is generated based on the new CU after integration. The plurality of basic CUs included in the integration area are integrated when all of the basic CUs and basic PUs belonging to the integration area have the same block size and the pieces of prediction information of all the basic PUs included in the integration area are the same.
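One possible reading of the integration condition described for PTL 4 is sketched below in Python. The dictionary layout, the function name can_integrate, and the interpretation that a single block size must be shared by every basic CU and every basic PU in the integration area are assumptions for illustration; PTL 4 may define these details differently.

```python
def can_integrate(basic_cus):
    """Check the integration condition for the basic CUs of one integration area.

    basic_cus: list of dicts of the hypothetical form
        {"size": int, "pus": [{"size": int, "pred": hashable}, ...]}
    """
    sizes = {cu["size"] for cu in basic_cus}
    sizes |= {pu["size"] for cu in basic_cus for pu in cu["pus"]}
    preds = {pu["pred"] for cu in basic_cus for pu in cu["pus"]}
    # Exactly one block size and exactly one prediction information value.
    return len(sizes) == 1 and len(preds) == 1


# Four 8x8 basic CUs, each with one 8x8 PU using the same (hypothetical) mode.
area = [{"size": 8, "pus": [{"size": 8, "pred": ("intra", 26)}]} for _ in range(4)]
print(can_integrate(area))
```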