In standard specifications for moving image encoding, an image is divided into multiple pixel blocks, and, for each block, the pixels of the block to be encoded are predicted by using pixels in the same screen (frame) or pixels in a different frame. Then, only the difference (residual) between the actual pixels and the predicted pixels is encoded to compress information. Examples of such standard specifications include H.264/Moving Picture Experts Group (MPEG)-4 AVC (hereinafter H.264) and H.265/HEVC (hereinafter H.265), and NPL 1 proposes a method for compressing information based on the H.265 standard.
In H.265 moving image encoding processing, encoding is performed in block units of 64×64 pixels at the maximum, each of which is called a coding tree unit (CTU). The CTU is further divided into coding units (CUs) having a variable size, and prediction processing is performed on each CU. Note that the CU size can be changed from 64×64 pixels down to 8×8 pixels. Then, for each of the CUs, any one of an intra prediction mode, an inter prediction mode, and a skip mode is selected, and prediction processing and encoding processing are performed.
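The range of CU sizes described above can be illustrated with a minimal sketch. It assumes that each division halves the side of a square block, so that the sizes run from 64×64 down to 8×8; the function names `cu_sizes` and `cus_per_ctu` are hypothetical and are not taken from the H.265 standard.

```python
def cu_sizes(ctu_size=64, min_cu_size=8):
    """List the square CU side lengths obtained by repeatedly halving a CTU side."""
    sizes = []
    size = ctu_size
    while size >= min_cu_size:
        sizes.append(size)
        size //= 2
    return sizes


def cus_per_ctu(ctu_size, cu_size):
    """Number of CUs of a single size needed to tile one whole CTU."""
    return (ctu_size // cu_size) ** 2


print(cu_sizes())           # [64, 32, 16, 8]
print(cus_per_ctu(64, 8))   # 64
```

Because a CTU may mix CUs of different sizes, the number of possible divisions grows rapidly, which is one reason why mode determination becomes costly.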
Hereinafter, the terms "a frame that has been encoded" (or "an encoded frame") and "a reference frame" may be used. The encoded frame means a frame on which encoding processing has been performed before the frame on which encoding processing is currently performed. The reference frame is such an encoded frame that is used in the current prediction processing.
The intra prediction mode is a mode of predicting a block to be encoded by using encoded pixels in the same frame and encoding a prediction direction and a prediction residual.
The inter prediction mode is a mode of predicting a block to be encoded by using pixels in a reference frame and encoding motion information relative to the reference frame and a prediction residual.
The skip mode is a kind of inter prediction that predicts pixels by using a reference frame, but is a special mode in which neither motion information nor a prediction residual is encoded. In H.265, motion information is copied from a motion vector list (merge list) generated from adjacent blocks and used for encoding. The skip mode is an important prediction mode that has a high possibility of being selected and greatly contributes to encoding efficiency. However, since a motion vector (skip vector, or an amount of travel) is selected from five candidates (skip vector candidates) at the maximum, the amount of calculation increases when mode determining processing is performed on all the skip vector candidates.
Note that presence or absence of a residual may be specified for each color component separately from (independently of) the prediction mode. Compulsorily setting a residual to zero deteriorates image quality, but has the advantage that the amount of codes can be reduced.
FIG. 6 illustrates an example of a configuration of a moving image encoding device 100 based on the H.265 standard. The moving image encoding device 100 includes a transform quantization unit 101, an encoding unit 102, an inverse transform-inverse quantization unit 103, a synthesizer 104, a loop filter 105, a frame buffer 106, an encoding mode determining unit 107, an intra prediction unit 108, an inter prediction unit 109, and an adder 110.
The moving image encoding device 100 operates as follows. First, the encoding mode determining unit 107 determines a CU size and also determines which of the intra prediction mode, the inter prediction mode, and the skip mode is used for an input image G1. At this time, when the intra prediction mode is determined, a prediction direction is also determined. Further, when the inter prediction mode or the skip mode is determined, motion information is also determined.
The intra prediction unit 108 or the inter prediction unit 109 generates a prediction image G2 according to the determined mode. The prediction image G2 is input to the adder 110, and a difference between the input image G1 and the prediction image G2 is obtained. The obtained difference is input as a residual signal G3 to the transform quantization unit 101.
The transform quantization unit 101 performs an integer transform on the residual signal G3 and further quantizes the resulting transform factors (transform coefficients). The quantized factors are output as a quantization transform signal G4 to the encoding unit 102 and the inverse transform-inverse quantization unit 103.
The encoding unit 102 encodes and outputs the quantization transform signal G4. On the other hand, the inverse transform-inverse quantization unit 103 inversely quantizes the quantization transform signal G4, then performs an inverse integer transform on the result, and outputs the result as an inverse quantization transform signal G5 to the synthesizer 104.
The synthesizer 104 synthesizes the inverse quantization transform signal G5 and the prediction image G2, and outputs this as a rebuilt image G6 to the loop filter 105, the encoding mode determining unit 107, and the intra prediction unit 108.
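The signal flow from the input image G1 to the rebuilt image G6 described above can be sketched as follows. This is a simplified illustration and not the processing of an actual H.265 encoder: the integer transform is omitted (treated as an identity transform), quantization is a plain division by an assumed step `qstep`, and a block is represented as a flat list of pixel values. The function name `encode_block` is hypothetical.

```python
def encode_block(g1, g2, qstep=4):
    """Return (G4, G6) for one block, given input pixels g1 and prediction g2.

    Simplification: the integer transform is omitted, so quantization is
    applied directly to the residual samples.
    """
    g3 = [a - b for a, b in zip(g1, g2)]    # residual signal G3 (adder 110)
    g4 = [round(r / qstep) for r in g3]     # quantization (unit 101)
    g5 = [q * qstep for q in g4]            # inverse quantization (unit 103)
    g6 = [p + r for p, r in zip(g2, g5)]    # rebuilt image G6 (synthesizer 104)
    return g4, g6


g4, g6 = encode_block([101, 105, 97, 90], [98, 98, 98, 98])
print(g4)   # [1, 2, 0, -2]
print(g6)   # [102, 106, 98, 90]
```

The rebuilt image differs from the input image because quantization discards information; this difference is the distortion that the encoder trades off against the amount of codes.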
The loop filter 105 removes block distortion from the rebuilt image G6 and stores the resulting image in the frame buffer 106. The frame buffer 106 outputs the stored rebuilt image G6 to the inter prediction unit 109 in response to a request.
The intra prediction unit 108 performs an intra prediction of the same frame by using the rebuilt image G6. Further, the inter prediction unit 109 performs an inter prediction by using the rebuilt image G6 from which the block distortion is removed.
The encoding mode determining unit 107 determines a mode by using the rebuilt image G6. In general, in order to achieve a high degree of encoding efficiency, it is important for the encoding mode determining unit 107 to select an optimal combination of a CU size, a prediction mode, and presence or absence of a residual. Thus, a technique called rate-distortion (RD) optimization is widely used in recent moving image encoding devices.
In the RD optimization, a rate-distortion (RD) cost expressed by J=D+λR is calculated for each encoding mode, and a combination having the smallest RD cost is adopted. Herein, D is an amount of distortion due to encoding, R is an amount of codes generated by encoding, and λ is a weighting factor depending on complexity of an image or the like.
A sum of squared errors (SSE) between the pixels of the input image G1 and the pixels of the rebuilt image G6 is used as the amount of distortion D.
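The RD cost J=D+λR and the SSE used as the amount of distortion D can be written out as a minimal sketch. The function names and the numerical values of D, R, and λ below are assumptions chosen only for illustration.

```python
def sse(original, rebuilt):
    """Sum of squared errors between two equally sized pixel lists."""
    return sum((o - r) ** 2 for o, r in zip(original, rebuilt))


def rd_cost(distortion, rate, lam):
    """RD cost J = D + lambda * R."""
    return distortion + lam * rate


def best_mode(candidates, lam):
    """Return the mode with the smallest RD cost.

    candidates maps a mode name to a (distortion, rate) pair.
    """
    return min(candidates, key=lambda m: rd_cost(candidates[m][0], candidates[m][1], lam))


print(sse([101, 105, 97, 90], [102, 106, 98, 90]))   # 3

modes = {"skip": (120, 2), "inter": (40, 30), "intra": (35, 60)}
print(best_mode(modes, lam=2.0))    # inter (J = 100)
print(best_mode(modes, lam=10.0))   # skip  (J = 140)
```

The two calls show how a larger λ shifts the choice toward a mode that generates fewer codes even though its distortion is larger.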
However, calculation of the amount of distortion D and the amount of codes R requires a lot of processing such as transform and quantization, encoding, and inverse transform and inverse quantization of a difference between the prediction image G2 and the input image G1, thereby increasing an amount of calculation.
In addition, compared with H.264, H.265 adds new tools such as a variety of encoding block sizes, an expanded set of intra prediction directions, and rate-distortion optimized quantization (RDOQ). Thus, the number of combinations of encoding modes significantly increases in H.265, thereby increasing the amount of calculation.
Particularly for a high-resolution image such as 4K, processing of calculating and comparing the RD costs of all combinations of encoding modes is unrealistic in view of the effect obtained. In other words, wasted processing increases. Note that the RD cost is expressed as a weighted sum of the magnitude of the prediction residual of an input image and the amount of data to be generated. Therefore, when the block size having the smallest RD cost is selected, an optimum balance between objective image quality and the amount of data can be achieved.
FIG. 7 is a block diagram of the above-mentioned encoding mode determining unit 107. The encoding mode determining unit 107 includes a skip information determining unit 121, an inter prediction information determining unit 122, and an intra prediction information determining unit 123. The encoding mode determining unit 107 further includes a sub-block cost calculating unit 124, an inter residual removal determining unit 125, an intra residual removal determining unit 126, and a mode determining unit 127.
Note that, a current image, a reference frame, and a skip vector candidate are input to the skip information determining unit 121. A current image and a reference frame are input to the inter prediction information determining unit 122. A current image and a rebuilt image are input to the intra prediction information determining unit 123.
The skip information determining unit 121 generates the prediction image G2 from a reference frame for each of a plurality of skip vector candidates generated from motion information of neighboring blocks. Then, the skip information determining unit 121 performs transform and quantization, encoding, and inverse transform and inverse quantization on the prediction image G2 and a current image as an encoding target to obtain an RD cost. Subsequently, the skip information determining unit 121 selects the skip vector candidate having the smallest RD cost among the plurality of skip vector candidates and outputs the skip vector candidate together with the RD cost.
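The selection among skip vector candidates can be sketched as follows. This is a one-dimensional illustration under stated assumptions, not the actual processing of the unit 121: a frame is a flat list of pixels, a skip vector is a single integer offset, the prediction itself is taken as the rebuilt block because the skip mode encodes no residual, and the amount of codes is an assumed constant `index_bits`. The function name `select_skip_candidate` is hypothetical.

```python
def select_skip_candidate(current, reference, pos, candidates, lam, index_bits=1):
    """Return (skip_vector, rd_cost) of the best candidate, or None if none is usable.

    current    : pixels of the block to be encoded
    reference  : pixels of the reference frame (1-D for simplicity)
    pos        : position of the block in the frame
    candidates : integer offsets; at most five are evaluated
    """
    n = len(current)
    best = None
    for mv in candidates[:5]:
        start = pos + mv
        if start < 0 or start + n > len(reference):
            continue                                  # candidate points outside the frame
        pred = reference[start:start + n]             # prediction image for this candidate
        d = sum((c - p) ** 2 for c, p in zip(current, pred))
        cost = d + lam * index_bits                   # J = D + lambda * R
        if best is None or cost < best[1]:
            best = (mv, cost)
    return best


reference = [10, 20, 30, 40, 50, 60, 70, 80]
print(select_skip_candidate([41, 50, 61], reference, 2, [0, 1, -1, 2], lam=1.0))
# (1, 3.0)
```

Every candidate requires its own prediction and cost calculation, which is why evaluating all skip vector candidates increases the amount of calculation.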
The inter prediction information determining unit 122 determines motion information by a motion vector search that estimates an amount of motion of an image. Then, the inter prediction information determining unit 122 generates the prediction image G2 from a current image and a reference frame, based on the determined motion information, obtains an RD cost, and outputs the RD cost together with the motion information.
The intra prediction information determining unit 123 selects a direction having the smallest RD cost among multiple prediction directions and outputs the direction together with the RD cost in the selected prediction direction.
The sub-block cost calculating unit 124 obtains the total of the RD costs of the sub-blocks included in a CU. For example, when the CU size is 16×16, the four 8×8 CUs generated by dividing the 16×16 CU are regarded as sub-blocks, and the total of their RD costs is the sub-block cost.
In the inter prediction mode, the inter residual removal determining unit 125 calculates the RD cost obtained when a residual (namely, a transform factor) is compulsorily set to zero, and selects a residual zero mode when this cost is smaller than the cost obtained when the residual is retained.
In the intra prediction mode, the intra residual removal determining unit 126 calculates the RD cost obtained when a residual (namely, a transform factor) is compulsorily set to zero, and selects a residual zero mode when this cost is smaller than the cost obtained when the residual is retained.
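The residual removal determination performed by the units 125 and 126 can be illustrated with a minimal sketch. The function name `choose_residual`, its parameters, and the numerical values are assumptions for illustration; the distortion and the amount of codes for each case are taken as given inputs.

```python
def choose_residual(d_with, r_with, d_zero, r_zero, lam):
    """Compare the RD cost with the residual retained against the residual zero mode.

    Returns (label, rd_cost) of the cheaper choice; the residual zero mode is
    selected only when its cost is strictly smaller.
    """
    j_with = d_with + lam * r_with
    j_zero = d_zero + lam * r_zero
    if j_zero < j_with:
        return "residual_zero", j_zero
    return "with_residual", j_with


# Setting the residual to zero raises the distortion (10 -> 60) but lowers the rate (40 -> 5).
print(choose_residual(10, 40, 60, 5, lam=2.0))   # ('residual_zero', 70.0)
print(choose_residual(10, 40, 60, 5, lam=0.5))   # ('with_residual', 30.0)
```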
Hereinafter, an RD cost calculated by the skip information determining unit 121 is described as a skip RD cost, an RD cost calculated by the inter prediction information determining unit 122 is described as an inter RD cost, an RD cost calculated by the intra prediction information determining unit 123 is described as an intra RD cost, an RD cost calculated by the inter residual removal determining unit 125 is described as an inter residual RD cost, an RD cost calculated by the intra residual removal determining unit 126 is described as an intra residual RD cost, and an RD cost calculated by the sub-block cost calculating unit 124 is described as a sub RD cost. When these are referred to collectively, they are simply described as an RD cost.
Then, the mode determining unit 127 compares the RD costs in the skip mode, the inter prediction mode, and the intra prediction mode with the sub RD cost, selects the mode having the minimum cost, and thereby determines a prediction mode. The RD cost of the determined prediction mode is then used in calculating the sub RD cost of the next larger CU.
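The comparison between the RD costs of the prediction modes and the sub RD cost can be sketched as a recursive procedure. This is an illustration under assumptions, not the actual processing of the unit 127: each CU is a hypothetical dictionary holding precomputed RD costs under `"modes"` and, optionally, its four sub-blocks under `"subs"`.

```python
def decide_mode(cu):
    """Return (decision, rd_cost) for a CU described by a nested dictionary.

    cu["modes"] maps a mode name to its RD cost at this CU size.
    cu["subs"], if present, lists the sub-blocks obtained by dividing the CU.
    """
    mode = min(cu["modes"], key=cu["modes"].get)
    cost = cu["modes"][mode]
    subs = cu.get("subs")
    if subs:
        # Sub RD cost: total of the best RD costs of the included sub-blocks.
        sub_cost = sum(decide_mode(s)[1] for s in subs)
        if sub_cost < cost:
            return "split", sub_cost
    return mode, cost


def leaf(c):
    """Hypothetical 8x8 CU whose skip, inter, and intra costs are c, c + 5, c + 9."""
    return {"modes": {"skip": c, "inter": c + 5, "intra": c + 9}}


cu16 = {"modes": {"skip": 50, "inter": 40, "intra": 70},
        "subs": [leaf(8), leaf(9), leaf(10), leaf(11)]}
print(decide_mode(cu16))   # ('split', 38)

cu16["subs"] = [leaf(12)] * 4
print(decide_mode(cu16))   # ('inter', 40)
```

The value returned for one CU becomes one term of the sub RD cost of the CU one size larger, so the decision propagates from the smallest CU size up to the CTU.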
Such processing requires a large amount of calculation, and therefore PTL 1 and NPL 2 disclose technologies for reducing the amount of calculation by limiting the encoding modes for which an RD cost is calculated.
In PTL 1, the amount of calculation is reduced under H.264 by limiting the encoding modes to be evaluated, based on the RD costs of some encoding modes. This technology, for example, first calculates a skip RD cost in a 16×16-pixel skip mode and an inter RD cost in a 16×16-pixel inter prediction mode. Then, an encoding mode is determined without evaluating the inter prediction modes having a smaller size when the skip RD cost is smaller than the inter RD cost, the 16×16-pixel inter prediction mode has no residual, and the skip mode and the inter prediction mode have the same motion information. Thus, the number of modes to be evaluated, and hence the processing of calculating RD costs, can be reduced.
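The three conditions described for PTL 1 can be expressed as a single predicate. This is a minimal sketch based only on the description above, not on the text of PTL 1 itself; the function name, the parameter names, and the representation of motion information as a tuple are assumptions.

```python
def can_omit_smaller_inter(skip_cost, inter_cost, inter_has_residual, skip_mv, inter_mv):
    """True when the smaller-size inter prediction modes need not be evaluated.

    All three conditions must hold for the 16x16-pixel modes:
      1. the skip RD cost is smaller than the inter RD cost,
      2. the inter prediction mode has no residual,
      3. both modes have the same motion information.
    """
    return skip_cost < inter_cost and not inter_has_residual and skip_mv == inter_mv


print(can_omit_smaller_inter(30.0, 42.0, False, (1, 0), (1, 0)))   # True
print(can_omit_smaller_inter(30.0, 42.0, True, (1, 0), (1, 0)))    # False
print(can_omit_smaller_inter(30.0, 42.0, False, (1, 0), (2, 0)))   # False
```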
Further, NPL 2 discloses a technology for first evaluating a skip RD cost and, when the cost is equal to or less than a fixed threshold, selecting the skip mode without evaluating other modes. Also in this case, processing of calculating the RD costs of other modes can be reduced based on the skip RD cost.
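The threshold-based determination described for NPL 2 can be sketched as follows. This illustration is based only on the description above; the function names, the threshold value, and the RD costs are assumptions. The evaluation of the other modes is passed in as a function so that it is visibly not executed when the skip mode is selected early.

```python
def decide_with_early_skip(skip_cost, threshold, evaluate_other_modes):
    """Select the skip mode at once when its RD cost is at or below the threshold.

    evaluate_other_modes is called only when the early decision fails and must
    return a dictionary mapping mode names to RD costs.
    """
    if skip_cost <= threshold:
        return "skip", skip_cost               # other modes are not evaluated
    costs = dict(evaluate_other_modes(), skip=skip_cost)
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


calls = []


def evaluate_other_modes():
    """Hypothetical stand-in for the costly inter and intra evaluation."""
    calls.append(1)
    return {"inter": 40.0, "intra": 70.0}


print(decide_with_early_skip(30.0, 35.0, evaluate_other_modes), len(calls))
# ('skip', 30.0) 0
print(decide_with_early_skip(50.0, 35.0, evaluate_other_modes), len(calls))
# ('inter', 40.0) 1
```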