1. Field of the Invention
The present invention relates to an image coding apparatus, a control method therefor, and a computer program.
2. Description of the Related Art
In an encoding process of motion pictures, H.264 encoding allows inter-prediction mode and intra-prediction mode to be selected in inter slices. In the inter-prediction mode, in relation to macroblocks to be encoded, block matching is performed between reference pictures and a current picture, and a position which gives the best coding efficiency is determined to be a motion vector position. To determine the motion vector position, generally a cost function is used, where the cost function is based on differential data obtained by subtraction between the reference picture and the current picture. Various cost functions are conceivable, and a typical example is given by Eq. (1).
Cost = SATD + Qp × Mvcost  (1)
where SATD is the sum of absolute Hadamard transform differences calculated using the differential data obtained by subtraction between the reference pictures and the current picture, Qp is a quantization parameter used for a quantization process, and Mvcost is a cost value equivalent to a code amount of a motion vector corresponding to the length of the motion vector. Using the cost function, a position with the smallest cost value is determined to be the motion vector position. The differential data is generated through subtraction between the reference picture at the motion vector position thus determined and the current picture. The differential data is subjected to orthogonal transform, quantization, and variable-length coding processes to perform encoding in inter-prediction mode.
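As a rough illustration of Eq. (1), the following Python sketch computes SATD for a single 4×4 block and picks the candidate with the smallest cost. The function names, the 4×4 block size, the unnormalized Hadamard matrix, and the candidate-list format are assumptions made for this example only; practical encoders typically scale the transform and search far more positions.

```python
# Unnormalized 4x4 Hadamard matrix (an assumption; encoders often apply a scale factor).
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(current, reference):
    """Sum of absolute Hadamard-transformed differences for one 4x4 block."""
    diff = [[current[i][j] - reference[i][j] for j in range(4)] for i in range(4)]
    # H4 is symmetric, so H4 * diff * H4 is the 2-D Hadamard transform.
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def cost(current, reference, qp, mv_cost):
    """Eq. (1): Cost = SATD + Qp * Mvcost."""
    return satd4x4(current, reference) + qp * mv_cost


def best_motion_vector(current, candidates, qp):
    """Return the motion vector whose candidate has the smallest cost.

    candidates: list of (motion_vector, reference_block, mv_cost) tuples
    (a hypothetical format chosen for this sketch).
    """
    return min(candidates, key=lambda c: cost(current, c[1], qp, c[2]))[0]
```

Note that a larger Qp weights the motion vector code amount more heavily, so the same candidates can yield a different choice at a different quantization parameter.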
On the other hand, in the intra-prediction mode, pixels around the macroblock to be encoded are used as reference pictures. Pixels used as reference pictures in the intra-prediction mode are shown in FIG. 2. Intra-prediction modes include intra 4×4 prediction, intra 8×8 prediction, and intra 16×16 prediction modes, each of which supports multiple prediction modes such as vertical prediction mode, horizontal prediction mode, and DC prediction mode. The intra 4×4 prediction mode will be described as an example here. Pixels A to M around pixels a to p in the 4×4 block to be encoded are used as pixels of reference pictures. Pixels A to D are four adjacent pixels just above the 4×4 block to be encoded. Pixels E to H are four successive pixels extending to the right of pixel D. Pixels I to L are four adjacent pixels to the immediate left of the 4×4 block to be encoded. Pixel M is located above pixel I. The surrounding pixels A to M are pixels of a locally decoded picture after encoding rather than pixels of an original picture.
The way in which reference pictures are created varies with the prediction mode. As shown in FIG. 3, the intra 4×4 prediction is provided with nine prediction modes: prediction mode 0 to prediction mode 8. A method for creating reference pictures in each prediction mode is shown in FIG. 4. For example, in prediction mode 0 (vertical prediction), reference pictures are generated from pixels A to D, which are located just above the block. Regarding pixels a to p in the 4×4 block to be encoded, pixel A provides a reference picture for pixels a, e, i, and m in the first column and pixel B provides a reference picture for pixels b, f, j, and n in the second column. Similarly, pixel C provides a reference picture for pixels c, g, k, and o in the third column and pixel D provides a reference picture for pixels d, h, l, and p in the fourth column.
Prediction mode 2 is the DC prediction mode, in which the value given by Eq. (2) provides the reference picture for all pixels a to p.
(A+B+C+D+I+J+K+L+4) >> 3  (2)
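The two modes described above can be sketched in Python as follows. The function names and the list-based block layout are assumptions for illustration, and the sketch assumes that all neighboring pixels A to D and I to L are available.

```python
def predict_vertical(above):
    """Prediction mode 0: each column copies the pixel directly above it.

    above: [A, B, C, D], the four pixels just above the 4x4 block.
    Returns a 4x4 block as a list of rows.
    """
    return [list(above) for _ in range(4)]


def predict_dc(above, left):
    """Prediction mode 2: every pixel takes the value of Eq. (2).

    above: [A, B, C, D]; left: [I, J, K, L].
    Adding 4 before the 3-bit right shift rounds the mean of the eight pixels.
    """
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]
```

For example, with A to D equal to 10, 20, 30, 40 and I to L equal to 50, 60, 70, 80, Eq. (2) gives (360 + 4) >> 3 = 45 for every pixel of the DC reference picture.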
Subtraction is performed between the reference picture thus generated in each prediction mode and the current picture, and consequently differential data is generated. Using the differential data, the cost function Eq. (1) is calculated (in the case of intra-prediction mode, Mvcost is a cost equivalent to a code amount of the prediction mode), and a prediction mode with the smallest cost value is selected for use in encoding. The differential data between the reference picture and the current picture in the selected prediction mode is subjected to orthogonal transform, quantization, and variable-length coding processes to perform encoding in intra-prediction mode.
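The mode selection described above can be sketched as a minimum-cost search over candidate reference pictures. To keep the example short, a plain sum of absolute differences (SAD) stands in for SATD by default; that substitution, the function names, and the dictionary-based inputs are assumptions of this sketch rather than part of the described method.

```python
def sad(current, prediction):
    """Sum of absolute differences; a simplified stand-in for SATD."""
    return sum(abs(c - p)
               for cur_row, pred_row in zip(current, prediction)
               for c, p in zip(cur_row, pred_row))


def select_intra_mode(current, predictions, qp, mode_cost, distortion=sad):
    """Pick the prediction mode with the smallest cost.

    current:     4x4 block to be encoded (list of rows)
    predictions: dict mapping mode number -> 4x4 reference picture
    mode_cost:   dict mapping mode number -> code-amount cost of that mode
                 (plays the role of Mvcost in Eq. (1))
    Returns (best_mode, best_cost, residual), where residual is the
    differential data passed on to transform and quantization.
    """
    best_mode, best_cost = None, None
    for mode in sorted(predictions):  # lower mode number wins ties
        c = distortion(current, predictions[mode]) + qp * mode_cost[mode]
        if best_cost is None or c < best_cost:
            best_mode, best_cost = mode, c
    residual = [[c - p for c, p in zip(cur_row, pred_row)]
                for cur_row, pred_row in zip(current, predictions[best_mode])]
    return best_mode, best_cost, residual
```

Passing a SATD function as `distortion` would bring the sketch closer to Eq. (1) as written.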
A selection between inter-prediction mode and intra-prediction mode in inter slices is made by comparison between the cost value at the motion vector position and the cost value of the optimum intra-prediction mode. Since a smaller cost value leads to higher coding efficiency and generally higher picture quality, one conceivable method selects the prediction mode with the smaller cost value. The method is advantageous in terms of the coding efficiency of the macroblock, but presents problems described below. First, the inter-prediction mode and intra-prediction mode differ in the method for generating reference pictures. Thus, when inter-prediction mode and intra-prediction mode coexist in an aggregate area, such as a surface of a grass field or athletic field, made up of a collection of mostly flat macroblocks, visual degradation varies between the two modes even if the two modes have comparable cost values. That is, the degradation appears more conspicuous in the intra-prediction mode. Therefore, in such an aggregate area, degradation is particularly conspicuous in macroblocks encoded in intra-prediction mode, resulting in loss of picture quality.
To solve this problem, a conventional technique uses a determination formula that makes intra-prediction mode less likely to be selected in a flat part, such as a field surface, where the activity that represents the flatness of each macroblock is lower (see Japanese Patent Laid-Open No. 2006-094081).
The picture to be encoded may contain not only a flat aggregate area made up of a collection of mostly flat macroblocks, but also an area in which macroblocks with a large number of high-frequency components and other macroblocks coexist. In encoding low-flatness macroblocks containing high-frequency components and existing in the latter area, even if the intra-prediction mode is selected, visual degradation is not particularly noticeable. However, since the proposed technique described above only considers flatness on a macroblock-by-macroblock basis, the intra-prediction mode is less likely to be selected for such macroblocks as well. Consequently, compared to a method which simply selects inter-prediction mode or intra-prediction mode, whichever has the smaller cost value, the proposed technique has a problem in that it provides a low coding efficiency, resulting in loss of picture quality.