1. Field of Invention
The present invention relates to coding mode determining apparatuses, image coding apparatuses, coding mode determining methods and coding mode determining programs.
2. Description of the Related Art
MPEG-4 has garnered attention as a key technology in the multimedia and internet age. MPEG-4 is characterized, for example, in that it has been improved in coding efficiency as compared with MPEG-1/2 in order to support application areas such as mobile communications and the Internet (see, e.g., “All about MPEG-4”, 1st Ed., written and edited by Sukeichi Miki, Kogyo Chosakai Publishing Inc., Sep. 30, 1998, pp. 37-58).
In MPEG-4, a method called “AVC” has been established as a new highly efficient coding method. AVC is a coding method called “ISO MPEG-4 Part10 Advanced Video Coding” or “ITU-T H.264”.
This method is aimed at achieving an improved coding efficiency, for example, by enabling motion estimation or DCT even for image blocks of 4×4 pixels, and selecting the image for motion estimation from a plurality of pictures. Since AVC is a multi-function coding method in which the techniques that have been used for conventional coding methods are adopted, the challenge is to realize its optimal use in accordance with the application areas.
For example, in MPEG-4, which was established prior to the establishment of AVC, there is a relatively small number of combinations of candidate coding modes (e.g., partition size, prediction direction and direct mode) for each macroblock, so that the processing load on the encoder is not large even when these candidates are fully covered and an optimal coding mode is searched for at the time of coding.
On the other hand, with AVC, it is possible to divide a macroblock of 16×16 pixels (hereinafter, referred to as “16×16”) into macroblock partitions (hereinafter, referred to as “small blocks”) of 16×16, 16×8, 8×16 and 8×8, as shown in FIG. 25. Also, it is possible to divide a small block of 8×8 pixels into sub-macroblock partitions of 8×8, 8×4, 4×8 and 4×4.
Hereinafter, one small block divided into 16×16 is referred to as a small block Sb1, two small blocks divided into 16×8 as small blocks Sb2 and Sb3, two small blocks divided into 8×16 as small blocks Sb4 and Sb5, and four small blocks divided into 8×8 as small blocks Sb6 to Sb9.
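By way of illustration only, the nine small blocks Sb1 to Sb9 may be enumerated as in the following sketch. The function name `small_blocks`, the tuple layout, and the raster ordering of the four 8×8 blocks are assumptions made for this example and are not taken from the figures.

```python
def small_blocks():
    """List the small blocks Sb1..Sb9 of a 16x16 macroblock.

    Each entry is (label, division, left, top, width, height).
    Raster ordering inside each division is an assumption of this sketch.
    """
    # Division name -> (width, height) of one small block.
    layouts = {"16x16": (16, 16), "16x8": (16, 8), "8x16": (8, 16), "8x8": (8, 8)}
    blocks = []
    for division, (width, height) in layouts.items():
        for top in range(0, 16, height):
            for left in range(0, 16, width):
                label = f"Sb{len(blocks) + 1}"
                blocks.append((label, division, left, top, width, height))
    return blocks
```

Under these assumptions, Sb2 and Sb3 are the upper and lower 16×8 halves, and Sb4 and Sb5 are the left and right 8×16 halves.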
Additionally, with AVC, it is possible to perform motion estimation for each of the small blocks Sb1 to Sb9 by referencing a reference picture, as shown in FIG. 26. The same also applies to each of the sub-macroblock partitions. Furthermore, with AVC, it is possible to perform inter prediction such as forward prediction (see FIG. 27(a)) in which a reference picture that temporally precedes a picture to be coded is referenced, backward prediction (see FIG. 27(b)) in which a reference picture that temporally follows a picture to be coded is referenced, or bi-directional prediction (see FIG. 27(c)) in which reference pictures that are on both sides of a picture to be coded are referenced, as shown in FIG. 27.
<Process of Conventional Encoder>
A process of a conventional encoder in which all the above-described coding modes are covered will be described with reference to FIGS. 28 and 29.
The conventional encoder carries out motion estimation for all of the small blocks obtained by dividing an image block with a plurality of candidate division methods. Furthermore, it selects the reference picture and the division method of the image blocks individually for each of the small blocks, and performs coding using the selected division method.
Here, at the time of selecting the reference picture and the division method for the image block for each of the small blocks, an amount called “coding cost” is used. The coding cost is an amount represented by the sum of the pixel differential value (the sum of the absolute differences between a small block and its predicted image) and the code amount of motion information (e.g., a motion vector or a differential motion vector), and a smaller coding cost of an image block indicates a better coding efficiency of the image block. Further, the sum of the squared differences, or the sum of the absolute values of errors after performing a Hadamard transform or DCT on the difference, is sometimes used instead of the sum of the absolute differences.
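As a non-limiting illustration, the coding cost described above may be sketched as follows. The function names `sad` and `coding_cost` and the parameter `weight` are hypothetical; in particular, the weighting of the code amount is an assumption of this sketch, since the description only states that the two terms are summed.

```python
def sad(block, predicted):
    """Sum of absolute differences between a block and its predicted image.

    Both arguments are 2-D lists of pixel values with the same dimensions.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block, predicted)
        for a, b in zip(row_a, row_b)
    )


def coding_cost(block, predicted, motion_bits, weight=1):
    """Pixel differential value plus the code amount of motion information.

    `weight` (hypothetical) scales the code amount; weight=1 gives the plain sum.
    """
    return sad(block, predicted) + weight * motion_bits
```

A smaller returned value corresponds to a better coding efficiency in the sense used above.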
FIG. 28 is a block diagram showing a process flow of motion estimation for each of the small blocks. The process shown in FIG. 28 is performed for each of the small blocks of M×N ((M,N)=(16,16), (16,8), (8,16), (8,8)) obtained by dividing an image block of 16×16. The process flow of motion estimation shown in FIG. 28 includes a full-pel prediction step S300, a sub-pel prediction step S301 and a reference direction selecting step S302 for the small blocks.
The full-pel prediction step S300 carries out motion estimation with integer pixel accuracy for the small blocks of M×N using forward prediction and backward prediction (steps S305 and S306). Specifically, motion estimation is performed with integer pixel accuracy within a predetermined search range (e.g., ±32). That is, the motion vectors (hereinafter, referred to as “MVs”) MV0f and MV0b that result in the smallest coding cost are detected within the predetermined search range.
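For illustration, an exhaustive integer-accuracy search of the kind performed in the full-pel prediction step may be sketched as follows. The function `full_pel_search`, its argument layout, and the callable `mv_bits` estimating the code amount of a motion vector are all hypothetical; candidates falling outside the reference picture are simply skipped, which is an assumption of this sketch.

```python
def full_pel_search(cur, ref, top, left, m, n, search_range, mv_bits):
    """Return (smallest coding cost, (dx, dy)) for an m-wide, n-tall block.

    cur, ref: 2-D lists (current picture and reference picture).
    (top, left): position of the block in the current picture.
    mv_bits(dx, dy): hypothetical estimate of the motion vector code amount.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that leave the reference picture (assumption).
            if y < 0 or x < 0 or y + n > height or x + m > width:
                continue
            diff = sum(
                abs(cur[top + r][left + c] - ref[y + r][x + c])
                for r in range(n)
                for c in range(m)
            )
            cost = diff + mv_bits(dx, dy)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

The same routine would be called once with a preceding reference picture to obtain MV0f and once with a following reference picture to obtain MV0b.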
The sub-pel prediction step S301 carries out motion estimation with non-integer pixel accuracy for the small blocks of M×N using forward prediction, backward prediction and bi-directional prediction (steps S307 to S309). With the inter prediction of AVC, it is possible to perform motion estimation with non-integer pixel accuracy such as ½ pixel accuracy or ¼ pixel accuracy. Accordingly, a reference picture with non-integer pixel accuracy is generated with a filter, and motion estimation is performed for the generated reference picture.
In the forward prediction step S307, MV2f is detected by a two-phase motion vector search. Specifically, taking MV0f, which has been detected in the full-pel prediction step S300, as the center, MV1f (not shown), which results in the smallest coding cost, is determined from 9 points including the surrounding 8 neighboring ½ pixels (or ¼ pixels) and the central MV0f. Furthermore, taking MV1f as the center, MV2f, which results in the smallest coding cost, is determined from 9 points including the surrounding 8 neighboring ½ pixels (or ¼ pixels) and the central MV1f. Further, although it was stated that motion estimation with integer pixel accuracy is carried out in the full-pel prediction, the mode selection method of the present invention can also be applied when pixel culling is performed, for example, when one pixel is culled in the horizontal direction.
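The two-phase motion vector search described above may be sketched as follows, purely as an example. The sketch assumes that motion vectors are expressed in quarter-pixel units and that the first phase uses a half-pixel step and the second a quarter-pixel step; the description permits either step size in each phase, so the `steps` parameter is left adjustable. The names `refine_9_points` and `two_phase_search` and the callable `cost` are hypothetical.

```python
def refine_9_points(center, step, cost):
    """Pick the lowest-cost vector among the center and its 8 neighbors."""
    cx, cy = center
    candidates = [
        (cx + dx * step, cy + dy * step)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]
    return min(candidates, key=cost)


def two_phase_search(mv0, cost, steps=(2, 1)):
    """Derive MV2 from MV0 by two successive 9-point refinements.

    mv0: vector from the full-pel step, in quarter-pixel units (assumption).
    cost(mv): hypothetical callable returning the coding cost of a vector.
    steps: step size of each phase in quarter-pixel units (assumption).
    """
    mv1 = refine_9_points(mv0, steps[0], cost)   # corresponds to MV1f
    mv2 = refine_9_points(mv1, steps[1], cost)   # corresponds to MV2f
    return mv2
```

Each phase evaluates only 9 candidates, so the refinement adds 18 cost evaluations per direction regardless of the full-pel search range.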
In the backward prediction step S308, MV2b is detected from MV0b, which has been detected in the full-pel prediction step S300, as in the forward prediction step S307.
Since the bi-directional prediction step S309 references two reference pictures, it involves a large processing amount. Accordingly, prediction is performed using MV2f and MV2b, which have been detected in the forward prediction step S307 and the backward prediction step S308, respectively. Specifically, the average of the reference areas on reference pictures indicated by MV2f and MV2b is used as a predicted image.
Additionally, the coding costs C0, C1 and C2 are derived in the forward prediction step S307, the backward prediction step S308 and the bi-directional prediction step S309, respectively.
The reference direction selecting step S302 selects, as the reference direction of the small block, the prediction direction corresponding to the smallest of the coding costs C0 to C2, and outputs that smallest coding cost.
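As an illustrative sketch only, the derivation of the coding costs C0 to C2 and the selection of the reference direction may be expressed as follows. The function names are hypothetical, and two details are assumptions of this sketch rather than statements of the description: the average is rounded to the nearest integer, and the code amount for bi-directional prediction is taken as the sum of the code amounts of both motion vectors.

```python
def average_prediction(fwd_area, bwd_area):
    """Average two reference areas pixel by pixel (rounding is an assumption)."""
    return [
        [(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
        for row_f, row_b in zip(fwd_area, bwd_area)
    ]


def select_reference_direction(block, fwd_area, bwd_area, fwd_bits, bwd_bits):
    """Return (direction, cost) with the smallest of the costs C0, C1 and C2."""
    def sad(pred):
        return sum(
            abs(a - b)
            for row_a, row_b in zip(block, pred)
            for a, b in zip(row_a, row_b)
        )

    costs = {
        "forward": sad(fwd_area) + fwd_bits,                        # C0
        "backward": sad(bwd_area) + bwd_bits,                       # C1
        "bi-directional": sad(average_prediction(fwd_area, bwd_area))
        + fwd_bits + bwd_bits,                                      # C2 (assumed bits)
    }
    direction = min(costs, key=costs.get)
    return direction, costs[direction]
```

Here `fwd_area` and `bwd_area` stand for the reference areas indicated by MV2f and MV2b, respectively.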
FIG. 29 is a block diagram showing a process flow of motion estimation for an image block. The process flow of motion estimation for an image block that is shown in FIG. 29 includes: a motion estimation step S315 of performing motion estimation for each of small blocks of M×N ((M,N)=(16,16), (16,8), (8,16), (8,8)) obtained by dividing an image block of 16×16 using four types of candidate division methods; a coding cost converting step S316 of deriving the coding cost of the image block for each of the candidate division methods, based on a result of the motion estimation for each of the small blocks; and a division method selecting step S317 of selecting the best division method based on the coding cost of the image block derived for each of the candidate division methods.
The motion estimation step S315 includes small block motion estimation steps S320 to S323, which correspond to the process flow of motion estimation for the small blocks that has been described with reference to FIG. 28. Here, in FIG. 29, the process blocks of the small block motion estimation steps S321 to S323 are connected with a plurality of arrows. For example, the process blocks are connected by two arrows in the small block motion estimation step S321 for 16×8. This indicates that each of the processes is carried out on the two small blocks Sb2 and Sb3, which divide an image block of 16×16 into blocks of 16×8. Similarly, the process blocks are connected by two arrows in the small block motion estimation step S322 for 8×16, and the process blocks are connected by four arrows in the small block motion estimation step S323 for 8×8. The contents of the respective processes of the process blocks are the same as those described with reference to FIG. 28, and therefore the description has been omitted here.
The coding cost converting step S316 includes MB cost converting steps S325 to S328. The MB cost converting steps S325 to S328 sum up the coding costs of the respective small blocks that have been output by the small block motion estimation steps S320 to S323 to derive the coding cost of the image block for each of the candidate division methods.
The division method selecting step S317 selects, from the coding costs of the respective candidate division methods that have been derived by the MB cost converting steps S325 to S328, the candidate division method showing the smallest coding cost as the division method applied to the image block.
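The coding cost conversion and the division method selection may be illustrated by the following minimal sketch. The function name `select_division_method` and the dictionary layout are hypothetical and chosen only for this example; the cost values in the usage note are invented for illustration.

```python
def select_division_method(small_block_costs):
    """Choose the division method with the smallest image block coding cost.

    small_block_costs maps a division method (e.g. "16x8") to the list of
    coding costs output for its small blocks.
    Returns (division method, coding cost of the image block).
    """
    # MB cost conversion: sum the small block costs per candidate division.
    mb_costs = {
        method: sum(costs) for method, costs in small_block_costs.items()
    }
    # Division method selection: smallest image block cost wins.
    best = min(mb_costs, key=mb_costs.get)
    return best, mb_costs[best]
```

For example, with illustrative costs of 400 for 16×16, 150 and 180 for 16×8, 200 and 210 for 8×16, and 90, 80, 85 and 95 for 8×8, the image block costs are 400, 330, 410 and 350, so 16×8 would be selected.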
Furthermore, as shown in FIG. 30, a concept called an image block pair 73, consisting of two image blocks 71 and 72, is adopted in AVC, and it is possible to adaptively switch between field prediction and frame prediction for each image block pair 73. For example, in the case of field prediction, motion estimation is performed for each of the field structure blocks 75 and 76. In the case of frame prediction, motion estimation is performed for each of the frame structure blocks 77 and 78.
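By way of example, the relationship between an image block pair and its field structure and frame structure blocks may be sketched as follows. The sketch assumes that the pair is given as 32 rows of pixels, that the frame structure blocks are the upper and lower halves, and that the field structure blocks gather the even and odd rows, respectively; the function name `split_mb_pair` and the returned keys are hypothetical.

```python
def split_mb_pair(pair_rows):
    """Split the rows of an image block pair into frame and field blocks.

    pair_rows: list of pixel rows covering both image blocks (e.g. 32 rows).
    Returns a dict keyed by (structure, position).
    """
    half = len(pair_rows) // 2
    return {
        ("frame", "top"): pair_rows[:half],      # upper half of the pair
        ("frame", "bottom"): pair_rows[half:],   # lower half of the pair
        ("field", "top"): pair_rows[0::2],       # even rows (top field)
        ("field", "bottom"): pair_rows[1::2],    # odd rows (bottom field)
    }
```

Each of the four resulting blocks has the same number of rows, so the same motion estimation routine can be applied to any of them.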
Further, there are a total of four types of coding modes of the image block pair 73, namely the combinations of two types of coding picture structures (field and frame) and two types of coding prediction methods (intra and inter predictions). Conventionally, all of these have been taken into consideration, so that there has been the problem of a large processing amount. The processing load has been particularly large in the case of intra prediction.
Here, a conventional coding mode determination is described. In the codecs prior to AVC, the concept of an MB pair (large block) does not exist, and field and frame exist as the types of an MB (middle block). It has been common to cover four types, namely the combinations of intra/inter and field/frame. As shown in FIG. 31, the coding mode determination is made up of a motion estimation step S81 and a picture structure-and-coding prediction method determining step S82. The motion estimation step S81 includes first to sixth estimation steps S811 to S816. The first estimation step S811 performs inter prediction on a frame structure block. The second estimation step S812 performs intra prediction on the frame structure block. The third estimation step S813 performs inter prediction on a field structure top field. The fourth estimation step S814 performs inter prediction on a field structure bottom field. The coding cost derived by the third estimation step S813 and the coding cost derived by the fourth estimation step S814 are summed up, obtaining a coding cost derived from the inter prediction on the field structure block. The fifth estimation step S815 performs intra prediction on the field structure top field. The sixth estimation step S816 performs intra prediction on the field structure bottom field. The coding cost derived by the fifth estimation step S815 and the coding cost derived by the sixth estimation step S816 are summed up, obtaining a coding cost derived from the intra prediction on the field structure block.
The picture structure-and-coding prediction method determining step S82 selects the smallest coding cost from the above-described four types of coding costs.
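The conventional determination among the four types may be sketched as follows, as an illustration only. The function name `determine_coding_mode` and the use of the step labels S811 to S816 as dictionary keys are conventions of this example.

```python
def determine_coding_mode(costs):
    """Select (picture structure, prediction method) with the smallest cost.

    costs maps the estimation step labels "S811".."S816" to coding costs.
    """
    candidates = {
        ("frame", "inter"): costs["S811"],
        ("frame", "intra"): costs["S812"],
        # Field costs are the sums over the top field and the bottom field.
        ("field", "inter"): costs["S813"] + costs["S814"],
        ("field", "intra"): costs["S815"] + costs["S816"],
    }
    return min(candidates, key=candidates.get)
```

All six estimation steps must be completed before the comparison, which is the source of the processing load discussed in this section.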
If the concept of the above-described conventional technology is simply applied to AVC, then a process as shown in FIG. 32 is conceivable. In FIG. 32, the entire process is made up of a motion estimation step S81′, a coding prediction method determining step S83 and a picture structure determining step S82′ for an MB pair.
The motion estimation step S81′ includes first to eighth estimation steps S811′ to S818′. The first estimation step S811′ performs inter prediction on a frame structure top MB 77, and the second estimation step S812′ performs intra prediction on the frame structure top MB 77. The third estimation step S813′ performs inter prediction on a frame structure bottom MB 78, and the fourth estimation step S814′ performs intra prediction on the frame structure bottom MB 78. The fifth estimation step S815′ performs inter prediction on a field structure top MB 75, and the sixth estimation step S816′ performs intra prediction on the field structure top MB 75. The seventh estimation step S817′ performs inter prediction on a field structure bottom MB 76, and the eighth estimation step S818′ performs intra prediction on the field structure bottom MB 76.
The coding prediction method determining step S83 includes first to fourth prediction method determining steps S831 to S834. The first prediction method determining step S831 selects intra/inter for the frame structure top MB 77 by comparing the coding costs of the first estimation step S811′ and the second estimation step S812′. The second prediction method determining step S832 selects intra/inter for the frame structure bottom MB 78 by comparing the coding costs of the third estimation step S813′ and the fourth estimation step S814′. The coding costs of the frame structure top MB 77 and bottom MB 78, for which intra/inter has been selected, are summed up, obtaining the coding cost of the pair of frame structure blocks 77 and 78. The third prediction method determining step S833 selects intra/inter for the field structure top MB 75 by comparing the coding costs of the fifth estimation step S815′ and the sixth estimation step S816′. The fourth prediction method determining step S834 selects intra/inter for the field structure bottom MB 76 by comparing the coding costs of the seventh estimation step S817′ and the eighth estimation step S818′. The coding costs of the field structure top MB 75 and bottom MB 76, for which intra/inter has been selected, are summed up, obtaining the coding cost of the pair of field structure blocks 75 and 76.
The picture structure determining step S82′ determines field/frame for the image block pair 73 (71 and 72) by comparing the coding cost of the pair of frame structure blocks 77 and 78 and the coding cost of the pair of field structure blocks 75 and 76.
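The process of FIG. 32 described above may be illustrated by the following sketch, in which intra/inter is first selected for each MB and field/frame is then selected for the pair. The function name `determine_mb_pair_mode` and the nested dictionary layout are hypothetical and are used only for this example.

```python
def determine_mb_pair_mode(costs):
    """Select field/frame for an MB pair after selecting intra/inter per MB.

    costs[structure][position] is a dict {"inter": cost, "intra": cost},
    where structure is "frame" or "field" and position is "top" or "bottom".
    Returns (structure, {position: prediction method}, pair coding cost).
    """
    per_structure = {}
    for structure in ("frame", "field"):
        choices, total = {}, 0
        for position in ("top", "bottom"):
            mb_costs = costs[structure][position]
            # Prediction method determination: smaller of inter and intra.
            method = min(mb_costs, key=mb_costs.get)
            choices[position] = method
            total += mb_costs[method]
        per_structure[structure] = (total, choices)
    # Picture structure determination: compare the two pair costs.
    structure = min(per_structure, key=lambda s: per_structure[s][0])
    total, choices = per_structure[structure]
    return structure, choices, total
```

The input requires eight coding costs, four of which come from intra prediction, which corresponds to the eight estimation steps S811′ to S818′.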
Since the above-described process calculates the cost of each of field and frame for both intra prediction and inter prediction, it is possible to determine the coding picture structure and the coding prediction method such that the best compression rate is achieved even in the case of an image whose compression rate is improved only with one of inter prediction and intra prediction. On the other hand, however, intra prediction is performed a large number of times, resulting in an enormous processing amount.