H.264/AVC (hereinafter referred to as H.264) is known as the latest standard for highly efficient motion picture compression. H.264 is an international motion picture encoding standard developed by the Joint Video Team (JVT), which was jointly established in December 2001 by the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC. ITU-T approved H.264 as a standard in May 2003, and ISO/IEC JTC 1 standardized it as MPEG-4 Part 10 Advanced Video Coding (AVC) in 2003. In addition, the color space and the pixel gradation were extended, and the final draft standard for this extension was produced in July 2004 as the Fidelity Range Extension (FRExt).
The main features of H.264 are as follows.
Encoding efficiency: H.264 achieves almost the same image quality as the conventional MPEG-2 and MPEG-4 methods with about twice their encoding efficiency.
Compression algorithm: H.264 employs inter-picture prediction, quantization, and entropy coding.
Range of application: H.264 can be used widely, from low-bit-rate applications such as cellular phones to high-bit-rate applications such as HD television.
As described above, motion picture encoding techniques have been discussed extensively, and standardization of H.265 as a next-generation standard is currently under way.
Hereinafter, the technical contents of H.264 will be explained.
FIG. 1 is a general configuration diagram illustrating an encoder 1 that performs encoding using H.264. H.264 defines an intra prediction (104), which generates a prediction image within a screen, and an inter prediction (105), which generates a prediction image from multiple screens. Either the intra prediction or the inter prediction is selected based on predetermined criteria, and the difference between the selected prediction image and an input image (101) is obtained. Orthogonal transformation (102) and quantization (103) are then applied to the difference data, and entropy encoding (110) is performed on the quantized data.
It should be noted that the reconfigured image (106) used as a reference image in the inter prediction processing (105) is generated as follows. Inverse quantization (109) and inverse orthogonal transformation (108), which are the inverses of the quantization and orthogonal transformation processing, are applied to the quantized data (103) (or to the entropy encoded data (110)). The prediction image is added to the result, and deblocking filter processing (107) for alleviating block noise is then applied. In some cases, the deblocking filter may not be used. In H.264, only the difference image is encoded and transmitted, whereby high encoding efficiency is achieved.
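The local decoding loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the H.264 transform: a 4-point Hadamard matrix stands in for the orthogonal transformation, a uniform scalar quantizer with a hypothetical step size `qstep` stands in for the quantization, one row of four pixels stands in for a block, and the deblocking filter (107) is omitted. All function names are illustrative.

```python
# Simplified sketch of the encode / local-decode loop of FIG. 1.
# Assumption: a 4-point Hadamard matrix replaces the H.264 integer transform.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def hadamard4(v):
    """Multiply a length-4 vector by the 4x4 Hadamard matrix."""
    return [sum(H4[r][c] * v[c] for c in range(4)) for r in range(4)]


def encode_and_reconstruct(input_px, pred_px, qstep):
    """Return (quantized levels, reconstructed pixels) for one row of 4 pixels."""
    # Difference between the input image and the prediction image.
    residual = [a - b for a, b in zip(input_px, pred_px)]
    # Orthogonal transformation (102) and quantization (103).
    coeffs = hadamard4(residual)
    levels = [int(round(c / qstep)) for c in coeffs]
    # Inverse quantization (109) and inverse transformation (108).
    # H4 is symmetric and H4 * H4 = 4 * I, so the inverse is H4 / 4.
    dequantized = [level * qstep for level in levels]
    rec_residual = [x / 4 for x in hadamard4(dequantized)]
    # Add the prediction image back and clip to the 8-bit pixel range.
    recon = [
        min(255, max(0, int(round(p + r))))
        for p, r in zip(pred_px, rec_residual)
    ]
    return levels, recon
```

With `qstep = 1` the reconstruction equals the input row exactly; with a coarser step the reconstruction deviates from the input, which is why the encoder must predict from the reconfigured image rather than from the input image if it is to stay in step with the decoder.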
Inter prediction processing and intra prediction processing performed by the encoder 1 as prediction processing will be hereinafter explained.
FIG. 2 is a figure for explaining the inter prediction. As illustrated in FIG. 2, the inter prediction generates a prediction image from multiple screens: for a prediction target block (201) of the input image to be predicted, a motion vector is calculated from reference blocks (200/202) in the pictures before and after the picture in question (times t=−1 and t=1), and the prediction image is generated using that motion vector.
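One common way to calculate such a motion vector is exhaustive block matching. The sketch below is an illustration only: it assumes a single reference picture, whole-pixel displacements, a square block, and a sum-of-absolute-differences matching cost, none of which is prescribed by the text above. The names `block_sad` and `motion_search` are hypothetical.

```python
def block_sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def motion_search(cur, ref, bx, by, size, search_range):
    """Full search for the displacement (dx, dy) that best matches the block.

    cur, ref: 2D lists of pixel values (current and reference pictures).
    (bx, by): top-left corner of the prediction target block in cur.
    Returns (dx, dy, sad) for the best candidate, or None if no candidate
    lies inside the reference picture.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            sad = block_sad(cur, ref, bx, by, rx, ry, size)
            if best is None or sad < best[2]:
                best = (dx, dy, sad)
    return best
```

The reference block at the returned displacement then serves as the prediction image for the target block.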
FIG. 3 is a figure for explaining the intra prediction. The intra prediction is a method for generating a prediction image using correlation between pixels in proximity. In the intra prediction processing of H.264, a screen is divided into processing blocks called slices (300) having m by n pixels, and further, a slice is divided into macro blocks (301) having 16 by 16 pixels. In the intra prediction processing, this macro block 301 is adopted as a basic processing unit, and further, a prediction image is generated for every 4 by 4, 8 by 8, and 16 by 16 pixel block of the macro blocks 301.
FIG. 4 illustrates the order in which the pixel blocks are processed in each of the 4 by 4, 8 by 8, and 16 by 16 cases. More specifically, the processing is performed in ascending order of the number indicated within each pixel block.
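For the 4 by 4 case, the numbering can be illustrated with the following sketch. It assumes that FIG. 4 follows the 4 by 4 luma block order of the H.264 standard, in which the four blocks of each 8 by 8 quadrant are visited before moving to the next quadrant; the figure itself is not reproduced here, so this is an assumption. The function names are hypothetical.

```python
def block4x4_position(index):
    """Map a 4x4 block index (0..15) to (column, row) inside a macro block.

    Assumes the nested Z-shaped order: bits 0 and 1 of the index select the
    block inside an 8x8 quadrant, bits 2 and 3 select the quadrant.
    """
    column = (index & 1) + 2 * ((index >> 2) & 1)
    row = ((index >> 1) & 1) + 2 * ((index >> 3) & 1)
    return column, row


def scan_order_grid():
    """Return a 4x4 grid whose entries are the processing order numbers."""
    grid = [[None] * 4 for _ in range(4)]
    for index in range(16):
        column, row = block4x4_position(index)
        grid[row][column] = index
    return grid
```

Multiplying the returned column and row by 4 gives the pixel offset of the block inside the 16 by 16 macro block.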
In the intra prediction processing, a prediction image is generated by referring to pixels at the left, the upper left, the top, or the upper right of the prediction target block. This is because, in order for the decoding side to generate the prediction image in the same manner, an already encoded pixel (i.e., for the decoding side, a pixel that has been decoded and image-reconfigured) must be adopted as a reference pixel. For example, FIG. 5 shows reference pixels used to generate a prediction image of the intra prediction in units of 4 by 4 pixels.
For sixteen prediction target pixels 401 (a to p) included in the prediction target block of FIG. 5, symbols A to M denote reference pixels 400 used for the prediction. In H.264/AVC, the prediction image can be generated in units of 4 by 4 pixel blocks (hereinafter referred to as 4 by 4 blocks), 8 by 8 pixel blocks (hereinafter referred to as 8 by 8 blocks), and 16 by 16 pixel blocks (hereinafter referred to as 16 by 16 blocks). Here, a processing method that defines how a prediction target block is generated from reference pixels is called a mode. Nine modes are available for each of the 4 by 4 block and the 8 by 8 block, and four modes are available for the 16 by 16 block. That is, 22 modes (9 + 9 + 4) are available in total.
FIG. 6 illustrates, for example, nine modes that can be used for the 4 by 4 block.
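Three of the nine 4 by 4 modes are simple enough to sketch directly. The sketch below assumes the reference pixel layout commonly used in H.264 literature, in which A to D are the four pixels above the block and I to L are the four pixels to its left; FIG. 5 is not reproduced here, so this layout is an assumption. The function names are hypothetical, and the six directional modes are omitted.

```python
def predict_vertical(top):
    """Mode 0 (vertical): copy the four pixels above the block downwards."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1 (horizontal): copy the four pixels left of the block rightwards."""
    return [[pixel] * 4 for pixel in left]


def predict_dc(top, left):
    """Mode 2 (DC): fill the block with the rounded mean of the 8 neighbours."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]
```

Each function returns a 4 by 4 prediction block as a list of rows, for example `predict_dc([10, 20, 30, 40], [50, 60, 70, 80])`.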
As described above, H.264, which is the latest motion picture encoding standard, makes use of various kinds of methods in order to achieve a compression technique with a high degree of efficiency.
FIG. 7 is a flowchart illustrating encoding processing that is performed by the encoder 1. The case where the intra prediction processing is used as the prediction processing will be hereinafter explained. In the encoding processing, when an input image is received, a prediction image of a prediction target block is generated from a reconfigured image for every prediction mode, the differences from the input image are calculated, and the mode whose sum of absolute differences (hereinafter referred to as SAD) is the smallest is determined as the optimum prediction mode for the intra prediction (601). The SAD is, for example, defined by the following expression. It should be noted that Σi means that the sum is taken over all the pixels within the prediction target block.
[Math 1]
$$\mathrm{SAD} = \sum_{i} \left| \mathrm{Input}(i) - \mathrm{Pred}(i) \right| \qquad \text{(expression 1)}$$
Input(i): input pixel, Pred(i): prediction pixel.
The SAD is generally used as an evaluation standard, but other evaluation values may also be used. The standard does not explicitly specify how a prediction mode is to be determined, so the method depends on the design of the encoder.
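The mode determination (601) based on the SAD can be sketched as follows. This is a minimal illustration: the candidate prediction blocks are assumed to have been generated already, one per mode, and the function names and the dictionary keyed by mode name are hypothetical.

```python
def sad(input_block, pred_block):
    """Sum of absolute differences over all pixels of two equally sized blocks."""
    return sum(
        abs(a - b)
        for in_row, pred_row in zip(input_block, pred_block)
        for a, b in zip(in_row, pred_row)
    )


def select_mode(input_block, candidates):
    """Return (mode, sad) for the candidate prediction with the smallest SAD.

    candidates: dict mapping a mode name to its prediction block.
    If several modes tie, the first one in the dict is kept.
    """
    best_mode, best_sad = None, None
    for mode, pred_block in candidates.items():
        cost = sad(input_block, pred_block)
        if best_sad is None or cost < best_sad:
            best_mode, best_sad = mode, cost
    return best_mode, best_sad
```

Replacing `sad` with another cost function changes the evaluation standard without changing the selection loop, which reflects the point that the choice of evaluation value is left to the design.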
Subsequently, a prediction image is generated from a reconfigured image (608) using the prediction mode thus determined (602), and a difference image from the input image is obtained. The difference image then undergoes orthogonal transformation (603) and quantization (604), and the reconfigured image (608) is obtained through inverse quantization (605) and inverse orthogonal transformation (606). Entropy encoding is then performed (607) using a method such as context-adaptive variable-length coding (CAVLC) or context-based adaptive binary arithmetic coding (CABAC), and a stream is generated. Because several processing steps are required to obtain the reconfigured image (608), obtaining it takes some processing time.
A common way to simply increase the speed of the encoding processing is parallel processing, in which two or more pipelines run in parallel and the processing from the determination of the prediction mode (601) to the encoding processing (607) is performed in macro block (MB) units. However, when the determination (601) of the prediction mode of the intra prediction processing generates prediction images from a reconfigured image as described above and selects the mode with the least prediction error (evaluation value), it must wait for the reconfigured image, which takes some processing time to obtain. This makes it impossible to increase the speed even when parallel processing is carried out.
One method suggested for reducing the time spent obtaining a reconfigured image (reference image) determines the prediction mode (601) by generating a quasi prediction image from the input image instead of generating a prediction image from a reconfigured image (for example, see Patent Document 1).
In normal circumstances, however, quantization error causes a considerable difference between a prediction image generated from an input image and a prediction image generated from a reconfigured image. The prediction mode is therefore not necessarily determined in an optimum manner. In particular, at a low bit rate, prediction error accumulates because of failures to determine the optimum mode, which significantly reduces the quality of the image.
In the image encoding apparatus of Patent Document 1, quasi-intra prediction is performed using an original image, after which the Hadamard transform, quantization, and their inverse calculations are applied to the prediction error. The apparatus thus uses a mode selection method that alleviates error propagation by checking the degree of degradation due to the quantization.
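The effect of quantization error on the mode decision can be illustrated with a small numerical sketch. It makes strong simplifying assumptions: the neighbouring pixels are "reconstructed" by quantizing the pixel values directly with a hypothetical step size of 8 (a real encoder quantizes transform coefficients of the residual), and only the vertical and horizontal modes are compared. All names and pixel values are illustrative.

```python
def coarse_reconstruct(pixels, qstep):
    """Stand-in for reconstruction: quantize and dequantize pixel values."""
    return [int(round(p / qstep)) * qstep for p in pixels]


def sad(block, pred):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(a - b) for r1, r2 in zip(block, pred) for a, b in zip(r1, r2))


def choose_mode(block, top, left):
    """Pick the mode (vertical or horizontal) with the smaller SAD."""
    candidates = {
        "vertical": [list(top) for _ in range(4)],
        "horizontal": [[p] * 4 for p in left],
    }
    return min(candidates, key=lambda mode: sad(block, candidates[mode]))


# Prediction target block and its neighbours in the input image.
block = [[101] * 4 for _ in range(4)]
top_input, left_input = [99] * 4, [105] * 4

# Mode decided from input pixels versus from reconstructed pixels.
mode_from_input = choose_mode(block, top_input, left_input)
mode_from_recon = choose_mode(
    block,
    coarse_reconstruct(top_input, 8),
    coarse_reconstruct(left_input, 8),
)
```

In this example the top neighbours reconstruct to 96 and the left neighbours to 104, so the decision based on input pixels selects the vertical mode while the decision based on reconstructed pixels selects the horizontal mode. The mode chosen from the input image is therefore not the one that minimizes the error of the prediction actually used for encoding.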