The H.264 coding method has recently been attracting attention as a new motion image coding method. This coding method was developed jointly by ITU-T and ISO, and was standardized in the summer of 2003.
The characteristic features of this new coding method are that, unlike the conventional MPEG-1, MPEG-2, and MPEG-4 coding methods, a 4×4 integer transform is used and a plurality of intra-prediction modes are provided. In addition, an intra-loop filter is used, and motion compensation is performed using seven types of sub-blocks. The pixel accuracy of the motion compensation is the same as in the MPEG-4 coding method, i.e., motion compensation can be performed with ¼-pixel accuracy. Furthermore, universal variable-length coding or context-adaptive variable-length coding is used as the entropy coding.
A more important characteristic feature is as follows. The MPEG-1, MPEG-2, and MPEG-4 coding methods perform motion compensation by using two reference images (frames), one before and one after the frame to be coded. This new coding method, however, can use a larger number of reference images: the num_ref_frames code contained in the header of a bit stream can take a value of up to 16.
More specifically, in motion compensation, up to 16 frames before and after a frame to be coded can be referred to as reference images. A macroblock to be coded is processed as follows. As described above, a prediction error is calculated with ¼-pixel accuracy for the seven types of sub-blocks with respect to each of a maximum of 16 frames, and the macroblock for which this prediction error is a minimum is selected. This greatly increases the coding efficiency.
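The selection over a plurality of reference frames can be illustrated by the following minimal sketch. It is not the procedure of the H.264 standard or of any particular encoder: it assumes a full search at integer-pixel accuracy, a single block size, and the sum of absolute differences (SAD) as the prediction error, whereas the method described above uses ¼-pixel accuracy and seven sub-block types. The names `sad` and `search_multi_ref`, and the parameters `size` and `radius`, are hypothetical.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def search_multi_ref(cur, refs, top, left, size=16, radius=2):
    """Full search over every reference frame (illustrative assumption).

    Returns (best_sad, ref_index, (dy, dx)) for the block of `cur`
    whose upper-left corner is at (top, left).
    """
    block = cur[top:top + size, left:left + size]
    best = None
    for ref_idx, ref in enumerate(refs):
        h, w = ref.shape
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference frame.
                if y < 0 or x < 0 or y + size > h or x + size > w:
                    continue
                cost = sad(block, ref[y:y + size, x:x + size])
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dy, dx))
    return best

# Four random reference frames; the current frame is reference 2
# shifted down by one pixel and left by one pixel.
rng = np.random.default_rng(0)
refs = [rng.integers(0, 256, size=(32, 32), dtype=np.uint8) for _ in range(4)]
cur = np.roll(refs[2], shift=(1, -1), axis=(0, 1))
result = search_multi_ref(cur, refs, top=8, left=8, size=8, radius=2)
print(result)
```

In this sketch the search recovers the frame index and displacement with zero prediction error, because the current frame was constructed as a shifted copy of one of the reference frames.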
The arrangement of the conventional motion image coding apparatus using the H.264 coding method will be explained with reference to FIG. 13. This arrangement is also explained in reference 1 (“Overview of the H.264/AVC Video Coding Standard” (IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, JULY 2003)).
FIG. 13 is a block diagram showing the arrangement of the conventional motion image coding apparatus.
Image data is input macroblock by macroblock to this motion image coding apparatus. A selector 1000 switches whether to perform intra-coding. If intra-coding is to be performed, the image data is input to an intra-predictor 1001. The intra-predictor 1001 performs prediction in nine modes, and calculates a prediction error.
If the coding to be performed is not intra-coding, the image data is input to a differential unit 1002, where the difference from a predicted image is calculated as a prediction error.
A transformer/quantizer 1003 applies an integer transform to the calculated prediction error in units of 4×4 pixel blocks, and quantizes the obtained coefficients. The quantized coefficients obtained as the result of quantization undergo variable-length coding performed by an entropy coder 1004, and are output to an output unit 1014. At the same time, the quantization result is input to a dequantizer/inverse transformer 1005 to restore the prediction error, and this prediction error is added to the predicted image by an adder 1006. The result is suitably stored as a decoded image in frame memories 1007 to 1010.
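The forward path of the transformer/quantizer can be sketched as follows. The 4×4 matrix is the forward core transform matrix of H.264; the quantizer, however, is a deliberately simplified uniform quantizer with a single step size, assumed here for illustration only. The standard's quantizer additionally folds position-dependent scaling factors into the quantization, and this is omitted. The names `CF`, `forward_core_transform`, `quantize`, and `step` are hypothetical.

```python
import numpy as np

# Forward 4x4 integer core transform matrix of H.264.
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)

def forward_core_transform(residual):
    """Apply Y = CF * X * CF^T to a 4x4 block of prediction errors."""
    return CF @ residual @ CF.T

def quantize(coeffs, step):
    """Simplified uniform quantizer (assumption, not the standard's):
    divide each magnitude by `step`, truncate, and keep the sign."""
    return np.sign(coeffs) * (np.abs(coeffs) // step)

# A flat 4x4 prediction error block: all energy goes to the DC coefficient.
residual = np.full((4, 4), 3, dtype=np.int64)
coeffs = forward_core_transform(residual)
levels = quantize(coeffs, step=8)
print(coeffs)
print(levels)
```

Because the transform uses only integer additions and shifts-by-one multiplications, the encoder and decoder can compute it without rounding mismatch, which is one reason an integer transform is used.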
A motion estimator 1011 compares the decoded images stored in the frame memories 1007 to 1010 with the input image, and calculates a motion vector with ¼-pixel accuracy for each sub-block. These motion vectors and the selected frame numbers are input to a motion compensator 1012, and reference images are loaded from the corresponding frame memories. A reference image having a minimum prediction error is selected and output as a predicted image to the differential unit 1002.
The motion vectors and selected frame numbers are also input to a motion coder 1013 and coded, and the coded data is output to the output unit 1014. The output unit 1014 shapes this coded data in accordance with a format, and outputs the shaped data.
Unfortunately, a coding method that refers to a plurality of frames, such as the H.264 coding method described above, poses a problem: motion vectors must be searched for in each reference image in order to execute motion compensation, so the calculation amount becomes enormous as the number of reference images increases.
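The growth of the calculation amount can be illustrated with a rough count. The sketch below assumes an exhaustive integer-pixel search over a square window of ±`radius` pixels and counts each combination of reference frame, sub-block type, and displacement once; it ignores sub-pixel refinement and any fast-search technique, so the numbers are only indicative. The function name `search_candidates` and the window radius of 16 pixels are assumptions made for this example.

```python
def search_candidates(num_refs, radius, partition_modes=7):
    """Rough number of (reference frame, sub-block type, displacement)
    combinations evaluated per macroblock under an exhaustive search."""
    positions = (2 * radius + 1) ** 2  # displacements in the search window
    return num_refs * partition_modes * positions

one_ref = search_candidates(num_refs=1, radius=16)
sixteen_refs = search_candidates(num_refs=16, radius=16)
print(one_ref, sixteen_refs, sixteen_refs // one_ref)
```

Under these assumptions the count grows in direct proportion to the number of reference images, so referring to 16 frames costs 16 times as much as referring to one.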
In addition, especially when the H.264 coding method is used in an image sensing apparatus such as a video camera, the entire image changes greatly whenever the video camera is panned or tilted. Temporally separated frame images are therefore referred to less frequently, yet their data must still be held.