In recent years, the H.264 encoding method has received a lot of attention as a new moving image encoding method. This encoding method was jointly developed by the ITU-T and ISO.
A feature of this new encoding method is that, unlike the conventional MPEG-1, 2, and 4 encoding methods, it uses a 4×4 integer transformation and provides a plurality of intra-prediction schemes. An in-loop filter is used, and motion compensation is performed using seven different types of subblocks. The pixel precision of the motion compensation is ¼ pixel, as in the MPEG-4 encoding method. Furthermore, universal variable-length encoding and context adaptive variable-length encoding are used as entropy encoding.
A more significant feature is that, whereas MPEG-1, 2, or 4 performs motion compensation using two reference images, one before and one after the current image, H.264 can use more reference images. The num_ref_frames code included in the first header of a bitstream can take a value of up to 16. That is, up to 16 frames before and after the current frame can be referred to. For a macroblock to be encoded, prediction errors of the seven different types of subblocks are calculated for images of up to 16 frames, as described above, and the candidate that minimizes the prediction errors is selected, thus greatly improving the encoding efficiency.
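The selection over multiple reference frames described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular encoder: the prediction error is taken to be the sum of absolute differences (SAD), the search is integer-pel only, and a single 4×4 block size is used. The function names `sad`, `crop`, and `best_reference` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(frame, top, left, height, width):
    """Extract a height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def best_reference(current, refs, top, left, size=4, search=1):
    """Return (cost, ref_index, (dy, dx)) with the smallest SAD.

    Assumptions (not from the source): SAD as the prediction error,
    integer-pel full search within +/- `search` pixels, one block size.
    """
    target = crop(current, top, left, size, size)
    rows, cols = len(current), len(current[0])
    best = None
    for ref_idx, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + size > rows or x + size > cols:
                    continue  # candidate block falls outside the frame
                cost = sad(target, crop(ref, y, x, size, size))
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dy, dx))
    return best


# Example: the second reference frame is the current frame shifted right by one pixel.
current = [[r * 6 + c for c in range(6)] for r in range(6)]
ref0 = [[0] * 6 for _ in range(6)]
ref1 = [[0] + row[:-1] for row in current]
result = best_reference(current, [ref0, ref1], top=1, left=1)
print(result)  # (0, 1, (0, 1))
```

The cost of this search grows with the number of reference frames, since every candidate displacement is evaluated once per frame.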
FIG. 11 shows the arrangement of an H.264 encoder. This encoder receives image data in units of macroblocks.
A switch 1000 selects whether or not intra-encoding is performed. In the case of intra-encoding, image data is input to an intra-predictor 1001, which performs prediction in nine modes to calculate prediction errors. In encoding other than intra-encoding, image data is input to a differentiator 1002, which calculates differences from predicted images to generate prediction errors.
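As a rough illustration of mode selection in the intra-predictor, the sketch below evaluates only three of the nine 4×4 modes (vertical, horizontal, and DC) and keeps the one with the smallest sum of absolute prediction errors. The cost measure and the names `MODES` and `choose_intra_mode` are assumptions made for this example; the directional modes are omitted.

```python
def predict_vertical(above, left):
    """Each column repeats the reconstructed pixel directly above the block."""
    return [list(above) for _ in range(4)]


def predict_horizontal(above, left):
    """Each row repeats the reconstructed pixel directly to the left of the block."""
    return [[left[r]] * 4 for r in range(4)]


def predict_dc(above, left):
    """Every pixel is the rounded mean of the eight neighboring pixels."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


# Subset of the nine modes, for illustration only.
MODES = {
    "vertical": predict_vertical,
    "horizontal": predict_horizontal,
    "dc": predict_dc,
}


def choose_intra_mode(block, above, left):
    """Return (cost, mode_name, residual) for the mode with the smallest error."""
    best = None
    for name, predictor in MODES.items():
        predicted = predictor(above, left)
        residual = [[b - p for b, p in zip(b_row, p_row)]
                    for b_row, p_row in zip(block, predicted)]
        cost = sum(abs(v) for row in residual for v in row)
        if best is None or cost < best[0]:
            best = (cost, name, residual)
    return best


# Example: a block whose columns continue the pixels above it.
block = [[10, 20, 30, 40] for _ in range(4)]
cost, mode, residual = choose_intra_mode(block, above=[10, 20, 30, 40],
                                         left=[10, 10, 10, 10])
print(cost, mode)  # 0 vertical
```

The residual returned for the chosen mode corresponds to the prediction errors that are passed on to the transformer/quantizer.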
A transformer/quantizer 1003 applies the integer transformation to the obtained prediction errors for each 4×4 pixel block and quantizes the resulting coefficients. The quantization result undergoes variable-length encoding by an entropy encoder 1004, and the encoded result is output to an output unit 1014. At the same time, the quantization result is input to an inverse quantizer/inverse transformer 1005, which reconstructs the prediction errors. The reconstructed prediction errors are added to the predicted images by an adder 1006, and the results are stored in frame memories 1007 to 1010 accordingly.
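The transform, quantization, and local reconstruction path can be sketched as below. The matrix `CF` is the forward core matrix of the 4×4 integer transform; everything else is a simplification assumed for illustration. In particular, the quantizer here is a plain uniform quantizer with truncation, and the inverse is the exact rational inverse rather than the integer inverse transform with folded-in scaling that a real codec would use. The names `forward`, `inverse`, and `reconstruct` are hypothetical.

```python
from fractions import Fraction

# Forward core matrix of the 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
# CF * CF^T is diagonal with entries (4, 10, 4, 10).
ROW_NORM = [4, 10, 4, 10]


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward(block):
    """Y = CF * X * CF^T, computed entirely in integers."""
    return matmul(matmul(CF, block), transpose(CF))


def inverse(coeffs):
    """Exact inverse of forward(): X = CF^T * (Y scaled by the row norms) * CF."""
    scaled = [[Fraction(coeffs[i][j], ROW_NORM[i] * ROW_NORM[j])
               for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)


def reconstruct(residual, prediction, step):
    """Transform, quantize, dequantize, inverse transform, then add the prediction.

    The uniform quantizer (truncation toward zero) is an assumption for
    illustration, not the quantizer defined by the standard.
    """
    levels = [[int(Fraction(c, step)) for c in row] for row in forward(residual)]
    dequantized = [[level * step for level in row] for row in levels]
    restored = inverse(dequantized)
    return [[p + round(r) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, restored)]


# A flat residual block puts all of its energy into the first coefficient.
print(forward([[3] * 4 for _ in range(4)])[0][0])  # 48
```

Because the encoder reconstructs from the quantized values rather than from the original prediction errors, the images it stores in the frame memories match what a decoder will later reconstruct.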
A motion estimator 1011 compares decoded images stored in the frame memories 1007 to 1010 with an input image to calculate motion vectors for respective subblocks with ¼ pixel precision. The motion vectors and the numbers of the selected frames are input to a motion compensator 1012, which loads reference images from the corresponding frame memories, selects the reference image with the minimum prediction errors, and outputs it as a predicted image to the differentiator 1002. The motion vectors and the numbers of the selected frames are also input to a motion encoder 1013, which encodes the motion vectors to mvd_idx_IO codes and the like, and the information of the reference frames to ref_idx_IO codes and the like. The encoded codes are output to the output unit 1014. The output unit 1014 shapes and outputs encoded data in accordance with the format.
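To make the ¼ pixel precision concrete, the sketch below interpolates fractional sample positions along one row of luma pixels. It assumes the six-tap filter (1, −5, 20, 20, −5, 1) with rounding and division by 32 for half-sample positions, and a rounded average of the two nearest integer or half samples for quarter-sample positions, as commonly described for H.264 luma interpolation. The text above does not describe the interpolation method, so this is an illustrative assumption; frame boundary handling and two-dimensional positions are omitted, and the function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)


def clip8(value):
    """Clamp a value to the 8-bit pixel range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Half-sample value midway between row[i] and row[i + 1].

    Needs two pixels to the left of row[i] and two to the right of row[i + 1].
    """
    if i < 2 or i + 3 >= len(row):
        raise ValueError("not enough neighboring pixels for the six-tap filter")
    window = row[i - 2:i + 4]
    acc = sum(tap * sample for tap, sample in zip(TAPS, window))
    return clip8((acc + 16) >> 5)


def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1


# Example: a sharp edge between index 2 and index 3.
edge = [0, 0, 0, 255, 255, 255]
print(half_sample(edge, 2))     # 128
print(quarter_sample(edge, 2))  # 64
```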
FIG. 22 shows the arrangement of an H.264 decoder. When encoded data is input into an input unit 5102, codes are interpreted and are distributed to corresponding decoders. An entropy decoder 51021 performs variable-length decoding to obtain the quantization result of 4×4 transformation coefficients. The quantization result is input to an inverse quantizer/inverse transformer 51022 to reconstruct prediction errors. In case of intra-encoding, image data is input to an intra-predictor 51023 to perform prediction from surrounding pixels, thus reconstructing and outputting pixel data. In case of encoding other than intra-encoding, an adder 51024 adds the prediction errors to predicted images to reconstruct and output pixel data. At the same time, these pixel data are stored in the frame memories 51025 to 51028 accordingly. A motion decoder 51029 decodes mvd_idx_IO codes and the like representing motion vectors and ref_idx_IO codes and the like representing reference frame information, and inputs decoded data to a motion compensator 51030. The motion compensator 51030 loads reference images from the corresponding frame memories and outputs them as predicted images to the adder 51024.
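The inter-decoding path just described (load a predicted block from a frame memory, add the reconstructed prediction errors, store the result) can be sketched as follows. This is a minimal sketch under assumptions not stated in the source: the four frame memories are modeled as a sliding window in which index 0 is the oldest frame, motion vectors are integer-pel, and pixels are clipped to 8 bits. All function names are hypothetical.

```python
from collections import deque


def clip8(value):
    """Clamp a value to the 8-bit pixel range."""
    return max(0, min(255, value))


def store_frame(frame_memories, frame):
    """Store a decoded frame; the oldest frame is dropped when the window is full."""
    frame_memories.append(frame)


def motion_compensate(frame_memories, ref_idx, mv, top, left, size=4):
    """Load the predicted block from the selected frame memory.

    ref_idx selects the frame memory and mv = (dy, dx) is an integer-pel
    displacement; both would come from the motion decoder.
    """
    dy, dx = mv
    ref = frame_memories[ref_idx]
    return [row[left + dx:left + dx + size]
            for row in ref[top + dy:top + dy + size]]


def reconstruct_block(residual, predicted):
    """Adder stage: predicted pixels plus prediction errors, clipped to 8 bits."""
    return [[clip8(p + r) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


# Four frame memories, mirroring the four memories in the decoder arrangement.
frame_memories = deque(maxlen=4)
for k in range(5):
    store_frame(frame_memories, [[k * 10] * 8 for _ in range(8)])

predicted = motion_compensate(frame_memories, ref_idx=3, mv=(1, -1), top=2, left=2)
pixels = reconstruct_block([[5] * 4 for _ in range(4)], predicted)
print(pixels[0])  # [45, 45, 45, 45]
```

With a fixed number of frame memories, each newly stored frame displaces an older one, which limits how far back in time a reference can reach.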
Since a plurality of frames are referred to in this way, the encoding efficiency can be improved by referring to temporally separated frames, for example when an object is hidden behind another object and reappears after a brief interval.
The aforementioned encoding apparatus encodes by referring to a plurality of frames and tries to reduce prediction errors by referring to many frames. However, as the number of frames to be referred to becomes larger, the motion vector search for motion compensation requires correspondingly more processing.
In the aforementioned encoding apparatus, when there are no significant changes, all reference frames are very similar to one another, so a small number of frames cannot cover images over a long period of time. When a change does occur, a large code size is generated. For example, a railway monitor observes rail tracks without trains most of the time, and generates a large code size at the instant a train enters the frame. This poses a problem in that the improvement in encoding efficiency is reduced even though a plurality of frames are referenced. Moreover, if frames covering a long period of time are to be referenced, the number of frames increases dramatically, the circuit scale becomes huge, and many additional processes are required, resulting in long processing times.
Also, when images including many noise components are selected as reference images upon encoding, a large code size is generated.
Since intra-frame encoded images generally deteriorate less than inter-frame encoded images, they are well suited for use as reference frames, but they have a large code size.