In recent years, there has been demand for technology that encodes image information at a high compression ratio while maintaining high picture quality, so that moving image information can be stored and transmitted as digital data. As a compression technique for image information, compression-encoding methods that combine an orthogonal transform such as the discrete cosine transform with motion prediction/motion compensation have become widespread, as typified by the MPEG method.
More recently, ITU-T H.264/AVC (MPEG4 AVC) has emerged as an encoding method that aims for an even higher compression ratio and even better picture quality. Compared with conventional encoding methods such as MPEG2 and MPEG4, H.264 is known to achieve higher encoding efficiency, although it requires more computation for decoding (refer, for example, to ISO/IEC FCD 14496-10:2004 (MPEG-4 Part 10) ‘Advanced Video Coding’).
FIG. 13 shows an example of a basic configuration of an image encoding apparatus that compression-encodes image information using the H.264 method. The image encoding apparatus shown in FIG. 13 comprises a frame rearrangement unit 1300, an intraframe predictor 1301, an orthogonal transform unit 1302, a quantizer 1303, an encoder 1304, an inverse quantizer 1305, and an inverse orthogonal transform unit 1306, and further comprises an incrementer 1307, a loop filter processor 1308, a frame memory 1309, a motion predictor 1310, a motion compensator 1311 and a decrementer 1312.
A description is now given of steps in the encoding process of the image encoding apparatus shown in FIG. 13.
A digitized moving image signal (moving image data) is input to the frame rearrangement unit 1300, which rearranges the frames (sometimes referred to as pictures) into the order in which they are to be encoded. There are three picture types: an I picture, which is encoded using only information within the same frame; a P picture, which is encoded using the difference from a chronologically earlier frame; and a B picture, which can additionally use the difference from a chronologically later (locally decoded) frame. Because a B picture refers to a chronologically later frame, it must be encoded after that reference frame.
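As a minimal sketch of the rearrangement performed by the frame rearrangement unit 1300, the Python function below (the name `to_coding_order` is hypothetical) assumes a simple pattern in which each run of B pictures references the next I or P picture in display order, so that anchor picture is moved ahead of the B pictures that precede it. Actual encoders may use other picture structures.

```python
def to_coding_order(display_types):
    """Reorder pictures from display order to coding order (simplified sketch).

    display_types: picture types ('I', 'P', 'B') in display order.
    Returns (display_index, type) tuples in coding order.
    Assumption: each run of B pictures references the next I or P picture,
    so that anchor must be coded before the B pictures that precede it.
    """
    coding = []
    pending_b = []
    for idx, ptype in enumerate(display_types):
        if ptype == 'B':
            pending_b.append((idx, ptype))  # hold until the later anchor is coded
        else:
            coding.append((idx, ptype))     # anchor (I or P) is coded first
            coding.extend(pending_b)        # then the B pictures that preceded it
            pending_b = []
    coding.extend(pending_b)  # trailing B pictures with no later anchor (edge case)
    return coding
```

For example, `"".join(t for _, t in to_coding_order(list("IBBPBBP")))` yields `"IPBBPBB"`: each P picture is moved ahead of the two B pictures that display before it.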
The image frames rearranged into encoding order are then encoded in units of macro blocks, which are blocks of a predetermined size (16×16 pixels in H.264). For an I picture (that is, for a macro block that undergoes intraframe prediction), the intraframe predictor 1301 predicts the pixels in the macro block from pixel information within the same frame and outputs the difference between the predicted pixels and the actual (current) pixels to the orthogonal transform unit 1302.
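The following sketch illustrates intraframe prediction for a single 4×4 block using DC prediction, one of the nine Intra_4×4 prediction modes defined in H.264. The other modes are omitted, both the upper and the left neighbouring pixels are assumed to be available, and the function name `intra_dc_residual` is hypothetical.

```python
def intra_dc_residual(block, above, left):
    """DC-predict a 4x4 block and return (pred, residual).

    block: 4x4 list of current pixel values.
    above: the 4 reconstructed pixels directly above the block.
    left:  the 4 reconstructed pixels directly to its left.
    Assumes both neighbour sets are available.
    """
    # Rounded mean of the 8 neighbouring pixels: (sum + 4) >> 3
    dc = (sum(above) + sum(left) + 4) >> 3
    pred = [[dc] * 4 for _ in range(4)]
    # Prediction error that would be passed on to the orthogonal transform
    residual = [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
    return pred, residual
```

With `above = [100, 102, 104, 106]` and `left = [98, 100, 102, 104]`, the predicted value is (816 + 4) >> 3 = 102 for every pixel, and only the small differences from 102 remain to be transformed.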
If the input image is a B picture or a P picture (that is, a macro block that undergoes interframe prediction), the difference between the current image and an interframe predicted image, described later, is output to the orthogonal transform unit 1302.
The orthogonal transform unit 1302 performs a 4×4 pixel integer transform (orthogonal transform) that converts the input difference data into frequency components, which are passed to the quantizer 1303. The quantizer 1303 quantizes the frequency component data and outputs the quantized image data to both the encoder 1304 and the inverse quantizer 1305.
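A sketch of the 4×4 integer transform and quantization steps, assuming the forward core transform matrix of H.264 and row-major 4×4 blocks. The core transform is not normalized; H.264 folds the remaining scaling into a QP-dependent quantizer. The uniform-step `quantize` function below is a simplified stand-in for that quantizer, not the standard's exact procedure, and all function names are hypothetical.

```python
# Forward core transform matrix of the H.264 4x4 integer transform
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(x):
    """Compute W = CF * X * CF^T using integer arithmetic only."""
    return matmul4(matmul4(CF, x), transpose4(CF))


def quantize(coeffs, step):
    """Simplified uniform quantizer: |c| / step rounded half away from zero (step > 0)."""
    return [[(1 if c >= 0 else -1) * ((abs(c) + step // 2) // step) for c in row]
            for row in coeffs]
```

For a flat block in which every value is 10, all of the energy lands in the DC coefficient W[0][0] = 160, because every row of `CF` other than the first sums to zero; quantizing with a step of 16 then leaves a single nonzero level of 10.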
The encoder 1304 applies variable-length encoding or arithmetic encoding to the quantized data and outputs the result as an encoded bit stream. Meanwhile, the inverse quantizer 1305 dequantizes the quantized image data to restore the frequency components, and the inverse orthogonal transform unit 1306 then applies an inverse orthogonal transform to restore a predicted error image.
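As one concrete variable-length code, H.264 codes many header syntax elements with unsigned Exp-Golomb codes, written ue(v): a value k is sent as M zero bits followed by the (M+1)-bit binary form of k+1, where M = floor(log2(k+1)). Residual coefficients themselves are coded with CAVLC or CABAC, which are more elaborate and are not shown. The function names below are hypothetical.

```python
def exp_golomb_ue(k):
    """Encode a non-negative integer as an unsigned Exp-Golomb codeword (bit string)."""
    if k < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    value = k + 1
    m = value.bit_length() - 1             # number of leading zero bits
    return "0" * m + format(value, "b")    # zeros, then k+1 in binary (starts with '1')


def decode_exp_golomb_ue(bits):
    """Decode one ue(v) codeword at the start of a bit string; return (k, bits_used)."""
    m = 0
    while bits[m] == "0":                  # count leading zeros
        m += 1
    value = int(bits[m:2 * m + 1], 2)      # the '1' plus the next m bits
    return value - 1, 2 * m + 1
```

Small values get short codewords (0 → `1`, 1 → `010`, 4 → `00101`), which suits syntax elements whose small values are the most frequent.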
If the image output from the inverse orthogonal transform unit 1306 is the predicted error image of a P picture or a B picture, the incrementer 1307 adds the motion-compensated image from the motion compensator 1311 to the predicted error image to reconstruct the frame.
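The addition performed by the incrementer 1307 can be sketched as a per-pixel sum of the prediction and the decoded prediction error, followed by clipping to the valid sample range. The sketch assumes 8-bit samples by default, and the name `reconstruct` is hypothetical.

```python
def reconstruct(pred, residual, bit_depth=8):
    """Add the decoded prediction error to the prediction and clip to the sample range."""
    max_val = (1 << bit_depth) - 1  # 255 for 8-bit samples
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

Clipping matters because quantization error can push the sum outside the range: `reconstruct([[250, 10]], [[10, -20]])` gives `[[255, 0]]` rather than `[[260, -10]]`.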
The locally decoded image is filtered by the loop filter processor 1308 to remove block distortion and is then stored in the frame memory 1309. In the MPEG2 encoding method, I and P pictures are always used as reference frames for motion detection, and B pictures can never be used as reference frames. Whether a frame must be stored in the frame memory 1309 can therefore be determined from its picture type alone. In the H.264 method, by contrast, there are instances in which a P picture is not used as a reference frame even though it is a P picture, and instances in which a B picture is used as a reference frame even though it is a B picture. Moreover, an arbitrary number of locally decoded frames can be stored in the frame memory 1309 for use as reference frames.
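In contrast with storage decided by picture type, the sketch below shows a hypothetical `ReferenceFrameMemory` that stores a frame only when an explicit reference flag is set (in H.264 this corresponds to a nonzero `nal_ref_idc`) and, when full, evicts the oldest reference frame, as in H.264's sliding-window marking. H.264 also defines an adaptive marking mode driven by memory management control operations, which this sketch omits.

```python
from collections import deque


class ReferenceFrameMemory:
    """Hypothetical frame memory holding at most max_refs reference frames.

    Storage depends on an explicit is_reference flag, not on picture type,
    so a P or B picture may or may not be kept. When the memory is full,
    the oldest reference frame is evicted (sliding window).
    """

    def __init__(self, max_refs):
        self.max_refs = max_refs
        self.frames = deque()

    def store(self, frame_id, is_reference):
        if not is_reference:
            return  # non-reference pictures are never used for prediction
        if len(self.frames) == self.max_refs:
            self.frames.popleft()  # sliding window: drop the oldest reference
        self.frames.append(frame_id)

    def candidates(self):
        """Frames currently available as reference candidates, oldest first."""
        return list(self.frames)
```

For instance, with `max_refs=2`, storing I0 (reference), P1 (non-reference), B2 (reference) and P3 (reference) leaves `["B2", "P3"]`: the P picture was skipped, the B picture was kept, and I0 was evicted.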
When performing interframe prediction, the motion predictor 1310 searches the decoded images stored in the frame memory 1309 for the image with the smallest difference from the input image, and calculates and outputs a motion vector, which is the motion information for the frame of the input image being encoded. The motion predictor 1310 also outputs reference direction information indicating whether the reference image precedes or follows the input image, and whether it is the immediately preceding or the immediately succeeding image. The motion compensator 1311 performs the calculations indicated by the motion vector and the reference direction information and outputs a motion-compensated image (a predicted image). The decrementer 1312 takes the difference between the input image and the motion-compensated image and outputs a differential image (a predicted error image) to the orthogonal transform unit 1302.
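The search by the motion predictor 1310, followed by motion compensation and differencing, can be sketched as an integer-pixel full search that minimizes the sum of absolute differences (SAD) over every reference frame. This is an assumed, textbook search strategy: H.264 does not mandate any search method, and practical encoders add fractional-pixel refinement, variable block sizes and rate-aware costs. All function names are hypothetical, and frames are lists of pixel rows.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, refs, bx, by, n, search_range):
    """Try every reference frame and every integer displacement within
    +/- search_range; return (best_ref_index, (dx, dy), best_sad)."""
    height, width = len(cur), len(cur[0])
    best = None
    for ref_idx, ref in enumerate(refs):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                # Skip candidate blocks that fall outside the reference frame
                if not (0 <= by + dy and by + dy + n <= height
                        and 0 <= bx + dx and bx + dx + n <= width):
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[2]:
                    best = (ref_idx, (dx, dy), cost)
    return best


def motion_compensated_residual(cur, ref, bx, by, n, mv):
    """Return the predicted block taken from ref at motion vector mv,
    and the predicted error (current block minus prediction)."""
    dx, dy = mv
    pred = [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]
    resid = [[cur[by + y][bx + x] - pred[y][x] for x in range(n)] for y in range(n)]
    return pred, resid
```

Here the reference index returned by `full_search` plays the role of the reference direction information, identifying which stored frame the motion vector points into. When the best match is exact, the predicted error passed to the orthogonal transform is all zeros.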
Whether the data input to the orthogonal transform unit 1302 comes from the intraframe predictor 1301 or from the decrementer 1312 is switched according to the encoding mode.
As described above, the H.264 method can use any frame as a reference frame for motion detection, regardless of its picture type. Such an encoding method can select the reference image from a larger number of candidate images than a method that performs motion prediction only with reference frames of a particular picture type, and it can therefore perform more accurate motion prediction. However, efficient encoding requires that the reference candidate images be the image frames best suited for reference during motion prediction. If the best reference image is selected, the difference data relative to the input image becomes smaller and the amount of generated code is reduced.
In addition, an encoding method such as H.264, which stores a plurality of images in the frame memory and can use all of them as reference images, has the problem that the amount of computation increases dramatically when the motion vector is searched for across all the images in the frame memory.
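To put this growth in concrete terms, assume a full search over integer displacements within ±R pixels for each 16×16 macro block against N reference frames, where each candidate position costs 16² = 256 absolute differences. With R = 16 and a 1920×1088 coded frame (120 × 68 = 8160 macro blocks), the count per macro block and per frame is:

```latex
\begin{aligned}
C_{\mathrm{MB}} &= N\,(2R+1)^2 \cdot 16^2 \\
C_{\mathrm{MB}}\big|_{R=16} &= N \cdot 33^2 \cdot 256 = 278\,784\,N \\
C_{\mathrm{frame}}\big|_{R=16,\;N=5} &= 5 \cdot 278\,784 \cdot (120 \cdot 68) = 11\,374\,387\,200 \approx 1.14 \times 10^{10}
\end{aligned}
```

The cost is linear in N, so each additional frame held in the frame memory as a reference candidate adds the full single-frame search cost again.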