Conventionally, as techniques for compression-encoding moving images, MPEG, H.26x, etc. using orthogonal transform, such as discrete cosine transform, and motion compensation are known.
MPEG2 is defined as a general-purpose image encoding method. It is a standard that supports both interlaced scanning images and progressive scanning images, as well as both standard-definition images and high-definition images, and it is now widely used in a wide range of professional and consumer applications.
With MPEG2, a bit rate (amount of encoding) of 4 to 8 Mbps is assigned to, for example, interlaced scanning images having a standard definition of 720×480 pixels, and a bit rate of 18 to 22 Mbps is assigned to, for example, interlaced scanning images having a high definition of 1920×1088 pixels, thereby making it possible to realize a high compression ratio and good image quality.
On the other hand, H.26x was initially developed as an image encoding technique for videoconferencing. For example, H.26L requires a larger amount of computation for encoding and decoding than MPEG2 or MPEG4, but is known to achieve a higher coding efficiency.
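To put these bit rates in perspective, the following rough calculation compares them with the uncompressed bit rate. The frame rate (30 frames per second), the 8-bit sample depth, and the 4:2:0 color format (1.5 samples per pixel) are assumptions made for illustration only and are not stated in the text above; the function names are likewise hypothetical.

```python
def raw_bit_rate_mbps(width, height, fps=30, bits_per_sample=8,
                      samples_per_pixel=1.5):
    """Uncompressed bit rate in Mbps.

    fps, bits_per_sample and samples_per_pixel (4:2:0) are assumed
    values, not figures taken from the description.
    """
    return width * height * samples_per_pixel * bits_per_sample * fps / 1e6


def compression_ratio(width, height, coded_mbps):
    """Ratio of the assumed uncompressed bit rate to the coded bit rate."""
    return raw_bit_rate_mbps(width, height) / coded_mbps


if __name__ == "__main__":
    for (w, h), rates in (((720, 480), (4, 8)), ((1920, 1088), (18, 22))):
        for rate in rates:
            print(f"{w}x{h} at {rate} Mbps: "
                  f"about {compression_ratio(w, h, rate):.1f}:1")
```

Under these assumptions, the standard-definition case corresponds to an uncompressed rate of roughly 124 Mbps, so the quoted bit rates imply compression on the order of tens to one.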
Also, as a part of the activities in MPEG4, standardization of a technique for realizing a higher coding efficiency has been carried out based on H.26L while also incorporating functions that are not supported by H.26L. This technique is standardized as H.264 or MPEG-4 Part 10 (Advanced Video Coding). This standard is hereinafter referred to as the “AVC standards”.
FIG. 1 illustrates an example of the configuration of an image encoding apparatus that performs compression-encoding on input moving images on the basis of the AVC standards and outputs image compression information (encoded signal) obtained as a result of compression-encoding.
This image encoding apparatus 10 includes an analog-to-digital converter (A/D) 11, an image rearrangement buffer 12, an adder 13, an orthogonal transformer 14, a quantization unit 15, a lossless encoder 16, a storage buffer 17, and a rate controller 26. The image encoding apparatus 10 further includes an inverse quantization unit 18, an inverse orthogonal transformer 19, an adder 20, a deblocking filter 21, a frame memory 22, an intra-prediction unit 23, and a motion-prediction/compensation unit 24.
In the image encoding apparatus 10, a moving image input as an encoding target (hereinafter referred to as an “input image”) is converted into a digital signal by the A/D 11, and is input into the image rearrangement buffer 12. In the image rearrangement buffer 12, the order of pictures is rearranged in accordance with a GOP (Group of Pictures) structure used when the image is output, and the resulting image is supplied to the subsequent block.
If the image output from the image rearrangement buffer 12 is subjected to intra-coding, the encoding target image, which is output from the image rearrangement buffer 12, is supplied to the intra-prediction unit 23. In the intra-prediction unit 23, a prediction image is generated. Then, the generated prediction image and the encoding target image are supplied to the adder 13. A difference signal between the prediction image and the encoding target image is calculated, and is supplied to the orthogonal transformer 14.
In the orthogonal transformer 14, the output from the adder 13 is subjected to orthogonal transform (discrete cosine transform, Karhunen-Loeve transform, or the like), and a transform coefficient obtained as a result of orthogonal transform is quantized by the quantization unit 15. Note that the quantization rate used in the quantization unit 15 is controlled by the rate controller 26 in accordance with the storage capacity of the storage buffer 17. The quantized transform coefficient is supplied to the lossless encoder 16 and the inverse quantization unit 18.
In the lossless encoder 16, the quantized transform coefficient is subjected to lossless encoding (variable length coding, arithmetic coding, or the like), and the result is stored in the storage buffer 17 and is then output to the subsequent block as image compression information.
Meanwhile, in the inverse quantization unit 18, the quantized transform coefficient is subjected to inverse quantization, which corresponds to the quantization performed by the quantization unit 15, and is output to the inverse orthogonal transformer 19. In the inverse orthogonal transformer 19, inverse orthogonal transform, which corresponds to the orthogonal transform performed by the orthogonal transformer 14, is performed on the transform coefficient obtained as a result of inverse quantization. The result is then output to the adder 20.
In the adder 20, the inverse orthogonal transform result and the prediction image are added so that a decoded image, which is the image obtained by encoding the encoding target image and then decoding it, is generated. The deblocking filter 21 removes blocking distortions from the generated decoded image, and the resulting image is then stored in the frame memory 22.
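The local decoding loop formed by the adder 13, the quantization unit 15, the inverse quantization unit 18, and the adder 20 can be sketched as follows. This is a minimal illustration, not the apparatus itself: the orthogonal transform and its inverse are omitted, a uniform scalar quantizer with a hypothetical step size is assumed, and the sketch assumes the conventional hybrid-coding arrangement in which the reconstructed difference signal is added back to the prediction image. All function names are hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantization (assumed quantizer, for illustration)."""
    return [int(round(v / step)) for v in values]


def dequantize(levels, step):
    """Inverse quantization matching quantize()."""
    return [level * step for level in levels]


def local_decode(target, prediction, step):
    """Return (quantized levels, decoded samples) for one row of samples.

    The transform stage is skipped here, so the difference signal is
    quantized directly; a real encoder would transform it first.
    """
    # Adder 13: difference between target and prediction.
    residual = [t - p for t, p in zip(target, prediction)]
    # Quantization unit 15.
    levels = quantize(residual, step)
    # Inverse quantization unit 18.
    recon_residual = dequantize(levels, step)
    # Adder 20 (assumed): prediction plus reconstructed difference.
    decoded = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, decoded


if __name__ == "__main__":
    levels, decoded = local_decode([100, 104, 99, 97], [96, 100, 100, 100], 4)
    print(levels, decoded)
```

Because the decoded samples are built only from the prediction and the quantized levels, a decoder that receives the same levels reproduces exactly the same image that the encoder stores in its frame memory.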
In the intra-prediction unit 23, a prediction image corresponding to the encoding target image is generated, and also, information indicating an intra-prediction mode applied to each macroblock of the encoding target image is output to the lossless encoder 16. This information indicating the intra-prediction mode is encoded by the lossless encoder 16 as part of information described in the header of image compression information.
Note that in the case of H.264, as the intra-prediction modes, an intra 4×4 prediction mode, an intra 8×8 prediction mode, and an intra 16×16 prediction mode are defined for luminance signals. For color-difference signals, a prediction mode, which is independent of the prediction modes for the luminance signals, can be defined for each macroblock. For example, concerning the intra 4×4 prediction mode, one intra-prediction mode is defined for each 4×4 luminance block. Concerning the intra 8×8 prediction mode, one intra-prediction mode is defined for each 8×8 luminance block. Concerning the intra 16×16 prediction mode, one intra-prediction mode is defined for each macroblock. Also, for color difference signals, one prediction mode is defined for each macroblock.
If the image output from the image rearrangement buffer 12 is subjected to inter-coding, the encoding target image is input into the motion-prediction/compensation unit 24. At the same time, a decoded image stored in the frame memory 22 is read out to the motion-prediction/compensation unit 24 as a reference image. Then, motion-prediction/compensation is performed on the encoding target image and the reference image, and a prediction image obtained as a result of motion-prediction/compensation is supplied to the adder 13. In the adder 13, a difference signal between the encoding target image and the prediction image is calculated, and the difference signal is output to the orthogonal transformer 14. Operations performed by the blocks subsequent to the orthogonal transformer 14 are similar to those for intra-coding, and thus, an explanation thereof is omitted.
In the motion-prediction/compensation unit 24, simultaneously with the generation of the above-described prediction image, a motion vector of each macroblock is detected and is output to the lossless encoder 16. This motion vector is encoded by the lossless encoder 16 as part of information described in the header of image compression information.
Here, motion compensation performed in the motion-prediction/compensation unit 24 is described. Motion compensation is processing performed by assigning a portion of a decoded image stored in the frame memory 22 to a portion of an encoding target image. A motion vector detected by the motion-prediction/compensation unit 24 determines which portion of the decoded image is used for reference.
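Motion compensation with integer-pixel precision can be illustrated by the following minimal sketch. It assumes a frame held as a list of rows, a motion vector expressed as (vertical, horizontal) displacement, and an exhaustive search that minimizes the sum of absolute differences (SAD); the search strategy and all names are assumptions for illustration and are not taken from the description.

```python
def motion_compensate(reference, top, left, height, width, mv):
    """Fetch the reference block displaced by mv = (dy, dx)."""
    dy, dx = mv
    rows = reference[top + dy: top + dy + height]
    return [row[left + dx: left + dx + width] for row in rows]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def find_motion_vector(block, reference, top, left, search_range):
    """Exhaustive search (assumed strategy) for the best-matching block."""
    height, width = len(block), len(block[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if (y < 0 or x < 0 or y + height > len(reference)
                    or x + width > len(reference[0])):
                continue
            candidate = motion_compensate(reference, top, left,
                                          height, width, (dy, dx))
            cost = sad(block, candidate)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv


if __name__ == "__main__":
    reference = [[10 * r + c for c in range(6)] for r in range(6)]
    target_block = [[23, 24], [33, 34]]  # content located at row 2, column 3
    print(find_motion_vector(target_block, reference, 1, 1, 2))
```

The prediction block is simply the portion of the decoded image that the detected motion vector points to, which is why only the vector and the difference signal need to be encoded.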
In order to improve the prediction precision, the motion vector is calculated with fractional precision, that is, in units finer than one pixel, such as ½ pel or ¼ pel. In order to perform motion compensation with such fractional precision, it is necessary to newly set pixels between the actual pixels of an image, i.e., at positions in which pixels do not exist, by interpolation processing.
An example of the case where the number of pixels is increased by interpolation is described below with reference to FIG. 2. FIG. 2 illustrates an example of the case where the number of pixels is increased in each of the vertical direction and the horizontal direction to four times the original number of pixels. In FIG. 2, the white circles represent the positions of actual pixels, and the white squares represent the positions of interpolation pixels.
Each interpolation pixel is interpolation-calculated by linear combination of a plurality of actual pixels, calculated interpolation pixels, and a predetermined filter coefficient, as expressed by, for example, the following interpolation equations.
b=(E−5F+20G+20H−5I+J)/32
h=(A−5C+20G+20M−5R+T)/32
j=(aa−5bb+20b+20s−5gg+hh)/32
a=(G+b)/2
d=(G+h)/2
f=(b+j)/2
r=(m+s)/2
Interpolation pixels aa, bb, s, gg, and hh are calculated by equations similar to the above-described equation for calculating the interpolation pixel b. Interpolation pixels cc, dd, m, ee, and ff are calculated by equations similar to the above-described equation for calculating the interpolation pixel h. Interpolation pixel c is calculated by an equation similar to the above-described equation for calculating the interpolation pixel a. Interpolation pixels i, k, and q are calculated by equations similar to the above-described equation for calculating the interpolation pixel d. Interpolation pixels e, g, and o are calculated by equations similar to the above-described equation for calculating the interpolation pixel r.
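The equations for the interpolation pixels b and a can be sketched in code as follows. The coefficients (1, −5, 20, 20, −5, 1) and the divisors 32 and 2 come from the equations above; the integer rounding offsets and the clipping to the 8-bit range 0 to 255 are assumptions added for illustration, and the function names are hypothetical.

```python
def clip8(value):
    """Clip to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample between g and h: (E-5F+20G+20H-5I+J)/32.

    The +16 rounding offset and the clipping are assumptions.
    """
    total = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip8((total + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample as the average of two neighbours, e.g. a=(G+b)/2.

    The +1 rounding offset is an assumption.
    """
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [10, 20, 30, 40, 50, 60]   # E, F, G, H, I, J
    b = half_pel(*row)               # half-pel sample between G and H
    a = quarter_pel(row[2], b)       # quarter-pel sample between G and b
    print(b, a)
```

The vertical half-pel sample h uses the same filter applied to A, C, G, M, R, and T, so the same half_pel function can be reused with a column of samples instead of a row.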
The above-described interpolation equations are employed in, for example, the H.264 and AVC standards. These interpolation equations are realized by an FIR (Finite Impulse Response) filter having an even number of taps.
The motion-prediction/compensation unit 24 contains, instead of a FIR filter, an AIF (Adaptive Interpolation Filter) 25 that can adaptively change a filter coefficient in an interpolation equation for every frame. The interpolation processing is performed by the use of this AIF 25 so that aliasing influences or coding distortions are reduced, thereby decreasing motion compensation errors. The filter coefficients that are adaptively changed by the AIF 25 are output, together with motion vectors, to the lossless encoder 16. The filter coefficients are encoded and output as image compression information.
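The idea of changing the filter coefficients for every frame can be illustrated with the following deliberately simplified sketch. It does not reproduce how the AIF in the cited documents derives its coefficients; instead, it assumes a small set of hypothetical candidate coefficient sets and selects, per frame, the set that gives the smallest squared prediction error on sample data. All names and the candidate sets are assumptions.

```python
FIXED_6TAP = (1, -5, 20, 20, -5, 1)   # coefficients of the fixed equations
BILINEAR = (0, 0, 1, 1, 0, 0)         # hypothetical alternative candidate


def fir_interpolate(samples, coeffs):
    """Weighted sum of six samples, normalized by the coefficient sum."""
    total = sum(c * s for c, s in zip(coeffs, samples))
    return total / sum(coeffs)


def choose_coefficients(training, candidates):
    """Pick the candidate with the least squared error on this frame.

    training is a list of (six_samples, desired_value) pairs; candidate
    selection stands in for the coefficient derivation of a real AIF.
    """
    def squared_error(coeffs):
        return sum((fir_interpolate(samples, coeffs) - desired) ** 2
                   for samples, desired in training)

    return min(candidates, key=squared_error)


if __name__ == "__main__":
    frame_data = [((0, 0, 10, 20, 0, 0), 15.0)]
    best = choose_coefficients(frame_data, [FIXED_6TAP, BILINEAR])
    # The chosen coefficients would be sent along with the motion vectors.
    print(best)
```

Whatever method is used to obtain them, the selected coefficients must be transmitted so that the decoder can apply the same interpolation, which is why they are encoded as part of the image compression information.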
The AIF is disclosed in, for example, Non-Patent Documents 1 and 2.