Conventionally, encoding schemes that combine motion compensation with an orthogonal transform such as the discrete cosine transform, Karhunen-Loève transform, or wavelet transform, for example MPEG (Moving Picture Experts Group) and H.26x, have generally been used for handling moving images. These moving image encoding schemes reduce the amount of code by exploiting the correlation in the space direction and the time direction among the characteristics of the input image signal to be encoded.
For example, in H.264, unidirectional prediction or bidirectional prediction is used when an inter-frame that is a frame to be subjected to inter-frame prediction (inter-prediction) is generated by utilizing the correlation in the time direction. Inter-frame prediction is designed to generate a prediction image on the basis of frames at different time points.
FIG. 1 is a diagram illustrating an example of unidirectional prediction.
As illustrated in FIG. 1, in a case where a frame to be encoded P0, which is the frame at the current time point, is generated by unidirectional prediction, motion compensation is performed using, as reference frames, already encoded frames at past or future time points with respect to the current time point. The residual error between a prediction image and the actual image is encoded by utilizing the correlation in the time direction, which makes it possible to reduce the amount of code. Reference frame information specifies a reference frame, and a motion vector specifies the position to be referred to in that reference frame; both pieces of information are transmitted from the encoding side to the decoding side.
Here, the number of reference frames is not limited to one. For example, in H.264, it is possible to use a plurality of frames as reference frames. As illustrated in FIG. 1, in a case where two frames closer in time to the frame to be encoded P0 are denoted by reference frames R0 and R1 in this order, the pixel value of an arbitrary macroblock in the frame to be encoded P0 can be predicted from the pixel value of an arbitrary pixel in the reference frame R0 or R1.
In FIG. 1, a box indicated inside each frame represents a macroblock. If a macroblock in the frame to be encoded P0, which is to be predicted, is represented by a macroblock MBP0, then, the macroblock in the reference frame R0 corresponding to the macroblock MBP0 is a macroblock MBR0 that is specified by a motion vector MV0. Furthermore, the macroblock in the reference frame R1 is a macroblock MBR1 that is specified by a motion vector MV1.
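The relationship between a macroblock, a motion vector, and the referenced macroblock can be sketched as follows. This is only an illustrative sketch: the function name `fetch_block`, the representation of a frame as a list of pixel rows, and the clamping of coordinates at the frame border are assumptions made for the example and are not taken from the text above.

```python
def fetch_block(frame, top, left, mv, size=16):
    """Return the size x size block of `frame` that a motion vector points to.

    frame : list of rows of pixel values (hypothetical representation)
    top, left : position of the macroblock being predicted (e.g. MBP0)
    mv : integer motion vector (dy, dx), e.g. MV0 or MV1
    """
    dy, dx = mv
    height, width = len(frame), len(frame[0])
    block = []
    for i in range(size):
        # Assumption: coordinates outside the frame reuse the nearest border pixel.
        y = min(max(top + dy + i, 0), height - 1)
        row = []
        for j in range(size):
            x = min(max(left + dx + j, 0), width - 1)
            row.append(frame[y][x])
        block.append(row)
    return block


# Usage: a 32x32 synthetic reference frame whose pixel value encodes its position.
reference = [[y * 100 + x for x in range(32)] for y in range(32)]
mbr0 = fetch_block(reference, top=8, left=8, mv=(-2, 3), size=4)
```

With an integer motion vector the referenced block is simply a displaced copy of reference-frame pixels, which is why only the reference frame information and the motion vector need to be transmitted to locate it.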
If the pixel values of the macroblocks MBR0 and MBR1 (pixel values of motion compensation images) are represented by MC0(i, j) and MC1(i, j), respectively, then one of the pixel values of the motion compensation images is used as the pixel value of a prediction image in unidirectional prediction. Thus, a prediction image Pred(i, j) is represented by Equation (1) below. (i, j) represents the relative position of a pixel in a macroblock and, for a macroblock of 16×16 pixels, satisfies 0≦i≦15 and 0≦j≦15. In Equation (1), “∥” indicates that one of the values MC0(i, j) and MC1(i, j) is taken.
[Math. 1] Pred(i,j)=MC0(i,j)∥MC1(i,j)  (1)
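As a hedged illustration of Equation (1), the sketch below takes one of two motion compensation images as the prediction image and forms the residual error that would then be encoded. The text does not say how the encoder decides which image to take; selecting the one with the smaller sum of absolute differences (SAD) is an assumption made here, and all function names are hypothetical.

```python
def sad(actual, candidate):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - c)
               for row_a, row_c in zip(actual, candidate)
               for a, c in zip(row_a, row_c))


def predict_unidirectional(actual, mc0, mc1):
    """Equation (1): take either MC0 or MC1 as the prediction image.

    Assumption: the encoder picks the candidate with the smaller SAD.
    Returns (reference index, prediction image).
    """
    if sad(actual, mc0) <= sad(actual, mc1):
        return 0, [row[:] for row in mc0]
    return 1, [row[:] for row in mc1]


def residual(actual, pred):
    """Residual error between the actual image and the prediction image."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, pred)]


# Usage with small 2x2 blocks standing in for 16x16 macroblocks.
actual = [[10, 20], [30, 40]]
mc0 = [[10, 21], [29, 40]]
mc1 = [[0, 0], [0, 0]]
ref_index, pred = predict_unidirectional(actual, mc0, mc1)
res = residual(actual, pred)
```

In this example the residual values are much smaller in magnitude than the original pixel values, which is the source of the reduction in the amount of code.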
Note that it is also possible to divide a single macroblock of 16×16 pixels into sub-blocks of 16×8 pixels or the like and to perform motion compensation on each sub-block by referring to a different reference frame. Furthermore, motion vectors with fractional (sub-pixel) accuracy can be transmitted instead of motion vectors with integer accuracy, and interpolation is performed using an FIR filter defined in the standard, which makes it possible to also use the pixel values of pixels around the corresponding reference position for motion compensation.
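The interpolation for fractional-accuracy motion vectors can be illustrated in one dimension. The sketch below uses the 6-tap filter with taps (1, −5, 20, 20, −5, 1) that H.264 applies to half-sample luma positions, with rounding, a division by 32, and clipping to the 8-bit range. The function name `half_pel`, the restriction to a single row, and the reuse of border pixels at the row ends are assumptions made for the example.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 half-sample luma filter taps (sum = 32)


def half_pel(row, x):
    """Interpolate the half-sample value between row[x] and row[x + 1].

    Six neighbouring integer-position pixels contribute, so pixels around
    the referenced position also influence the motion compensation image.
    """
    n = len(row)
    acc = 0
    for k, tap in enumerate(TAPS):
        # Assumption: positions outside the row reuse the nearest border pixel.
        idx = min(max(x - 2 + k, 0), n - 1)
        acc += tap * row[idx]
    # Round, divide by 32, and clip to the 8-bit pixel range.
    return min(max((acc + 16) >> 5, 0), 255)


# Usage: a flat row stays flat, and a sharp edge yields an intermediate value.
flat = [100] * 8
edge = [0, 0, 0, 0, 255, 255, 255, 255]
flat_value = half_pel(flat, 3)
edge_value = half_pel(edge, 3)
```

A two-dimensional implementation would apply the same filter along rows and columns; only the horizontal case is shown here.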
FIG. 2 is a diagram illustrating an example of bidirectional prediction.
As illustrated in FIG. 2, in a case where a frame to be encoded B0, which is the frame at the current time point, is generated by bidirectional prediction, motion compensation is performed using, as reference frames, already encoded frames at past and future time points with respect to the current time point. The residual error between a prediction image and the actual image is encoded by using a plurality of already encoded frames as reference frames and by utilizing the correlation with them, which makes it possible to reduce the amount of code. In H.264, it is also possible to use a plurality of past frames and a plurality of future frames as reference frames.
As illustrated in FIG. 2, in a case where a past frame and a future frame with respect to the frame to be encoded B0 are used as reference frames L0 and L1, the pixel value of an arbitrary macroblock in the frame to be encoded B0 can be predicted from the pixel values of arbitrary pixels in the reference frames L0 and L1.
In the example of FIG. 2, the macroblock in the reference frame L0 corresponding to a macroblock MBB0 in the frame to be encoded B0 is set as a macroblock MBL0 that is specified by a motion vector MV0. Furthermore, the macroblock in the reference frame L1 corresponding to the macroblock MBB0 in the frame to be encoded B0 is set as a macroblock MBL1 that is specified by a motion vector MV1.
If the pixel values of the macroblocks MBL0 and MBL1 are represented by MC0(i, j) and MC1(i, j), respectively, then the pixel value Pred(i, j) of a prediction image can be determined as the average of these pixel values, as given in Equation (2) below.
[Math. 2] Pred(i,j)=(MC0(i,j)+MC1(i,j))/2  (2)
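Equation (2) can be sketched for integer pixel values as follows. Equation (2) itself does not specify how a fractional average is rounded; rounding halves upward by adding 1 before the shift is an assumption made here, and the function name `predict_bidirectional` is hypothetical.

```python
def predict_bidirectional(mc0, mc1):
    """Equation (2): average the two motion compensation images pixel by pixel.

    Assumption: integer pixels, with halves rounded upward via (a + b + 1) >> 1.
    """
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(mc0, mc1)]


# Usage with small 1x2 blocks standing in for the macroblocks MBL0 and MBL1.
mbl0 = [[10, 21]]
mbl1 = [[20, 22]]
pred_b = predict_bidirectional(mbl0, mbl1)
```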
In motion compensation using unidirectional prediction as described above, the accuracy of a prediction image is improved by increasing the accuracy of a motion vector or by reducing the size of a macroblock; this reduces the residual error from the actual image and thereby improves encoding efficiency.
Furthermore, in motion compensation using bidirectional prediction, the average of the pixel values of pixels in reference frames located close in time is used as the pixel value of a pixel of the prediction image, which makes a probabilistically stable reduction in the prediction residual error feasible.