Conventionally, encoding methods that combine motion compensation, as in MPEG (Moving Picture Experts Group) or H.26x, with an orthogonal transformation, such as discrete cosine transformation, Karhunen-Loève transformation, or wavelet transformation, have generally been used for handling moving images. In these moving image encoding methods, the amount of code is reduced by using the correlation in a space direction and in a time direction among the characteristics of the input image signal to be encoded.
For example, in H.264, unidirectional prediction or bidirectional prediction is used for generating an inter-frame, which is a frame serving as a target of inter-frame prediction (inter-prediction), using a correlation in a time direction. The inter-frame prediction generates a prediction image on the basis of frames of different times.
FIG. 1 is a diagram illustrating an example of unidirectional prediction.
As illustrated in FIG. 1, in the case of generating a frame to be encoded P0, which is a current-time frame to be encoded, through unidirectional prediction, motion compensation is performed using an encoded frame at a temporally past or future time with respect to the current time as a reference frame. The residual between a prediction image and an actual image is encoded using a correlation in a time direction, whereby the amount of code can be reduced. Reference frame information and a motion vector are used as information specifying a reference frame and information specifying the position to be referred to in the reference frame, respectively, and these pieces of information are transmitted from an encoding side to a decoding side.
Here, the number of reference frames is not necessarily one. For example, in H.264, a plurality of frames can be used as reference frames. When two frames that are temporally close to the frame to be encoded P0 are used as reference frames R0 and R1, as illustrated in FIG. 1, the pixel values of an arbitrary macroblock in the frame to be encoded P0 can be predicted from the pixel values of arbitrary pixels in the reference frame R0 or R1.
The boxes illustrated inside the respective frames in FIG. 1 represent macroblocks. When it is assumed that the macroblock in the frame to be encoded P0, which is a prediction target, is a macroblock MBP0, the macroblock in the reference frame R0 corresponding to the macroblock MBP0 is a macroblock MBR0 that is specified by a motion vector MV0. Also, the macroblock in the reference frame R1 is a macroblock MBR1 that is specified by a motion vector MV1.
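The relationship between a macroblock position, a motion vector, and the referenced macroblock can be sketched as follows. This is a minimal illustration, not the procedure of any particular standard: the function name `fetch_block`, the `(dy, dx)` layout of the motion vector, the integer-precision displacement, and the clamping of out-of-frame coordinates to the frame edge are all assumptions made for the example.

```python
def fetch_block(frame, top, left, mv, size=16):
    """Return the size x size block of `frame` displaced by motion vector `mv`.

    frame: 2-D list of pixel values (rows of equal length)
    top, left: position of the macroblock in the frame to be encoded
    mv: (dy, dx) integer displacement into the reference frame (assumed layout)
    """
    dy, dx = mv
    height, width = len(frame), len(frame[0])
    block = []
    for i in range(size):
        # Clamp coordinates so that references outside the frame reuse the
        # nearest edge pixel (an assumption of this sketch).
        y = min(max(top + dy + i, 0), height - 1)
        row = []
        for j in range(size):
            x = min(max(left + dx + j, 0), width - 1)
            row.append(frame[y][x])
        block.append(row)
    return block


# Toy 8x8 reference frame R0 whose pixel value encodes its own position.
R0 = [[10 * y + x for x in range(8)] for y in range(8)]

# A 2x2 "macroblock" at (2, 2) in P0, displaced by MV0 = (1, -1) into R0.
MBR0 = fetch_block(R0, top=2, left=2, mv=(1, -1), size=2)
print(MBR0)
```

The decoding side can repeat the same fetch, because it receives the reference frame information and the motion vector from the encoding side.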
When the pixel values of the macroblocks MBR0 and MBR1 (the pixel values of the motion compensation images) are denoted MC0(i, j) and MC1(i, j), the pixel values of one of the motion compensation images are used as the pixel values of the prediction image in unidirectional prediction, so the prediction image Pred(i, j) is expressed by the following equation (1). Here, (i, j) represents the relative position of a pixel in the macroblock, and 0≦i≦15 and 0≦j≦15 are satisfied for a macroblock of 16×16 pixels. In equation (1), “∥” represents that the value of either MC0(i, j) or MC1(i, j) is taken.
[Math. 1]
Pred(i, j) = MC0(i, j) ∥ MC1(i, j)  (1)
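Equation (1) states only that one of the two motion compensation images is taken; it does not say how the choice is made. The sketch below assumes, purely for illustration, that the encoding side picks the candidate with the smaller sum of absolute differences (SAD) against the actual block and then forms the residual to be encoded. The names `sad` and `predict_unidirectional` are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def predict_unidirectional(actual, mc0, mc1):
    """Pick MC0 or MC1 as Pred and return (reference index, Pred, residual).

    The SAD-based choice is an assumption of this sketch; equation (1) only
    requires that one of the two candidates is used.
    """
    ref_idx, pred = min(enumerate((mc0, mc1)), key=lambda c: sad(actual, c[1]))
    residual = [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, pred)]
    return ref_idx, pred, residual


# Toy 2x2 blocks standing in for 16x16 macroblocks.
actual = [[100, 102], [104, 106]]
mc0 = [[90, 95], [100, 100]]    # candidate from reference frame R0
mc1 = [[101, 102], [103, 106]]  # candidate from reference frame R1

ref_idx, pred, residual = predict_unidirectional(actual, mc0, mc1)
print(ref_idx, residual)
```

Here the candidate from R1 is the closer match, so only the small residual, together with the reference frame information and the motion vector, would need to be transmitted.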
Also, it is possible to divide a single macroblock of 16×16 pixels into smaller blocks having a size of 16×8 pixels, for example, and to perform motion compensation on the individual blocks formed through the division by referring to different reference frames. By transmitting a motion vector of fractional (sub-pixel) precision instead of a motion vector of integer precision, and by performing interpolation using an FIR filter defined by the standard, the pixel values of pixels around the referenced position can also be used for motion compensation.
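As an illustration of interpolation with an FIR filter, the sketch below computes a half-pixel sample along one row using the six taps (1, −5, 20, 20, −5, 1) with a divisor of 32, which are the taps H.264 uses for half-sample luma interpolation. Everything else, including the one-dimensional simplification, the function name `half_pel`, and the edge clamping, is an assumption made for the example.

```python
# Six-tap FIR coefficients; their sum is 32, hence the shift by 5 below.
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(row, x):
    """Interpolate the sample midway between row[x] and row[x + 1].

    Positions outside the row are clamped to the nearest edge pixel
    (an assumption of this sketch).
    """
    n = len(row)
    acc = 0
    for k, tap in enumerate(TAPS):
        idx = min(max(x - 2 + k, 0), n - 1)
        acc += tap * row[idx]
    # Round, divide by 32, and clip to the 8-bit pixel range.
    return min(max((acc + 16) >> 5, 0), 255)


flat = [50] * 8
ramp = [0, 10, 20, 30, 40, 50, 60, 70]

print(half_pel(flat, 3))  # a flat region stays flat
print(half_pel(ramp, 3))  # midway between 30 and 40 on a linear ramp
```

Because the filter spans three pixels on each side of the target position, pixels around the referenced position contribute to the motion compensation image, as described above.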
FIG. 2 is a diagram illustrating an example of bidirectional prediction.
As illustrated in FIG. 2, in the case of generating a frame to be encoded B0, which is a current-time frame to be encoded, through bidirectional prediction, motion compensation is performed using encoded frames at temporally past and future times with respect to the current time as reference frames. A plurality of encoded frames are used as reference frames, and the residual between a prediction image and an actual image is encoded using the correlation with those frames, whereby the amount of code can be reduced. In H.264, it is also possible to use a plurality of past frames and a plurality of future frames as reference frames.
As illustrated in FIG. 2, when one past frame and one future frame are used as reference frames L0 and L1, with the frame to be encoded B0 serving as a basis, the pixel values of an arbitrary macroblock in the frame to be encoded B0 can be predicted on the basis of the pixel values of arbitrary pixels of the reference frames L0 and L1.
In the example in FIG. 2, the macroblock in the reference frame L0 corresponding to the macroblock MBB0 in the frame to be encoded B0 is a macroblock MBL0 that is specified by a motion vector MV0. Also, the macroblock in the reference frame L1 corresponding to the macroblock MBB0 in the frame to be encoded B0 is a macroblock MBL1 that is specified by a motion vector MV1.
When the pixel values of the macroblocks MBL0 and MBL1 are denoted MC0(i, j) and MC1(i, j), respectively, the pixel value Pred(i, j) of the prediction image can be obtained as the average of those pixel values, as expressed by the following equation (2).
[Math. 2]
Pred(i, j) = (MC0(i, j) + MC1(i, j)) / 2  (2)
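Equation (2) can be sketched for integer pixel values as follows. The function name `predict_bidirectional` is hypothetical, and rounding halves upward with `(a + b + 1) >> 1` is an assumption of this sketch, since equation (2) does not specify how fractional averages are handled.

```python
def predict_bidirectional(mc0, mc1):
    """Average two motion compensation images pixel by pixel (equation (2)).

    Rounding halves upward is an assumption of this sketch.
    """
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(mc0, mc1)]


# Toy 2x2 blocks standing in for the macroblocks MBL0 and MBL1.
mc0 = [[100, 110], [120, 130]]  # from the past reference frame L0
mc1 = [[104, 111], [120, 127]]  # from the future reference frame L1

pred = predict_bidirectional(mc0, mc1)
print(pred)
```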
In the foregoing motion compensation using unidirectional prediction, the precision of a prediction image is increased by increasing the precision of a motion vector and reducing the size of a macroblock to reduce the residual with respect to an actual image, thereby increasing the encoding efficiency.
Also, in the motion compensation using bidirectional prediction, the averages of the pixel values of pixels of temporally close reference frames are used as the pixel values of pixels of a prediction image, whereby the prediction residual is reduced stably in a statistical sense.
As another method, a technique has been proposed that converts a correlation in a time direction into spatial resolution by means of motion compensation and FIR filtering of pixel values, and uses the result (e.g., see NPL 1).
In the method described in NPL 1, a correlation in a time direction is used for a resolution increase process that is performed on an input image sequence. Specifically, difference information about a difference between a current image and a past image on which motion prediction/compensation has been performed is calculated, and the difference information is fed back to the target current image, thereby recovering a high-frequency component included in the input images.