1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to encoding and decoding an image, and more particularly, to a method and apparatus for encoding and decoding an image by performing inter prediction using a plurality of reference pictures that are most similar to a current picture.
2. Description of the Related Art
In image compression methods such as Moving Picture Experts Group-1 (MPEG-1), MPEG-2, MPEG-4, and H.264/MPEG-4 Advanced Video Coding (AVC), a picture is divided into predetermined image processing units, for example, blocks having a predetermined size. Then, each of the blocks is encoded using inter prediction or intra prediction. An optimum encoding mode is selected in consideration of the data size and the data distortion of the blocks, and the blocks are encoded according to the selected optimum encoding mode.
Here, inter prediction compresses an image by eliminating temporal redundancy between pictures. An example of inter prediction is motion prediction encoding, which predicts motion of a current picture in block units using at least one reference picture, and predicts each block based on the result of the motion prediction.
In motion prediction encoding, in order to predict a current block, a block that is most similar to the current block is searched for within a predetermined search range of the reference picture. When the similar block is found, only residual data between the current block and the similar block in the reference picture is encoded and transmitted, thereby increasing the data compression rate. This will be described in more detail with reference to FIG. 1.
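The search described above may be illustrated by the following minimal sketch. It assumes a full (exhaustive) search and a sum of absolute differences (SAD) as the similarity measure; neither is specified in this description, and the names `sad`, `full_search`, and `residual_block` are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` at (rx, ry). SAD is an assumed similarity measure."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustively test every displacement within +/- search_range and
    return (motion_vector, cost) of the most similar reference block."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, bx, by, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual_block(cur, ref, bx, by, mv, size):
    """Difference between the current block and the block the motion vector
    points to in the reference picture; only this residual is encoded."""
    dx, dy = mv
    return [
        [cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(size)]
        for j in range(size)
    ]


# Toy 8x8 reference picture in which every sample value is distinct.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
# Current picture: the reference content displaced by 2 samples in x, 1 in y.
cur = [
    [ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
    for y in range(8)
]

mv, cost = full_search(cur, ref, bx=2, by=2, size=2, search_range=2)
res = residual_block(cur, ref, 2, 2, mv, 2)
```

In this toy case the search recovers the displacement exactly, so the residual is all zeros and only the motion vector carries information. Practical encoders typically replace the exhaustive search with a faster search pattern.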
FIG. 1 illustrates a conventional method of predicting blocks 112, 114, and 116 of a current picture 110 using a plurality of reference pictures 120, 130, and 140.
Referring to FIG. 1, the plurality of reference pictures 120, 130, and 140 are referred to in order to predict the blocks 112, 114, and 116 included in the current picture 110 (P(n)). The reference picture 120 (P(n−1)) is located directly before the current picture 110 and is temporally the nearest to the current picture 110. The time gaps between the current picture 110 and the reference pictures 130 (P(n−2)) and 140 (P(n−3)) are greater than the time gap between the current picture 110 and the reference picture 120 (P(n−1)), and the time gap between the current picture 110 and the reference picture 140 (P(n−3)) is greater than the time gap between the current picture 110 and the reference picture 130 (P(n−2)).
Since the plurality of reference pictures 120, 130, and 140 are searched in order to prediction encode the blocks 112, 114, and 116 included in the current picture 110, reference blocks 122, 132, and 142, which may respectively exist in the plurality of reference pictures 120, 130, and 140, may be used to predict the blocks 112, 114, and 116 of the current picture 110.
In FIG. 1, prediction is performed with reference to the plurality of reference pictures 120, 130, and 140 that temporally precede the current picture 110. However, when the current picture 110 is a bi-directional predictive picture (a B picture), pictures that temporally follow the current picture 110 can also be used in the prediction of the current picture 110, in addition to the plurality of reference pictures 120, 130, and 140 that temporally precede the current picture 110.
The blocks 112, 114, and 116 included in the current picture 110 are predicted and residual blocks thereof are generated. Then, the residual block, a motion vector, and a reference picture index of each of the blocks 112, 114, and 116 are encoded, thereby encoding the blocks 112, 114, and 116 included in the current picture 110 (P(n)). Here, the reference picture index is information specifying which reference picture, from among the plurality of reference pictures, is used in inter prediction.
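As a rough illustration of the three items encoded per block (residual block, motion vector, and reference picture index), the following sketch searches several reference pictures and then reconstructs the block from those items alone. The SAD cost, the exhaustive search, the dictionary layout, and the names `encode_block` and `decode_block` are assumptions made for illustration; entropy coding and transform/quantization of the residual are omitted.

```python
def block_at(pic, x, y, size):
    """Extract the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in pic[y:y + size]]


def sad_blocks(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def encode_block(cur, refs, bx, by, size, search_range):
    """Search every reference picture and return the reference picture index,
    motion vector, and residual block for the block at (bx, by)."""
    target = block_at(cur, bx, by, size)
    best = None  # (cost, ref_idx, (dx, dy))
    for ref_idx, ref in enumerate(refs):
        height, width = len(ref), len(ref[0])
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                rx, ry = bx + dx, by + dy
                # Skip candidates outside the reference picture.
                if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                    continue
                cost = sad_blocks(target, block_at(ref, rx, ry, size))
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dx, dy))
    _, ref_idx, (dx, dy) = best
    pred = block_at(refs[ref_idx], bx + dx, by + dy, size)
    residual = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(target, pred)]
    return {"ref_idx": ref_idx, "mv": (dx, dy), "residual": residual}


def decode_block(syntax, refs, bx, by, size):
    """Rebuild the block: fetch the prediction named by the reference picture
    index and motion vector, then add the residual."""
    dx, dy = syntax["mv"]
    pred = block_at(refs[syntax["ref_idx"]], bx + dx, by + dy, size)
    return [
        [p + r for p, r in zip(rp, rr)]
        for rp, rr in zip(pred, syntax["residual"])
    ]


# Three toy 6x6 reference pictures; only the second resembles the current one.
ref_a = [[0] * 6 for _ in range(6)]
ref_b = [[y * 6 + x for x in range(6)] for y in range(6)]
ref_c = [[100] * 6 for _ in range(6)]
refs = [ref_a, ref_b, ref_c]
# Current picture: content of ref_b displaced by one sample in x.
cur = [[ref_b[y][x + 1] if x + 1 < 6 else 0 for x in range(6)] for y in range(6)]

syntax = encode_block(cur, refs, bx=1, by=1, size=2, search_range=1)
rebuilt = decode_block(syntax, refs, 1, 1, 2)
```

The decoder never sees the current picture; it needs only the reference picture index to select a picture, the motion vector to locate the prediction, and the residual to correct it.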
According to the conventional art, the encoded motion vector includes relative location differences between the blocks 112, 114, and 116 included in the current picture 110 and the reference blocks 122, 132, and 142; in other words, information about motion of the blocks on a two-dimensional (2D) plane. Since the motion vector only reflects movement with respect to an x-axis and a y-axis on a 2D plane, that is, a translational transform, various transforms such as an expansion/reduction and a rotation of an image object existing between the current picture 110 and the plurality of reference pictures 120, 130, and 140 cannot be fully reflected.
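The limitation of a purely translational motion vector can be seen in the following small sketch, which is an illustration added here rather than part of the conventional method. It compares the per-sample displacements of the four corners of a block under a translation, an expansion, and a rotation; the function names, the block size, and the transform parameters are hypothetical.

```python
import math


def translate(dx, dy):
    """Translational transform: every sample moves by the same (dx, dy)."""
    return lambda x, y: (x + dx, y + dy)


def expand(factor):
    """Expansion/reduction about the origin."""
    return lambda x, y: (factor * x, factor * y)


def rotate(degrees):
    """Rotation about the origin."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return lambda x, y: (c * x - s * y, s * x + c * y)


def distinct_displacements(transform, points):
    """Set of displacement vectors the transform applies to the given samples.
    A single motion vector can describe the block only if this set has one
    element. Values are rounded to suppress floating-point noise."""
    result = set()
    for x, y in points:
        tx, ty = transform(x, y)
        result.add((round(tx - x, 6), round(ty - y, 6)))
    return result


# Corner samples of a hypothetical 4x4 block with one corner at the origin.
corners = [(0, 0), (4, 0), (0, 4), (4, 4)]

n_translate = len(distinct_displacements(translate(3, -1), corners))
n_expand = len(distinct_displacements(expand(1.5), corners))
n_rotate = len(distinct_displacements(rotate(30), corners))
```

Under translation all four corners share one displacement, so one motion vector suffices. Under expansion or rotation each corner moves differently, so no single motion vector describes the whole block and the mismatch ends up in the residual.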
In addition, when prediction encoding and decoding are performed using various transforms of an image according to the conventional art, a large number of bits is used to encode information about the transforms, such as the expansion/reduction and the rotation, and thus the compression ratio of image encoding decreases.
Therefore, a method and apparatus capable of efficiently performing prediction encoding on the current picture 110 by reflecting various transforms of an image are needed.