Certain embodiments of the present invention relate to digital video compression and decompression. More specifically, certain embodiments relate to a method and apparatus for motion estimation and compensation in digital video compression and decompression.
Digital video compression schemes, such as MPEG-2, are well known in the art. MPEG-2 uses motion compensated predictive coding to encode a sequence of pictures. This coding entails predicting a two-dimensional block of pixels by translating or interpolating a similar array of pixels from another picture (referred to as the “reference picture”) in the sequence.
Various compression schemes use different sizes of blocks of pixels. For example, MPEG-2 uses a 16×16 or 16×8 block of pixels (referred to as a “macroblock”; the terms “block” and “macroblock” may be used interchangeably). Prediction can usually reduce the amount of data that needs to be stored or transmitted, since only the difference between the actual image macroblock and the predicted macroblock need be coded and transmitted. For example, if the predicted macroblock is similar to the actual image macroblock, then the difference between the two macroblocks is small. The information content in the difference may therefore be represented in fewer digital bits than would be needed to code and transmit the original image data. The more accurate the prediction is, the more effective the compression system becomes.
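The difference coding described above can be illustrated with a minimal sketch. The function names `residual` and `reconstruct` are hypothetical and chosen only for illustration; a real codec would additionally transform, quantize, and entropy-code the difference, which is omitted here.

```python
def residual(actual, predicted):
    """Element-wise difference between the actual and predicted blocks."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


def reconstruct(predicted, diff):
    """Decoder side: add the transmitted difference back to the prediction."""
    return [[p + d for p, d in zip(row_p, row_d)]
            for row_p, row_d in zip(predicted, diff)]


# A tiny 2x2 "block" stands in for a macroblock (illustrative values only).
actual = [[100, 102], [98, 101]]
predicted = [[100, 101], [99, 101]]
diff = residual(actual, predicted)
```

When the prediction is close, the values in `diff` cluster around zero, which is what allows them to be represented in fewer bits than the original samples.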
The amount of translation for the reference picture macroblock is indicated by a motion vector, which is encoded as part of the compressed data stream. The motion vector has horizontal and vertical components, indicating the spatial displacement to be applied to a reference macroblock in order to arrive at a predicted macroblock location. However, the displacement may generate a translation that does not coincide with an integer sampling grid position of the picture. The integer sampling grid positions are referred to as the “integer pixel positions” and the positions in between the integer positions are referred to as the “fractional pixel positions”.
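As an illustration of integer versus fractional pixel positions, the sketch below splits a motion vector into its integer and fractional parts. It assumes, purely for illustration, that each component is stored as an integer in units of 1/2^frac_bits pixel; the function name `split_motion_vector` and the parameter `frac_bits` are hypothetical and do not come from any particular standard.

```python
def split_motion_vector(mv_x, mv_y, frac_bits=1):
    """Split each component into (integer part, fractional part).

    Assumption: components are integers in units of 1 / 2**frac_bits pixel,
    so frac_bits=1 means half-pixel units and frac_bits=2 quarter-pixel units.
    The arithmetic right shift floors toward negative infinity, so the
    fractional part is always non-negative.
    """
    mask = (1 << frac_bits) - 1
    integer_part = (mv_x >> frac_bits, mv_y >> frac_bits)
    fractional_part = (mv_x & mask, mv_y & mask)
    return integer_part, fractional_part
```

For example, with half-pixel units a horizontal component of -3 represents a displacement of -1.5 pixels, which this sketch expresses as integer position -2 plus a fractional offset of one half pixel. A non-zero fractional part is what triggers interpolation of the reference picture.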
The smallest fractional-pixel position in the translation process determines the accuracy of the motion vectors used for prediction. Various known prediction schemes are used in video coding. For example, MPEG-1 and MPEG-2 use ½-pixel accuracy, while MPEG-4 Video Object Plane prediction uses ½-pixel and ¼-pixel accuracy and H.26L (also known as MPEG AVC or JVT or H.264) prediction uses ¼-pixel and ⅛-pixel prediction accuracy. All of these schemes utilize interpolation in at least one step in the prediction process. In MPEG-1 and MPEG-2, for example, averaging adjacent integer-position pixels produces half-pixel position values.
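The half-pixel averaging used in MPEG-1 and MPEG-2 can be sketched as follows. This is a minimal illustration, assuming coordinates given in half-pixel units and rounding to the nearest integer; the function name `half_pel_sample` is hypothetical, and bounds checking at picture edges is left to the caller.

```python
def half_pel_sample(ref, x2, y2):
    """Return the sample at (x2, y2), given in half-pixel units.

    ref is a 2D list of integer-position samples indexed as ref[row][col].
    The caller must ensure that the neighbours to the right of and below
    the integer position exist when a fractional offset is requested.
    """
    x, y = x2 >> 1, y2 >> 1      # integer pixel position
    fx, fy = x2 & 1, y2 & 1      # half-pixel flags
    a = ref[y][x]
    if not fx and not fy:
        return a                                  # integer position
    if fx and not fy:
        return (a + ref[y][x + 1] + 1) >> 1       # horizontal half
    if fy and not fx:
        return (a + ref[y + 1][x] + 1) >> 1       # vertical half
    # Diagonal half position: average of the four surrounding samples.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) >> 2


ref = [[10, 20], [30, 40]]
```

With the 2×2 reference above, the horizontal half position between 10 and 20 evaluates to 15 and the diagonal half position evaluates to 25.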
In H.26L prediction, the ¼-pixel positions are created in two steps. First, a 6-tap interpolative filter is applied to the integer-position pixels to obtain the nearest ½-pixel position. Then the nearest integer and ½-pixel positions are averaged to obtain the desired ¼-pixel position. When calculating the ⅛-pixel positions in H.26L prediction, the nearest ¼-pixel positions are created using an 8-tap interpolative filter, and those ¼-pixel positions are then averaged to obtain the desired ⅛-pixel position. In some implementations of such codec schemes, the averaging function is combined with the 8-tap, ¼-pixel filtering function into a single 8-tap filter that produces the same result as the two-step process.
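The two-step ¼-pixel process described above (filter, then average) can be sketched in one dimension. The text does not give the filter coefficients; the taps (1, −5, 20, 20, −5, 1)/32 used here are those of the H.264 half-sample luma filter and are an assumption for illustration, as are the function names and the 8-bit sample range. The 8-tap ⅛-pixel case is not shown.

```python
# Assumed taps (H.264 half-sample luma filter); they sum to 32.
TAPS = (1, -5, 20, 20, -5, 1)


def clip8(value):
    """Clamp to the 8-bit sample range (assumed here)."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1].

    Uses the six integer samples row[i - 2] .. row[i + 3]; the caller must
    ensure 2 <= i <= len(row) - 4 so that all six samples exist.
    """
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    return clip8((acc + 16) >> 5)    # divide by 32 with rounding


def quarter_pel(row, i):
    """Quarter-pixel sample between row[i] and the half-pixel to its right,
    obtained by averaging the two, as in the two-step process."""
    return (row[i] + half_pel(row, i) + 1) >> 1


edge = [0, 0, 0, 64, 64, 64]   # illustrative step edge
flat = [100] * 6               # illustrative flat region
```

On the flat row, both the half-pixel and quarter-pixel values equal the surrounding sample value, while on the step edge the averaging step places the quarter-pixel value midway between the integer sample and the filtered half-pixel value.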
Performing averaging to obtain pixel positions between two pixel positions results in image distortion and impaired prediction of the image macroblock, thereby reducing the effectiveness of the compression and decompression system. Furthermore, the distinct operations of filtering and averaging result in unnecessarily complex implementations compared to embodiments of the present invention.
It is well known in the art how to design motion estimation and compensation systems for video compression, using the various fractional pixel interpolation techniques described above.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with embodiments of the present invention as set forth in the remainder of the present application with reference to the drawings.
A need exists for an approach to perform efficient video compression and decompression to fractional pixel accuracy with a simply implemented architecture.