Generally, motion estimation in video compression is a computationally intensive task. Given a reference frame and, for example, a macroblock comprising (M×N) pixels in a current frame, the objective of motion estimation is to find an (M×N) pixel block in the reference frame, called the best matching block (BMB), that matches the characteristics of the macroblock in the current frame according to some criterion. This criterion can be, for example, a sum of absolute differences (SAD) between the pixels of the macroblock in the current frame and the block of pixels in the reference frame with which it is compared. This process is known generally as ‘block matching’. It should be noted that, in general, the geometry of the block to be matched and that of the block in the reference frame do not have to be the same, as real-world objects can undergo scale changes, as well as rotation and warping. However, in current international video coding standards, only a translational motion model is used (see below) and thus fixed rectangular geometry is sufficient.
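By way of illustration only, the SAD criterion mentioned above can be sketched as follows. The sketch assumes that each frame is held as a list of rows of luminance values; the function name `block_sad` and its parameters are hypothetical and are not taken from any standard or library.

```python
def block_sad(current, reference, cur_x, cur_y, ref_x, ref_y, block_w, block_h):
    """Sum of absolute differences between two equally sized pixel blocks.

    (cur_x, cur_y) is the top-left corner of the macroblock in the current
    frame; (ref_x, ref_y) is the top-left corner of the candidate block in
    the reference frame. Frames are lists of rows (an assumption).
    """
    total = 0
    for row in range(block_h):
        for col in range(block_w):
            cur_pixel = current[cur_y + row][cur_x + col]
            ref_pixel = reference[ref_y + row][ref_x + col]
            total += abs(cur_pixel - ref_pixel)
    return total
```

A smaller SAD indicates a closer match, and a value of zero indicates that the two blocks are identical.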
Ideally, in order to achieve the best chance of finding a match, the whole of the reference frame should be searched. However, this is impractical as it imposes too high a computational burden on the video encoder. Instead, the search is restricted to a region of the reference frame around the location that the macroblock occupies in the current frame.
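A restricted search of this kind can be sketched as an exhaustive comparison over a square window. The window half-width `search_range`, the function names, and the list-of-rows frame layout are assumptions made for illustration; practical encoders commonly use faster search patterns than the exhaustive one shown here.

```python
def block_sad(current, reference, cur_x, cur_y, ref_x, ref_y, block_w, block_h):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(current[cur_y + r][cur_x + c] - reference[ref_y + r][ref_x + c])
        for r in range(block_h)
        for c in range(block_w)
    )


def full_search(current, reference, cur_x, cur_y, block_w, block_h, search_range):
    """Exhaustive block matching within +/- search_range pixels.

    Returns (best_sad, dx, dy), where (dx, dy) is the full-pixel motion
    vector of the best matching block, or None if no candidate block lies
    entirely inside the reference frame.
    """
    frame_h, frame_w = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ref_x, ref_y = cur_x + dx, cur_y + dy
            # Skip candidates that extend beyond the reference frame.
            if ref_x < 0 or ref_y < 0:
                continue
            if ref_x + block_w > frame_w or ref_y + block_h > frame_h:
                continue
            cost = block_sad(current, reference, cur_x, cur_y,
                             ref_x, ref_y, block_w, block_h)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The number of candidate blocks grows with the square of `search_range`, which is why the size of the search region has a direct effect on the encoder's computational load.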
The simplest motion model is the translational motion model, which requires only two coefficients to describe the motion vector of each segment. The motion vector describing the motion of a macroblock in the current frame with respect to the reference frame can point to any of the pixels in the reference frame. This means that motion between frames of a digital video sequence can only be represented at a resolution determined by the image pixels in the frame (so-called full pixel resolution). Real motion, however, has arbitrary precision, and thus the system described above can only provide approximate modeling of the motion between successive frames of a digital video sequence. Typically, modeling of motion between video frames with full pixel resolution is not sufficiently accurate to allow efficient minimization of the prediction error (PE) information associated with each macroblock in a frame. Therefore, to enable more accurate modeling of real motion and to help reduce the amount of PE information that must be transmitted from encoder to decoder, many video coding standards allow motion vectors to point ‘in between’ image pixels. In other words, the motion vectors can have ‘sub-pixel’ resolution (i.e., half-pixel, quarter-pixel, and so on). Allowing motion vectors to have sub-pixel resolution adds to the complexity of the encoding and decoding operations that must be performed, but it is still advantageous to have sub-pixel motion vectors as they improve the coding efficiency by reducing the prediction error.
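One common way of representing a sub-pixel motion vector component is as an integer count of sub-pixel units, which can then be separated into a full-pixel part and a fractional part. The following sketch assumes such a fixed-point representation; the function name `split_motion_vector` and the parameter `frac_bits` are hypothetical.

```python
def split_motion_vector(mv_units, frac_bits=2):
    """Split a motion vector component stored in sub-pixel units.

    frac_bits=1 corresponds to half-pixel units and frac_bits=2 to
    quarter-pixel units (an assumed fixed-point convention). Returns
    (full_pixel_offset, fraction), where fraction is a count of
    sub-pixel steps in the range 0 .. 2**frac_bits - 1.
    """
    # Python's right shift floors toward negative infinity, so negative
    # displacements also yield a non-negative fractional part.
    full_pixel_offset = mv_units >> frac_bits
    fraction = mv_units & ((1 << frac_bits) - 1)
    return full_pixel_offset, fraction
```

For example, under the quarter-pixel convention a stored value of 9 denotes a displacement of 2.25 pixels, that is, a full-pixel offset of 2 and a fraction of one quarter-pixel step. A non-zero fraction indicates that the predicted block lies between image pixels and must be obtained by interpolation.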
For a video coding scheme which allows motion vectors to have full- or sub-pixel resolution, motion estimation with sub-pixel resolution is usually performed as a two-stage process. In the first stage, a motion vector having full-pixel resolution is determined using any appropriate motion estimation scheme, such as the block-matching process described above.
In the second stage, the motion vector determined in the first stage is refined to obtain the desired sub-pixel resolution. As only the pixel values of original image pixels are known, the values (for example luminance and/or chrominance values) of the sub-pixels residing at sub-pixel locations must be estimated for each of the new search blocks examined during refinement, using some form of interpolation scheme. Generally, such sub-pixel value estimation involves a large amount of overlapping computation, which makes it a very computationally intensive operation. Any reduction in the computation of sub-pixel values during motion vector estimation would considerably reduce the load on the processor in which the system is realized.
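The second stage can be sketched as follows for half-pixel resolution. The sketch assumes bilinear interpolation purely for simplicity; video coding standards specify their own interpolation filters, and the function names, the half-pixel coordinate convention, and the list-of-rows frame layout are likewise assumptions.

```python
def sample_half_pel(reference, x2, y2):
    """Bilinearly interpolated value at position (x2 / 2, y2 / 2).

    Coordinates are given in half-pixel units, so even values fall on
    original pixels and odd values fall midway between two pixels.
    """
    x0, y0 = x2 >> 1, y2 >> 1
    x1, y1 = x0 + (x2 & 1), y0 + (y2 & 1)
    return (reference[y0][x0] + reference[y0][x1]
            + reference[y1][x0] + reference[y1][x1]) / 4.0


def refine_half_pel(current, reference, cur_x, cur_y, block_w, block_h,
                    full_dx, full_dy):
    """Refine a full-pixel motion vector to half-pixel resolution.

    Tests the full-pixel position and its eight half-pixel neighbours.
    Returns (best_sad, mvx2, mvy2) with the vector in half-pixel units,
    or None if no candidate lies inside the reference frame.
    """
    frame_h, frame_w = len(reference), len(reference[0])
    best = None
    for off_y in (-1, 0, 1):
        for off_x in (-1, 0, 1):
            mvx2, mvy2 = 2 * full_dx + off_x, 2 * full_dy + off_y
            # Top-left corner of the candidate block, in half-pixel units.
            bx2, by2 = 2 * cur_x + mvx2, 2 * cur_y + mvy2
            if bx2 < 0 or by2 < 0:
                continue
            # The last interpolated sample must not read past the frame edge.
            if ((bx2 + 1) >> 1) + block_w > frame_w:
                continue
            if ((by2 + 1) >> 1) + block_h > frame_h:
                continue
            cost = 0.0
            for r in range(block_h):
                for c in range(block_w):
                    predicted = sample_half_pel(reference, bx2 + 2 * c, by2 + 2 * r)
                    cost += abs(current[cur_y + r][cur_x + c] - predicted)
            if best is None or cost < best[0]:
                best = (cost, mvx2, mvy2)
    return best
```

In this naive form, every candidate block recomputes its interpolated samples from scratch, even though neighbouring candidates draw on largely the same reference pixels. This repetition is an instance of the overlapping computation referred to above.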