Digital video sequences, like ordinary motion pictures recorded on film, comprise a sequence of still images, the illusion of motion being created by displaying the images one after the other at a relatively fast frame rate, typically 15 to 30 frames per second. Because of the relatively fast frame rate, images in consecutive frames tend to be quite similar and thus contain a considerable amount of redundant information. For example, a typical scene may comprise some stationary elements, such as background scenery, and some moving areas, which may take many different forms, for example the face of a newsreader, moving traffic and so on. Alternatively, the camera recording the scene may itself be moving, in which case all elements of the image have the same kind of motion. In many cases, this means that the overall change between one video frame and the next is rather small.
Each frame of a raw, that is uncompressed, digital video sequence is formed from an array of image pixels and comprises a very large amount of image information. For example, at a resolution of 640 by 480 pixels, there are 307,200 pixels in a single frame. The amount of data required to represent this frame is a direct function of the pixel depth, that is, the number of bits used per pixel, which in turn determines the number of unique color values a given pixel may take. The bits representing each pixel carry information about the luminance and/or color content of the region of the image corresponding to the pixel. Commonly, a so-called YUV color model is used to represent the luminance and chrominance content of the image. The luminance, or Y, component represents the intensity (brightness) of the image, while the color content of the image is represented by two chrominance components, labeled U and V.
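The arithmetic above can be made concrete with a short sketch. The pixel depth of 24 bits (8 bits for each of the Y, U and V components) and the frame rate of 30 frames per second are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def raw_frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to store one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bits per second needed to carry the uncompressed sequence."""
    return width * height * bits_per_pixel * fps


# 640 by 480 frame; 24 bits per pixel and 30 frames per second are assumed.
pixels = 640 * 480                                   # 307,200 pixels
frame_bytes = raw_frame_bytes(640, 480, 24)          # 921,600 bytes per frame
bitrate = raw_bitrate_bps(640, 480, 24, 30)          # 221,184,000 bits per second

print(pixels, frame_bytes, bitrate)
```

Under these assumptions, a single frame occupies roughly 0.9 megabytes and the raw sequence requires over 220 megabits per second, which illustrates why compression is needed.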
Video compression methods are based on reducing the redundant and perceptually irrelevant parts of video sequences. Thus, video compression codecs perform motion estimation to calculate the difference between successive frames and encode that difference rather than all the data in the frame itself. Because each frame is a sampled array of pixels, motion estimated directly from the pixel values can only be represented at a resolution which is determined by the image pixels in the frame (so-called integer pixel resolution). Real motion, however, has arbitrary precision. Typically, modeling of motion between video frames with integer pixel resolution is not sufficiently accurate to allow efficient minimization of the prediction error (PE) information associated with each macroblock/frame. That is, in some cases a pixel block has moved a non-integer number of pixels from its location in the previous frame, so that interpolation is required to determine the pixel values at these non-integer locations. Therefore, to enable more accurate modeling of real motion and to help reduce the amount of PE information that must be transmitted from encoder to decoder, many video coding standards allow motion vectors to point ‘in between’ image pixels. In other words, the motion vectors can have ‘sub-pixel’ resolution. For example, the H.264 video codec uses ½ and ¼ pixel resolutions. However, the interpolation functions needed to compute sub-pixel values can be a performance bottleneck in codec implementations, since allowing motion vectors to have sub-pixel resolution adds to the complexity of the encoding and decoding operations that must be performed.
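The effect of sub-pixel motion vectors can be illustrated with a minimal sketch. It assumes a full-search block match using the sum of absolute differences (SAD) as the prediction error measure and simple bilinear interpolation for half-pixel positions; this is not the interpolation filter of H.264 or of any particular standard. All function names, the synthetic frames and the block and search sizes are hypothetical, and the search window is assumed to stay inside the frame.

```python
def sample(frame, x2, y2):
    """Bilinear value of `frame` at half-pixel position (x2 / 2, y2 / 2)."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + x2 % 2, y0 + y2 % 2  # same as x0, y0 at integer positions
    return (frame[y0][x0] + frame[y0][x1] + frame[y1][x0] + frame[y1][x1]) / 4


def sad(cur, ref, bx, by, size, mvx2, mvy2):
    """Prediction error of a block for a motion vector given in half-pixel units."""
    total = 0.0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - sample(ref, 2 * x + mvx2, 2 * y + mvy2))
    return total


def best_vector(cur, ref, bx, by, size, search, step2):
    """Full search over +/- `search` pixels.

    step2 = 2 restricts the search to integer-pixel vectors,
    step2 = 1 also tries half-pixel vectors.
    Returns (error, mv_x, mv_y) with the vector expressed in pixels.
    """
    best = None
    for mvy2 in range(-2 * search, 2 * search + 1, step2):
        for mvx2 in range(-2 * search, 2 * search + 1, step2):
            cost = sad(cur, ref, bx, by, size, mvx2, mvy2)
            if best is None or cost < best[0]:
                best = (cost, mvx2 / 2, mvy2 / 2)
    return best


# Synthetic 16 by 16 reference frame, and a current frame whose content is
# the reference displaced by exactly half a pixel horizontally.
ref = [[x * x + y for x in range(16)] for y in range(16)]
cur = [[sample(ref, min(2 * x + 1, 30), 2 * y) for x in range(16)]
       for y in range(16)]

int_best = best_vector(cur, ref, 6, 6, 4, 2, step2=2)   # integer pixel only
half_best = best_vector(cur, ref, 6, 6, 4, 2, step2=1)  # half-pixel allowed

print("integer-pixel search:", int_best)
print("half-pixel search:   ", half_best)
```

In this synthetic case the half-pixel search finds the vector (0.5, 0.0) with zero prediction error, whereas the integer-only search is left with a non-zero residual. The cost is also visible: the half-pixel search evaluates roughly four times as many candidate vectors, and every non-integer candidate requires interpolated samples.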