Modern video transmission and display systems, and particularly those systems that present high-definition content, require significant data compression in order to produce a visually acceptable motion picture, because transmission media simply cannot transmit an uncompressed sequence of video frames at a fast enough rate to appear as continuous motion to the human eye. At the same time, and again to produce a visually acceptable picture, the compression technique used should not unduly sacrifice image quality by discarding too much frame data.
To achieve these dual and conflicting goals, video compression and encoding standards such as MPEG and H.264 take advantage of temporal redundancy in the sequence of video frames. In other words, in the vast majority of video sequences of interest to a person, adjacent frames typically show the same objects or features, which may move slightly from one frame to another due to the movement of the object in the scene being shot (producing local motion in a frame), the movement of the camera shooting the scene (producing global motion), or both.
Video compression standards employ motion estimation to define regions in an image, which may correspond to objects, and to associate with each region a motion vector that describes the inter-frame movement of the content in that region. This avoids redundant encoding and transmission of objects or patterns that appear in more than one sequential frame, albeit at slightly different locations in those frames. Motion vectors may be represented by a translational model or by many other models that approximate the motion of a real video camera, such as rotation, translation, or zoom. Accordingly, motion estimation is the process of calculating and encoding motion vectors as a substitute for duplicating the encoding of similar information in sequential frames.
Though motion vectors may relate to the whole image, more often they relate to small regions of the image, such as rectangular blocks, arbitrary shapes, boundaries of objects, or even individual pixels. There are various methods for finding motion vectors. One popular method is block-matching, in which the current image is subdivided into rectangular blocks of pixels, such as 4×4 pixels, 4×8 pixels, 8×8 pixels, 16×16 pixels, etc., and a motion vector (or displacement vector) is estimated for each block by searching for the closest-matching block in the reference image, within a pre-defined search region of a subsequent frame.
As implied by this discussion, the use of motion vectors improves coding efficiency for any particular block of an image by permitting a block to be encoded only in terms of a motion vector pointing to a corresponding block in another frame, and a “residual” or differential between the target and reference blocks. The goal is therefore to determine a motion vector for a block in a way that minimizes the differential that needs to be encoded. Accordingly, numerous variations of block matching exist, differing in the definition of the size and placement of blocks, the method of searching, the criterion for matching blocks in the current and reference frame, and several other aspects.
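By way of illustration only, the block-matching search and the resulting residual described above may be sketched as follows. The sketch assumes an exhaustive (full) search, a sum-of-absolute-differences (SAD) matching criterion, and frames stored as lists of rows of luma samples; the function names sad and block_match, and all of their parameters, are hypothetical and are not taken from any standard or from the system described herein.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def block_match(cur, ref, bx, by, n, search):
    """Full-search block matching (an illustrative assumption; practical
    encoders often use faster search patterns).

    Returns ((dx, dy), cost, residual) for the n x n block of `cur` whose
    top-left corner is (bx, by), testing every displacement within
    +/- `search` pixels that keeps the candidate block inside `ref`.
    """
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    # The residual is the sample-wise difference between the target block
    # and the best-matching reference block; it is what remains to be coded.
    residual = [
        [cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
        for j in range(n)
    ]
    return (dx, dy), cost, residual
```

In this sketch, a smaller SAD corresponds to a smaller residual, which is the sense in which the chosen motion vector minimizes the differential that needs to be encoded. Other matching criteria, block sizes, and search strategies could be substituted without changing the overall structure.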
With conventional motion compensation, an encoder performs motion estimation and signals the motion vectors as part of the bitstream. The bits spent on sending motion vectors can account for a significant portion of the overall bit budget, especially for low bit rate applications. Recently, motion vector competition (MVC) techniques have been proposed to reduce the amount of motion information in the compressed bitstream. MVC improves the coding of motion vector data by differentially encoding the motion vectors themselves in terms of a motion vector predictor and a motion vector differential. The motion vector predictor is usually selected by the encoder from a number of candidates so as to optimize rate distortion, and the candidate motion vectors consist of previously encoded motion vectors for adjacent blocks in the same frame and/or a subset of motion vectors in a preceding frame. In other words, just as the use of a motion vector and a differential improves coding efficiency of block data by eliminating redundancies between information in sequential frames, the coding of motion vectors can exploit redundancies in situations where motion vectors between sequential frames do not change drastically. It does so by identifying an optimal predictor, from a limited set of previously-encoded candidates, so as to minimize the bit length of the differential. The predictor set usually contains both spatial motion vector neighbors and temporally co-located motion vectors, and possibly spatiotemporal vectors.
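As a simplified illustration of the predictor selection described above, the following sketch chooses, from a candidate set, the predictor that yields the cheapest motion vector differential. It makes several assumptions that are not part of the foregoing description: the cost of a differential is approximated by the length of a signed Exp-Golomb code for each component, the cost of signaling the predictor index is ignored, and distortion is not considered, so the sketch minimizes rate only rather than performing a full rate-distortion optimization. The names se_bits, choose_predictor, and reconstruct are hypothetical.

```python
def se_bits(v):
    """Bit length of a signed Exp-Golomb code for the integer v.
    (Used here only as an assumed stand-in for the true entropy-coding cost.)"""
    k = 2 * v - 1 if v > 0 else -2 * v  # map signed value to a code number
    return 2 * (k + 1).bit_length() - 1


def choose_predictor(mv, candidates):
    """Encoder side: pick the candidate whose differential is cheapest.

    `mv` is the motion vector (x, y) to be coded; `candidates` is a list of
    previously coded motion vectors (spatial and/or temporal neighbors).
    Returns (index, (dx, dy)), where (dx, dy) is the motion vector differential.
    """
    def cost(i):
        px, py = candidates[i]
        return se_bits(mv[0] - px) + se_bits(mv[1] - py)

    idx = min(range(len(candidates)), key=cost)
    px, py = candidates[idx]
    return idx, (mv[0] - px, mv[1] - py)


def reconstruct(idx, mvd, candidates):
    """Decoder side: rebuild the motion vector from the signaled predictor
    index and differential, using the same candidate list as the encoder."""
    px, py = candidates[idx]
    return (px + mvd[0], py + mvd[1])
```

For example, with mv = (5, -3) and candidates [(0, 0), (4, -3), (7, 1)], this sketch selects index 1 and a differential of (1, 0), and reconstruct recovers (5, -3) from that index and differential. The decoder must be able to derive the same candidate list as the encoder for the scheme to work.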
Even using motion vector competition techniques when encoding video, however, the necessary bit rate to preserve a desired quality is often too high for the transmission medium used to transmit the video to a decoder. What is needed, therefore, is an improved encoding system for video transmission.
The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.