Motion video sequences typically contain a significant amount of intra-frame or “spatial” redundancy as well as inter-frame or “temporal” redundancy. Video compression techniques exploit this spatial and temporal redundancy to significantly reduce the bandwidth required to transmit, store and otherwise process video sequences. For example, in the well-known MPEG-2 video encoding standard, described in greater detail in International Telecommunications Union, “Generic Coding of Moving Pictures and Associated Audio,” MPEG-2, 1994, discrete cosine transform (DCT), quantization and variable-length coding operations are used to remove spatial redundancy within a given frame in a sequence of video frames. Temporal or inter-frame redundancy is removed through a process of block-based inter-frame motion estimation and predictive coding.
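By way of illustration only, the transform and quantization steps mentioned above may be sketched as follows. The sketch assumes an 8×8 block, an orthonormal DCT and a single uniform quantizer step; the names `dct_2d`, `quantize` and `step` are hypothetical, and the actual MPEG-2 quantizer uses weighting matrices and a scale factor that are not reproduced here. Variable-length coding is omitted.

```python
import math

N = 8  # block size assumed for this sketch (MPEG-2 transforms 8x8 blocks)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization (an assumption): map each coefficient to the nearest level."""
    return [[round(value / step) for value in row] for row in coeffs]


# A perfectly flat block is pure spatial redundancy: after the transform,
# all of its energy sits in the single DC coefficient.
flat_block = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block), step=16)
nonzero_levels = sum(1 for row in levels for value in row if value != 0)
```

For the flat block, `nonzero_levels` is 1, so 64 pixel values are represented by a single quantized level, which is the sense in which these operations remove spatial redundancy.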
MPEG-2 video frames may be intra-coded (I) frames, forward-only predictive (P) frames, or bidirectionally-predictive (B) frames. An I frame is encoded using only the spatial compression techniques noted above, while a P frame is encoded using “predictive” macroblocks selected from a single reference frame, where a macroblock corresponds to a 16×16 block of pixels. A given B frame is encoded using “bidirectionally-predictive” macroblocks generated by interpolating between a pair of predictive macroblocks selected from two reference frames, one preceding and the other following the B frame.
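The interpolation used for a B-frame macroblock may be illustrated by the following minimal sketch, which assumes that the interpolation is a rounded pixel-wise average of the two predictive macroblocks. The function names are hypothetical, and small 2×2 blocks stand in for 16×16 macroblocks to keep the example short.

```python
def interpolate_prediction(forward_mb, backward_mb):
    """Rounded pixel-wise average of two equally sized predictive blocks."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward_mb, backward_mb)]


def residual(actual_mb, predicted_mb):
    """Prediction error that remains to be coded after prediction."""
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(actual_mb, predicted_mb)]


# Hypothetical data: a region whose brightness changes steadily over time.
forward_mb = [[100, 100], [100, 100]]   # from the preceding reference frame
backward_mb = [[110, 110], [110, 110]]  # from the following reference frame
actual_mb = [[105, 105], [105, 105]]    # the B-frame block being coded

predicted_mb = interpolate_prediction(forward_mb, backward_mb)
error_mb = residual(actual_mb, predicted_mb)
```

In this example the interpolated prediction matches the B-frame block exactly, so `error_mb` is all zeros, whereas predicting from either reference frame alone would leave an error of magnitude 5 at every pixel.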
In a conventional MPEG-2 encoder, the output of the above-noted quantization operation is applied to an inverse quantizer and then to an inverse DCT generator. The output of the inverse DCT generator is processed over one or more frames by a motion estimator and motion compensator. The motion estimator computes motion vectors, which the motion compensator uses to form a prediction that is combined with a subsequent frame so as to reduce inter-frame redundancy and facilitate encoding. The motion vectors are explicitly transmitted as so-called side information to the decoder, for use in decoding the corresponding encoded video bitstream.
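One common way to obtain a motion vector for a block is an exhaustive block-matching search, sketched below under stated assumptions. The sum of absolute differences (SAD) cost, the ±4 pixel search range and the names `sad` and `estimate_motion` are choices made for this illustration only; the encoder described above is not limited to them, and practical encoders generally use faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, bx, by, size=16, search=4):
    """Full search over displacements within +/- search pixels.
    Returns ((dx, dy), cost) for the displacement with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost
```

The returned displacement plays the role of the motion vector sent as side information: the decoder needs only the vector and the coded prediction error to rebuild the block from its own copy of the reference frame.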
MPEG-2 and other conventional block-based motion-compensated video encoding techniques are used in a wide variety of video signal processing applications, including, e.g., video teleconferencing systems, video storage and retrieval systems, and satellite-based digital television systems.
Although it has been found in practice that acceptable video coding performance can be obtained using the above-described block-based motion estimation and compensation, there are inherent problems with this approach. One significant problem is that, since physical motion is not piecewise constant, inaccurate compensation can occur at the boundaries of moving objects. As a result, pixels within blocks containing non-uniform motion may be incorrectly compensated, which significantly increases the energy of the prediction error signal and, consequently, the bit rate necessary to encode this signal. In addition, at low bit rates, when high quality coding of the prediction error is not possible, blocking artifacts may become clearly visible in the reconstructed video frames. Furthermore, substantial efforts have been devoted over many years to optimizing block-based motion estimation and compensation, making it unlikely that significant performance gains remain to be achieved within that framework.
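The boundary problem described above can be made concrete with a one-dimensional toy example. All values and names below are hypothetical: an 8-pixel block straddles an object boundary, its first four pixels belonging to a static background and its last four to an object that has moved by one pixel.

```python
def prediction_error_energy(cur, ref, start, displacements):
    """Sum of squared errors when pixel i of the block is predicted
    from ref[start + i + displacements[i]]."""
    return sum((c - ref[start + i + d]) ** 2
               for i, (c, d) in enumerate(zip(cur, displacements)))


ref = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # one row of the reference frame
start = 1                                        # block covers ref[1] .. ref[8]

# True motion is non-uniform within the block.
true_motion = [0, 0, 0, 0, -1, -1, -1, -1]
cur = [ref[start + i + d] for i, d in enumerate(true_motion)]

# Block-based compensation must apply one displacement to the whole block.
best_single_vector_energy = min(
    prediction_error_energy(cur, ref, start, [d] * len(cur))
    for d in (-1, 0, 1)
)

# Compensation that follows the true motion of each pixel.
per_pixel_energy = prediction_error_energy(cur, ref, start, true_motion)
```

Here no single displacement fits both halves of the block: the best block-wide vector still leaves an error energy of 400, while compensation that tracks each region separately leaves none. That residual energy is what must be spent as additional bits in the prediction error signal.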
A need therefore exists for improved motion estimation and compensation techniques, for use in video coding systems and other image sequence processing applications, which overcome the problems associated with the conventional block-based techniques.