Motion estimation and compensation techniques have received increasing attention for the transmission and storage of digital image sequences. For some digital video applications, high compression ratios have been achieved by using motion compensation methods to reduce inherent temporal pixel redundancies in image sequences. In such techniques, a motion field is estimated at an encoder. The motion field relates object locations in a previous frame of a sequence to their new locations in the current frame. Pixel intensities of the previous and current frames are used to compute the estimated motion field. This motion field estimate must then be reconstructed at a decoder without the benefit of intensities of the pixels in the current frame.
The principle of motion field estimation, which is well known in the prior art, may be better understood with respect to FIG. 1, which shows a preceding frame and a present frame. An object positioned at point A' in the preceding frame is moved to point B in the present frame. A two-dimensional displacement or motion vector, v, is calculated from the point A' in the preceding frame to point B' in the preceding frame, where point B' corresponds to point B in the current frame. A signal I'(r+v) at point A', instead of a signal I'(r) at point B', is used as a motion compensated prediction signal and is subtracted from a signal I(r) at point B so as to obtain a prediction error signal I(r)-I'(r+v), where r is the position vector which indicates a given position on the video screen. In motion compensated coding, the prediction error signal I(r)-I'(r+v) is generally smaller in magnitude than the prediction error signal I(r)-I'(r). The former prediction error signal, therefore, can be used effectively to code an image signal with a moving object.
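The difference between the two prediction error signals can be illustrated with a minimal sketch. The frames, the helper name prediction_error, and the pixel values below are hypothetical and chosen only for illustration; frames are assumed to be lists of rows of intensities, and v is taken as the offset for which r+v locates point A' in the preceding frame.

```python
def prediction_error(cur, prev, r, v=(0, 0)):
    """Return I(r) - I'(r + v) for a position r = (row, col).

    cur holds the present frame I, prev holds the preceding frame I'.
    With the default v = (0, 0) this is the plain frame difference.
    """
    row, col = r
    d_row, d_col = v
    return cur[row][col] - prev[row + d_row][col + d_col]


# Hypothetical 4x4 frames: one bright pixel moves from (1, 1) to (2, 3).
prev = [[0] * 4 for _ in range(4)]
cur = [[0] * 4 for _ in range(4)]
prev[1][1] = 200   # point A' in the preceding frame
cur[2][3] = 200    # point B in the present frame

r = (2, 3)         # position of point B
v = (-1, -2)       # offset chosen so that r + v is point A'

plain_error = prediction_error(cur, prev, r)      # I(r) - I'(r)
mc_error = prediction_error(cur, prev, r, v)      # I(r) - I'(r + v)
```

In this toy case the motion compensated error is zero because the object moves without changing intensity, whereas the uncompensated error equals the full object intensity. Real sequences rarely match this exactly.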
Block-based techniques represent one type of motion compensation method which computes motion vectors at an encoder and transmits them to a decoder where the motion field is constructed. In block-based video coding techniques, such as the one described in U.S. Pat. No. 4,307,420, a frame is divided into non-overlapping blocks or regions of N×N pixels. In order to limit the amount of information that must be transmitted to the decoder, block-based methods assume that blocks of pixels move with constant translational motion. A best match for each block is determined in the previously transmitted frame, where the criterion is typically the mean absolute difference between the intensities of the two blocks. The relative difference in position between the current block and the matched block in the previous frame is the motion vector. The intensity of the matched block is subtracted from the intensity of the current block in order to obtain the displaced frame difference (DFD). The collection of all the motion vectors for a particular frame forms a motion field. The motion field and the displaced frame differences are then transmitted from the encoder to the decoder, which predicts the new image based upon this transmitted information and the previous image in the sequence of images.
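The block-matching procedure described above might be sketched as follows. This is an illustrative exhaustive search, not the method of the cited patent: the function names, the search range parameter, and the test frames are assumptions, and frames are again taken to be lists of rows of intensities.

```python
def mean_abs_diff(cur, prev, top, left, v, n):
    """Mean absolute difference between the n x n block of cur at
    (top, left) and the block of prev displaced by v = (d_row, d_col)."""
    d_row, d_col = v
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[top + i][left + j]
                         - prev[top + d_row + i][left + d_col + j])
    return total / (n * n)


def match_block(cur, prev, top, left, n, search):
    """Try every displacement within +/- search pixels and return the
    (motion vector, cost) pair with the lowest mean absolute difference."""
    height, width = len(prev), len(prev[0])
    best_v, best_cost = (0, 0), float("inf")
    for d_row in range(-search, search + 1):
        for d_col in range(-search, search + 1):
            r0, c0 = top + d_row, left + d_col
            # Skip candidates that fall partly outside the previous frame.
            if r0 < 0 or c0 < 0 or r0 + n > height or c0 + n > width:
                continue
            cost = mean_abs_diff(cur, prev, top, left, (d_row, d_col), n)
            if cost < best_cost:
                best_v, best_cost = (d_row, d_col), cost
    return best_v, best_cost


def displaced_frame_difference(cur, prev, top, left, v, n):
    """Current block minus the matched block of the previous frame."""
    d_row, d_col = v
    return [[cur[top + i][left + j]
             - prev[top + d_row + i][left + d_col + j]
             for j in range(n)] for i in range(n)]


def motion_field(cur, prev, n, search):
    """One motion vector per non-overlapping n x n block of the frame."""
    field = {}
    for top in range(0, len(cur) - n + 1, n):
        for left in range(0, len(cur[0]) - n + 1, n):
            field[(top, left)] = match_block(cur, prev, top, left, n, search)[0]
    return field


# Hypothetical 8x8 frames: a 2x2 patch moves down 2 and right 2 pixels.
prev = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
patch = [[10, 20], [30, 40]]
for i in range(2):
    for j in range(2):
        prev[2 + i][2 + j] = patch[i][j]
        cur[4 + i][4 + j] = patch[i][j]

vector, cost = match_block(cur, prev, top=4, left=4, n=2, search=2)
dfd = displaced_frame_difference(cur, prev, 4, 4, vector, 2)
field = motion_field(cur, prev, n=2, search=2)
```

An exhaustive search is used here only because it is the simplest to follow; its cost grows with the square of the search range, which is one reason practical encoders often use faster search strategies.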
One inherent difficulty in block-matching techniques results from the assumption that motion is constant within any given block. When objects in a particular block move at different velocities, the motion vector obtained may correspond to only one, or possibly even none, of the objects in the block. If the size of the blocks is decreased, then the assumption becomes more valid. The overhead of computation and transmission of displacement or motion information, however, increases.
One method for improving motion estimation and compensation, proposed by M. T. Orchard in "Predictive Motion-Field Segmentation For Image Sequence Coding," IEEE Transactions on Circuits and Systems For Video Technology, Vol. 3 (Feb. 1993), involves segmenting the motion field of frames in a sequence, and using the segmentation to predict the location of motion-field discontinuities in the current frame. The motion estimate for each segmented region is chosen from among the motion vectors of the nearest neighboring regions by selecting the motion vector that minimizes the prediction error. A scheme is then presented for predicting the segmentation at the decoder from previously decoded frames.
A similar technique of motion estimation and segmentation is disclosed in Liu et al., "A Simple Method To Segment Motion Field For Video Coding," SPIE Visual Communications and Image Processing, Vol. 1818, pp. 542-551 (1992). Motion vectors for blocks of sixteen by sixteen pixels are first determined by block matching, and each block is then divided into sixteen sub-blocks of four by four pixels. A motion vector is chosen for each sub-block from among the motion vectors of the larger block and neighboring blocks such that the prediction error is minimized.
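The sub-block selection step can be sketched as below. This is a simplified illustration and not the implementation from the cited paper: the function names are hypothetical, the sum of absolute differences is assumed as the prediction error measure, and the candidate list is assumed to already hold the motion vectors of the larger block and its neighbors. The example uses a 4x4 block with 2x2 sub-blocks to stay small; the defaults correspond to the sixteen by sixteen and four by four sizes mentioned above.

```python
def sum_abs_error(cur, prev, top, left, v, n):
    """Sum of absolute differences for an n x n sub-block displaced by v.
    Returns infinity if the displaced sub-block leaves the previous frame."""
    d_row, d_col = v
    height, width = len(prev), len(prev[0])
    r0, c0 = top + d_row, left + d_col
    if r0 < 0 or c0 < 0 or r0 + n > height or c0 + n > width:
        return float("inf")
    return sum(abs(cur[top + i][left + j] - prev[r0 + i][c0 + j])
               for i in range(n) for j in range(n))


def refine_block(cur, prev, top, left, candidates, block=16, sub=4):
    """For each sub-block of the block at (top, left), pick the candidate
    motion vector that gives the smallest prediction error."""
    chosen = {}
    for s_top in range(top, top + block, sub):
        for s_left in range(left, left + block, sub):
            chosen[(s_top, s_left)] = min(
                candidates,
                key=lambda v: sum_abs_error(cur, prev, s_top, s_left, v, sub))
    return chosen


# Hypothetical 8x8 frames in which the two halves of one 4x4 block move
# differently: the upper half follows (0, 1), the lower half follows (1, 0).
prev = [[10 * r * r + c * c for c in range(8)] for r in range(8)]
cur = [row[:] for row in prev]
for r in range(2, 4):
    for c in range(2, 6):
        cur[r][c] = prev[r][c + 1]
for r in range(4, 6):
    for c in range(2, 6):
        cur[r][c] = prev[r + 1][c]

chosen = refine_block(cur, prev, 2, 2, [(0, 1), (1, 0)], block=4, sub=2)
```

Because each sub-block only selects among a few candidate vectors rather than running a fresh search, the extra computation is small, and a single block can still carry more than one motion.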