The present invention relates to video compression, and more particularly to adaptive multi-modal motion estimation for video compression.
The use of motion compensation in video coding plays an important role in achieving better compression efficiency by removing the temporal redundancy in video sequences. The MPEG-2 video compression standard, as defined in ANSI-ISO/IEC 13818-2 (1995) and MPEG2 Test Model 5 (1993), uses a block-based motion estimation and compensation technique. A displaced frame difference (DFD) is a common error measure used in block matching motion estimation algorithms. In general, the block matching process searches, for each block, for the displacement that minimizes the block sum of absolute DFD errors between frames at times t and t+n.
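As an illustration of the block matching process described above, the following is a minimal sketch of an exhaustive search that minimizes the block sum of absolute DFD errors. The function names (`sad`, `full_search`), the list-of-rows frame representation, and the synthetic test frames are assumptions made for this example and are not taken from the MPEG-2 standard or Test Model 5.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Block sum of absolute displaced frame differences for one candidate."""
    total = 0
    for y in range(by, by + bs):
        for x in range(bx, bx + bs):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def full_search(cur, ref, bx, by, bs, search_range):
    """Test every displacement within +/- search_range and keep the best."""
    height, width = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), sad(cur, ref, bx, by, 0, 0, bs)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= bx + dx and bx + dx + bs <= width):
                continue
            if not (0 <= by + dy and by + dy + bs <= height):
                continue
            err = sad(cur, ref, bx, by, dx, dy, bs)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err


# Synthetic frames (assumption): the current frame is the reference frame
# displaced by dx=2, dy=1, with edge pixels clamped.
SIZE = 16
ref_frame = [[x * x + 3 * y * y + x * y for x in range(SIZE)] for y in range(SIZE)]
cur_frame = [[ref_frame[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)]
              for x in range(SIZE)] for y in range(SIZE)]

mv, err = full_search(cur_frame, ref_frame, bx=4, by=4, bs=4, search_range=3)
```

For the 4x4 block at (4, 4), the search recovers the displacement (2, 1) with zero error. The cost grows with the square of the search range, since every candidate displacement requires a full block comparison.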
The computational cost of an exhaustive search is extremely high, especially for large search ranges. This has prompted much research into more efficient methods. Some well-known techniques include hierarchical search, as described in Bierling's “Displacement Estimation by Hierarchical Block Matching”, SPIE Visual Communications and Image Processing 1988, Vol. 1001, pp. 942-951, and logarithmic search, as described in Jain's “Displacement Measurement and its Application in Interframe Coding for Video Conferencing”, IEEE Transactions on Communications, Vol. COM-29, pp. 1779-1806 (1981). These methods are designed to reduce the computational load, but further improvements are still possible.
For video compression systems operating at high bit rates, the cost of transmitting motion vectors may be negligible. But for medium to low bit rates, the cost of transmitting motion vectors has to be taken into account. A cost function C(mv) is formulated for which an optimum estimator seeks a set of displacement vectors (mv) to minimize:
C(mv) = ΣD(mv) + λ*ΣL(mv)
where D(mv) is the sum of absolute pixel DFD(mv) for each block, L(mv) is the motion vector code length, and λ is a constant that weights the relative cost of transmitting motion vectors with respect to the total bit rate. The summation is calculated over the entire frame.
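To make the trade-off in the cost function concrete, the following sketch evaluates C(mv) for two candidate motion fields. It assumes that motion vectors are coded differentially against the previous block in raster order, and it uses the length of a signed Exp-Golomb code as a stand-in for L(mv); the actual MPEG-2 variable-length code tables differ. The function names and the sample values are hypothetical.

```python
def se_golomb_bits(v):
    """Bit length of a signed Exp-Golomb code for v (stand-in for L(mv))."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def frame_cost(mvs, distortions, lam):
    """C(mv) = sum of D(mv) + lam * sum of L(mv), blocks in raster order."""
    total_d = sum(distortions)
    total_l = 0
    prev = (0, 0)  # assumed predictor for the first block
    for mv in mvs:
        # Differential coding: only the change from the previous vector is sent.
        total_l += se_golomb_bits(mv[0] - prev[0])
        total_l += se_golomb_bits(mv[1] - prev[1])
        prev = mv
    return total_d + lam * total_l


# Hypothetical example: a smooth field versus a field with one spurious match
# that lowers the residue slightly but costs many more vector bits.
smooth_mvs, smooth_d = [(2, 1), (2, 1), (2, 1)], [10, 12, 11]
noisy_mvs, noisy_d = [(2, 1), (-3, 4), (2, 1)], [10, 9, 11]

cost_smooth = frame_cost(smooth_mvs, smooth_d, lam=2)
cost_noisy = frame_cost(noisy_mvs, noisy_d, lam=2)
```

With the weight set to 2, the smooth field costs 57 and the noisy field costs 94, even though the noisy field has the lower total distortion (30 versus 33). With the weight set to 0, as is appropriate when vector bits are negligible, the noisy field would be preferred.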
Finding a global minimum of the cost function is an extremely difficult problem, especially because, with differential coding, the cost L(mv) for each block depends on the motion vectors of its neighboring blocks. In practice the ideal motion estimation algorithm is likely to be adaptive to the characteristics of the moving video sequences. The motion search range must cover all likely movements, but no more than is necessary. For example, when the video sequence contains slow moving scenes, the search range should be correspondingly small; otherwise spurious false matches are likely to inflate the cost of L(mv). On the other hand, if the movements are larger than the search range, the residue error D(mv) is high and the effectiveness of the motion compensation is degraded. So the search range has to be large enough to handle fast moving video sequences.
The two factors mentioned above show that an adaptive algorithm is most likely to achieve near optimum performance. Therefore what is desired is an adaptive motion estimation algorithm for video compression that copes with the motion dynamics of the video sequences.
Accordingly the present invention provides an adaptive multi-modal motion estimation algorithm for video compression using an adaptive pivot and multi-modal search method. A luminance pyramid is built such that at the top (Nth) level each pixel represents 2^N×2^N pixels in the base pyramid. A basic correlation is done at the top level for images at times t and t+n, with the location of a peak level defining a global motion vector between images. The global motion vector is used as a pivot point for subsequent top-level block motion search and to define a search area. The top level image is subdivided into M×N blocks and a pivot search is carried out around the pivot point in the search area. The block motion vectors from a higher level serve as initial conditions for a finer resolution level. The results of the segmentation and refinement process determine whether a zero pivot motion search is desired, such as when a camera is tracking a fast moving object. Finally the refinement and zero pivot searches are repeated for every level of the pyramid until the base, full resolution level is done, resulting in estimated motion vectors for the image.
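The first two steps of the summary, building the luminance pyramid and finding a global motion vector at the top level, may be sketched as follows. This is an illustration under stated assumptions and not the claimed implementation: the pyramid is built by plain 2x2 averaging (the filter is not specified above), and the correlation peak is approximated by the shift that minimizes the mean absolute difference over the overlapping region. All function names and the synthetic frames are hypothetical.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 group (assumed filter)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels):
    """Return [base level, ..., top level]; the top is index `levels`."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def global_motion(cur, ref, max_shift):
    """Shift (dx, dy) with the lowest mean absolute difference on the overlap.

    Stand-in for locating the correlation peak between the two images.
    """
    h, w = len(cur), len(cur[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(cur[y][x] - ref[y + dy][x + dx])
                    count += 1
            err = total / count
            if err < best_err:
                best, best_err = (dx, dy), err
    return best


# Synthetic 32x32 frames (assumption): global displacement of dx=8, dy=4.
SIZE, LEVELS = 32, 2
ref_img = [[x * x + 3 * y * y + x * y for x in range(SIZE)] for y in range(SIZE)]
cur_img = [[ref_img[min(y + 4, SIZE - 1)][min(x + 8, SIZE - 1)]
            for x in range(SIZE)] for y in range(SIZE)]

ref_top = build_pyramid(ref_img, LEVELS)[-1]
cur_top = build_pyramid(cur_img, LEVELS)[-1]
top_mv = global_motion(cur_top, ref_top, max_shift=3)

# Each top-level pixel spans 2^LEVELS base pixels, so scale the pivot down.
scale = 2 ** LEVELS
base_pivot = (top_mv[0] * scale, top_mv[1] * scale)
```

In this example the top level is 8x8, the top-level global vector is (2, 1), and the corresponding base-level pivot is (8, 4). Searching at the top level covers a base-level range of plus or minus 12 pixels while testing only 49 candidate shifts, which is what makes the pivot inexpensive to obtain.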
The objects, advantages and other novel features are apparent from the following detailed description when read in conjunction with the appended claims and attached drawing.