Methods for encoding moving pictures or video, such as the MPEG1, MPEG2, H.261, and H.263 standards, have been developed for efficient data transmission and storage. A detailed description of one such encoding method is found in MPEG2 Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, April 1993, and the disclosure of that document is hereby expressly incorporated herein by reference. In the described encoding method, an input video sequence is organized into a sequence layer, groups of pictures (GOPs), pictures, slices, macroblocks, and finally blocks. Each picture is coded according to its determined picture coding type. The picture coding types used include the intra-coded picture (I-picture), the predictive-coded picture (P-picture), and the bi-directionally predictive-coded picture (B-picture).
Motion estimation/compensation, transform coding, and statistical coding are utilized to efficiently compress the input video sequence. For example, in MPEG2 Test Model 5, each picture from the input video sequence is partitioned into rows of smaller, non-overlapping macroblocks of picture elements (pixels). The macroblocks in each row may be grouped into one or more slices. Compression is performed on each macroblock on a row-by-row basis, proceeding from the leftmost macroblock to the rightmost macroblock within a row, and from the top row to the bottom row.
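The raster-order partitioning described above can be sketched as follows. This is a minimal illustration only; the 16-pixel macroblock size matches MPEG2, but the function name and the assumption that the picture dimensions divide evenly by the macroblock size are simplifications introduced here.

```python
def macroblock_grid(width, height, mb_size=16):
    """Yield (row, col, x, y) for each macroblock in coding order:
    left to right within a row, top row to bottom row.
    Assumes width and height are multiples of mb_size."""
    for row in range(height // mb_size):
        for col in range(width // mb_size):
            yield row, col, col * mb_size, row * mb_size

# A 48x32 picture yields 2 rows of 3 macroblocks, scanned row by row.
grid = list(macroblock_grid(48, 32))
```

In a real encoder each `(x, y)` origin would index the pixel data of one macroblock to be motion-estimated and transform-coded.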
In the motion estimation/compensation method, motion vectors are detected for each macroblock in a picture. The coding mode for a macroblock (e.g., intra-coded, forward-predicted, backward-predicted, or interpolated) is decided based on the detected motion vectors and the determined picture coding type. The motion vectors that are used are differentially coded with variable-length codes before being output.
A typical motion vector detection process comprises determining, for each macroblock to be coded, a search window consisting of pixels from a reference picture, and matching the pixel values of the macroblock against blocks of pixel values obtained from the search window. This process is known to be computationally intensive. In particular, the size of the search window has a direct impact on the computational load.
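The block-matching step can be sketched as below, using the common sum-of-absolute-differences (SAD) criterion. The criterion choice, the small block size, and the function names are assumptions for illustration; the quadratic growth of the candidate count with the search range is what makes the window size dominate the computational load.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks,
    given as lists of rows of integer pixel values."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def full_search(cur, ref, bx, by, n=4, rng=2):
    """Exhaustively match the n x n block at (bx, by) in `cur` against every
    candidate displaced by up to +/-rng pixels in `ref`; return the (dx, dy)
    with minimal SAD. Candidates falling outside `ref` are skipped."""
    h, w = len(ref), len(ref[0])
    block = [row[bx:bx + n] for row in cur[by:by + n]]
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue
            cand = [row[x:x + n] for row in ref[y:y + n]]
            cost = sad(block, cand)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]
```

The search visits (2·rng + 1)² candidates per block, which is why enlarging the window directly multiplies the matching work.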
Many methods of matching the pixel blocks are available, such as an exhaustive search method, which compares every definable block within the search window, a logarithmic search method, a hierarchical search method, and various derivatives thereof. Depending on application requirements, a search method may be selected based on its performance in terms of accuracy and computational complexity.
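As an illustration of the accuracy/complexity trade-off, one common logarithmic variant (the three-step search) can be sketched as follows. The cost-callback interface and starting step size are assumptions; the point is that the step halves each round, so the number of evaluated candidates grows only logarithmically with the search range.

```python
def three_step_search(cost, start=(0, 0), step=4):
    """Logarithmic search sketch: evaluate the 3x3 neighbourhood of the
    current point at the current step size, move to the cheapest candidate,
    halve the step, and repeat until the step falls below one pixel.
    `cost(dx, dy)` returns the matching error for a displacement."""
    cx, cy = start
    while step >= 1:
        cands = [(cx + sx * step, cy + sy * step)
                 for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        cx, cy = min(cands, key=lambda p: cost(*p))
        step //= 2
    return cx, cy

# With a smooth, single-minimum cost surface the search converges on the
# true displacement while evaluating far fewer points than a full search.
mv = three_step_search(lambda dx, dy: (dx - 5) ** 2 + (dy + 3) ** 2)
```

The saving comes at a cost in accuracy: unlike the exhaustive search, the method can lock onto a local minimum of the error surface.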
To cater for sequences with large object movements between pictures, methods exist to increase the search range without enlarging the search window. These methods typically incorporate some form of prediction into the motion vectors, based on certain assumptions, to provide more accurate motion vectors for picture sequences with large movements without a large increase in computation load. One such method is the telescopic search method, in which the motion vectors of macroblocks from a previously coded or matched picture are used to generate a new search window for each current macroblock. The telescopic search method comprises the steps of obtaining a motion vector from the co-sited macroblock of the closest coded picture; optionally scaling the obtained motion vector according to the picture distances between the reference picture, the closest coded picture, and the current picture; and defining the search window based on the centre position of the current macroblock plus an offset given by the scaled motion vector.
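The window-placement step of the telescopic search can be sketched as below. The function name and the particular scaling convention (ratio of the current picture's distance to its reference over the coded picture's distance to its own reference) are assumptions introduced here; implementations differ in how the distance ratio is formed.

```python
def telescopic_centre(mv, d_cur_ref, d_coded_ref, mb_x, mb_y):
    """Sketch of telescopic search-window placement: the motion vector `mv`
    of the co-sited macroblock in the closest coded picture is scaled by the
    ratio of picture distances, and the result offsets the current
    macroblock's centre (mb_x, mb_y)."""
    sx = round(mv[0] * d_cur_ref / d_coded_ref)
    sy = round(mv[1] * d_cur_ref / d_coded_ref)
    return mb_x + sx, mb_y + sy
```

A fixed-size window centred on the returned point then tracks the object's accumulated motion, so the effective search range grows without the window itself being enlarged.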
Alternative methods of determining search windows are disclosed in U.S. Pat. Nos. 5,473,379 and 5,657,087, for example. The methods disclosed therein comprise the steps of calculating a global motion vector based on the motion vectors of a previous picture, and offsetting the search windows of all macroblocks by the calculated global motion vector. The global motion vector may be determined by the mean or the median function, or by the most common motion vector of the previous picture; it can be further normalized according to the picture distances. The calculated global motion vector may then represent a global translational motion of objects from one picture to the other.
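A minimal sketch of the global-motion-vector computation, assuming the component-wise median variant (the mean or the most common vector could be substituted) and a simple distance-ratio normalisation; the function and parameter names are illustrative, not taken from the cited patents:

```python
from statistics import median

def global_motion_vector(mvs, d_prev=1, d_cur=1):
    """Component-wise median of the previous picture's motion vectors,
    normalised by the ratio of picture distances. Every macroblock's
    search window in the current picture is then offset by the result."""
    gx = median(mv[0] for mv in mvs) * d_cur / d_prev
    gy = median(mv[1] for mv in mvs) * d_cur / d_prev
    return gx, gy
```

Because one offset is shared by every window in the picture, the window cache and frame-memory bandwidth stay small, which is the attraction of this approach.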
There are also hybrid motion estimators which combine both full search and hierarchical search to take advantage of the accuracy of the full search and the wide coverage of the hierarchical search under a certain hardware limitation. For example, U.S. Pat. No. 5,731,850 discloses a system in which either full search or hierarchical search is chosen based on the search range imposed on the various picture types. A full search is chosen if the search range assigned to that picture is below a certain threshold; otherwise, a hierarchical search is chosen.
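The selection rule, as described above, amounts to a single threshold test; in this sketch the threshold value and function name are hypothetical, since the patent leaves the threshold implementation-dependent:

```python
def choose_search(search_range, threshold=16):
    """Hybrid-estimator selection rule as described for U.S. Pat. No.
    5,731,850: full search below a (hypothetical) threshold on the
    assigned search range, hierarchical search otherwise."""
    return "full" if search_range < threshold else "hierarchical"
```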
The current art uses a fixed search range and one set of search windows for the various picture types when encoding a moving sequence, which fails to address the problem of varying motion characteristics within the sequence. A sequence may consist of segments with different characteristics: one segment may contain slow-moving objects against a stationary background, another fast-moving objects against a stationary background, yet another fast-moving objects and a moving background, among many other combinations. With such complex motion characteristics, a fixed search range for each picture type is inefficient, as it over-serves the slow-moving segments while under-serving the fast-moving segments. This results in non-uniform motion estimator performance and inefficient bit allocation for coding the motion vectors. All of these factors lower the overall performance of the encoder and also result in non-uniform output bitstream quality.
Motion estimators of the type disclosed in U.S. Pat. No. 5,731,850 can use a hybrid of full search and hierarchical search to take advantage of the accuracy of the full search and the wide coverage of the hierarchical search, but the search range is still pre-assigned and does not take account of the possibly different motion characteristics within a moving sequence. Thus, this form of motion estimator will not adapt well to moving sequences with large motion variances. The motion estimator therein disclosed is more concerned with offering a trade-off between accuracy and wide coverage given a certain hardware limitation and a pre-assigned search range.
Methods utilising the global motion vector, such as those disclosed in the aforementioned U.S. Pat. Nos. 5,473,379 and 5,657,087, may be used to minimise the search window cache size as well as the bandwidth requirement from the frame memory while expanding the actual search range. These methods fix the offset of the search window for all macroblocks in a picture. However, given that only one global motion vector is used to offset all search windows in a picture, the search range expansion works well only with pictures containing uniform translational motion. Pictures with zooming, rotational motion, or shearing effects, and pictures containing more than one group of translational motions, are not well served.