The invention relates generally to methods and apparatus for transmitting sequences of image-representing data and, in particular, to a method and apparatus for estimating the motion difference between successive frames of an image sequence for transmission over a bandwidth-limited channel.
Many motion estimation methods and apparatus are available to describe, in one manner or another, the differences between successive frames of an image sequence in order to help reduce the bandwidth requirements associated with transmitting the sequence of images through a bandwidth-limited channel. The literature suggests that motion estimation methods can be divided into spatial-domain motion estimation and frequency-domain motion estimation. The present invention is directed to spatial-domain motion estimation.
The literature further suggests that spatial-domain motion estimation can be categorized as (a) region matching, (b) adaptive linear prediction, (c) frame and pixel differencing, and (d) recursive minimization. The methods belonging to the region matching category use some type of correlation measure to search for a region within an image which most closely matches a particular region in a previous image. These methods, however, require large amounts of computation, and the computation increases as a finer resolution becomes desirable. One more efficient region matching method relies upon a logarithmic search procedure wherein displacements are found by successively reducing the area of the search. At each step, the error is computed at five locations, one in the center and four along the respective X and Y axes passing through the center. The procedure continues until a displacement accurate to within one pixel is found. This method assumes that the error function increases monotonically away from the actual displacement; as will be noted below, such is not always the case. Hence the logarithmic search method can become trapped in a local minimum rather than reaching the global minimum.
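The five-point logarithmic search described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the procedure of any cited reference: the function name `log_search`, the bound `max_disp`, and the rule of halving the step whenever the center is the best of the five points are hypothetical choices, and `err` stands in for whatever block-matching error measure is used.

```python
def log_search(err, max_disp=8):
    """Simplified five-point logarithmic search (illustrative only).

    err(dx, dy) returns the matching error for an integer displacement.
    max_disp is an assumed bound on |dx| and |dy|.
    """
    cx, cy = 0, 0
    step = max(1, max_disp // 2)
    while True:
        # Center is listed first, so ties keep the current center.
        points = [(cx, cy), (cx + step, cy), (cx - step, cy),
                  (cx, cy + step), (cx, cy - step)]
        points = [(x, y) for x, y in points
                  if abs(x) <= max_disp and abs(y) <= max_disp]
        best = min(points, key=lambda p: err(p[0], p[1]))
        if best != (cx, cy):
            cx, cy = best      # move the search center
        elif step > 1:
            step //= 2         # reduce the area of the search
        else:
            return cx, cy      # step of one pixel and center is best


# Error that rises monotonically away from the true displacement (3, -2).
def bowl(dx, dy):
    return (dx - 3) ** 2 + (dy + 2) ** 2


# Error with an isolated global minimum at (5, 5) that the search never visits.
def bumpy(dx, dy):
    return 0 if (dx, dy) == (5, 5) else 1 + abs(dx + 1) + abs(dy)


print(log_search(bowl))   # (3, -2)
print(log_search(bumpy))  # (-1, 0)
```

With the monotonic error the search returns the true displacement. With the second error it stops at (-1, 0) although a lower error exists at (5, 5), which illustrates how the search can be trapped in a local minimum.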
The adaptive linear prediction method is an extension of the two-dimensional differential pulse code modulation methods developed during the 1950s. A number of these methods form a prediction based upon spatially neighboring pixels as well as pixels from temporally earlier frames. The prediction model can be adapted using overhead data or can adapt itself according to spatially varying image statistics. In any instance, three-dimensional adaptive prediction can be regarded as an example of motion estimation; this can be confirmed by recognizing that the predictor is likely to place its largest weighting on the most highly correlated pixels which are available. For pure translation, the most highly correlated pixels are those lying on the motion path in previous frames.
Adaptive linear prediction methods have several shortcomings. First, they require significant computation to derive the predictor when the order of prediction increases, and, to take advantage of a displacement of several pixels, the order must be quite high. A second disadvantage is that the predictor coefficients do not necessarily suggest a single displacement vector. Accordingly, the overhead data required for such a transmitter-based motion estimation method can become quite substantial.
The frame and pixel differencing method utilizes the relationship between the spatial and temporal first difference signals. It derives a measure of displacement magnitude by accumulating the absolute temporal difference signal over a region and dividing it by the accumulated absolute pixel difference signal. The accumulated pixel difference signal serves as a normalizing factor to account for the size and detail of the moving image. While the method is very attractive in terms of real-time implementation, it is not a very accurate motion estimation method, particularly when displacements of several pixels are encountered.
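As a minimal sketch of the ratio described above, assume a grayscale image stored as a list of rows and use only the horizontal pixel difference. Both simplifications, and the name `displacement_magnitude`, are introduced here for illustration only.

```python
def displacement_magnitude(prev, curr):
    """Accumulated |temporal difference| divided by accumulated |pixel difference|."""
    temporal = 0.0
    spatial = 0.0
    for row_prev, row_curr in zip(prev, curr):
        for x in range(1, len(row_curr)):
            temporal += abs(row_curr[x] - row_prev[x])     # frame difference
            spatial += abs(row_curr[x] - row_curr[x - 1])  # pixel difference
    # A flat region has no pixel differences to normalize by.
    return temporal / spatial if spatial else 0.0


# A horizontal intensity ramp displaced two pixels to the right.
prev = [[100 + 10 * x for x in range(8)] for _ in range(4)]
curr = [[100 + 10 * (x - 2) for x in range(8)] for _ in range(4)]
print(displacement_magnitude(prev, curr))  # 2.0
```

The ramp is a best case. A pattern that repeats every two pixels, displaced by those same two pixels, leaves the frame difference at zero and so yields an estimate of zero, which is one way the measure can fail.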
The concept of recursive spatial-domain motion estimation, introduced by Netravali and Robbins and described in U.S. Pat. No. 4,218,703, issued Aug. 19, 1980, converges upon explicit displacement estimates through the recursive updates of a steepest descent search. For each pixel in an image, an error function is defined as the squared difference between the desired pixel value and that of a displaced pixel in the previous frame. For a typical raster scan, an initial displacement is assigned to the present pixel based upon the value of one or more of the displacements which have already been computed for the neighboring pixels. If the error function at the present pixel is below a predetermined threshold, the method proceeds to the next pixel without modifying the displacement assigned in accordance with the neighboring pixels. Otherwise, the gradient of the error function is determined with respect to the displacement vector and, in an attempt to minimize the error function, the displacement vector is perturbed in a direction opposite to the gradient and by an amount proportional to the magnitude of the gradient. The method then uses this new value as the displacement value for the pixel and proceeds to the next picture element.
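The update just described can be sketched for a single scan line. The sketch is illustrative only and makes several assumptions that are not taken from the cited patent: a one-dimensional displacement, linear interpolation for sub-pixel positions, a central-difference gradient, the convention that the current line equals the previous line displaced by d (curr[x] = prev[x - d]), and the hypothetical names and constants `sample`, `pel_recursive_update`, `eps`, and `threshold`. The factor of two in the gradient of the squared error is absorbed into `eps`.

```python
import math


def sample(line, x):
    """Linearly interpolate a scan line at real-valued position x (clamped to the line)."""
    x = min(max(x, 0.0), len(line) - 1.0)
    i = min(int(math.floor(x)), len(line) - 2)
    frac = x - i
    return (1.0 - frac) * line[i] + frac * line[i + 1]


def pel_recursive_update(prev, curr, x, d, eps=0.002, threshold=1.0):
    """One steepest-descent update of the displacement d at pixel x."""
    # Displaced frame difference: the signed error whose square is minimized.
    dfd = curr[x] - sample(prev, x - d)
    if dfd * dfd < threshold:
        return d  # error below threshold: keep the displacement from the neighbors
    # Spatial gradient of the previous line at the displaced position.
    grad = sample(prev, x - d + 0.5) - sample(prev, x - d - 0.5)
    # The derivative of dfd**2 with respect to d is 2 * dfd * grad; step against it.
    return d - eps * dfd * grad


# An intensity ramp displaced two pixels to the right between frames.
prev = [100 + 10 * x for x in range(20)]
curr = [100 + 10 * (x - 2) for x in range(20)]

d = 0.0
for x in range(3, 19):  # raster order: each pixel starts from its neighbor's estimate
    d = pel_recursive_update(prev, curr, x, d)
print(round(d, 2))
```

On this ramp the estimate moves from 0 toward the true displacement of 2 with one update per pixel and stops changing once the squared error falls below the threshold.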
Since the conception of this "pel-recursive" method, several papers have discussed improvements to it. Two improvements are of particular interest. In the first, the error function is defined over several neighboring pixels. This modification provides greater noise immunity, thereby improving the stability and accuracy of the method. However, the amount of computation required for the recursion on each pixel increases. The second improvement is a departure from the steepest descent minimization, which converges slowly once a point near the minimum is reached. Hence, using an error function over several pixels, Netravali and Robbins suggest the use of a least-mean-square minimization method which is conceptually similar to the steepest descent, except for the inclusion of a steering matrix which multiplies the gradient value. Thus, near the minimum, the steering matrix effectively lengthens the step size over that given by the steepest descent alone. This method, however, does require a matrix inversion and therefore suffers from possible singularity conditions near the minimum.
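One way to picture a steering matrix is a Gauss-Newton-style step, in which the accumulated gradient is multiplied by the inverse of the accumulated outer product of the spatial gradients. The sketch below is an assumed illustration and not the formulation of Netravali and Robbins; the function name `steered_step`, the tolerance `min_det`, and the inputs (precomputed displaced frame differences and spatial gradients for the pixels in the neighborhood) are hypothetical.

```python
def steered_step(d, dfds, grads, min_det=1e-9):
    """Gauss-Newton-style displacement update over several pixels (illustrative).

    d     : current displacement (dx, dy)
    dfds  : displaced frame difference at each pixel
    grads : spatial gradient (gx, gy) at each displaced pixel
    Returns the updated displacement, or None if the matrix is singular.
    """
    # Symmetric 2x2 matrix [[a, b], [b, c]] = sum of g * g^T.
    a = sum(gx * gx for gx, gy in grads)
    b = sum(gx * gy for gx, gy in grads)
    c = sum(gy * gy for gx, gy in grads)
    # Accumulated gradient of the summed squared error (factor of two dropped).
    rx = sum(e * gx for e, (gx, gy) in zip(dfds, grads))
    ry = sum(e * gy for e, (gx, gy) in zip(dfds, grads))
    det = a * c - b * b
    if abs(det) < min_det:
        return None  # the inversion is not possible
    # Multiply the accumulated gradient by the inverse matrix.
    ux = (c * rx - b * ry) / det
    uy = (a * ry - b * rx) / det
    return (d[0] - ux, d[1] - uy)


# Locally linear image model: each error equals g . (d - d_true), d_true = (1.5, -0.5).
grads = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dfds = [-1.5, 0.5, -1.0]
print(steered_step((0.0, 0.0), dfds, grads))  # (1.5, -0.5)

# All gradients parallel: the matrix is singular and no step is returned.
print(steered_step((0.0, 0.0), [1.0, 2.0], [(1.0, 0.0), (2.0, 0.0)]))  # None
```

Under the linear model the step lands on the true displacement in a single update, whereas a fixed-gain steepest descent would approach it gradually. The second call shows one singular case, in which every gradient in the neighborhood points the same way.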
The pel-recursive method suffers from two deficiencies. First, the method has been used almost exclusively as a statistical predictor embedded in a differential pulse-code-modulation system. Thus, even when the error is defined over several pixels, these pixels are restricted to be those which have already been "transmitted". This limits the performance of the pel-recursive method in the same manner as the self-adaptive linear prediction method. Second, the pel-recursive method uses a steepest descent approach with but a single iteration on each pixel.
Accordingly, an object of the invention is a more efficient and reliable motion estimation apparatus and method, and a more exact definition of the motion vector displacement for an image in a sequence of images. Other objects of the invention are a motion estimation method and apparatus having high precision and minimal computational burden. Further objects of the invention are a motion estimation method and apparatus for determining motion vector displacement over block regions of an image, having sub-pixel resolution and good noise immunity, and providing a minimization process which generally reaches a global minimum for the block regions.