This invention concerns the estimation of motion vectors between video frames in a sequence of frames. Motion vectors are assigned to pixels, or blocks of pixels, in each frame and describe the estimated displacement of each pixel or block in a next frame or a previous frame in the sequence. In the following description, the motion estimation is considered to be “dense” meaning that a motion vector is calculated for every pixel. The definition of “dense” may be widened to cover the calculation of a motion vector for each small block in the picture or for each pixel in a subsampled version of the picture. The invention can be applied with trivial modification to these wider cases.
The term “motion estimation” is used in this specification to include the estimation of displacement that is not solely the result of motion but may also arise from other differences between two images.
Motion estimation has application in many image and video processing tasks, including video compression, motion compensated temporal interpolation for standards conversion or slow-motion synthesis, motion compensated noise reduction, object tracking, image segmentation, and stereoscopic 3D analysis and view synthesis from multiple cameras.
Some of the terminology used in describing motion estimation systems will now be described. FIG. 1 shows one-dimensional sections through two successive frames in a sequence of video frames, referred to as the previous or reference frame (101) and the current frame (102). A motion vector (104) is shown assigned to a pixel (103) in the current frame. The motion vector indicates a point (105) in the reference frame that is the estimated source of the current-frame pixel (103). This example shows a backward vector. Forward vectors may also be measured, in which case the reference frame is the next frame in the sequence rather than the previous frame.
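To make the backward-vector convention concrete, the following Python sketch works on one-dimensional sections like those of FIG. 1. It assumes, as an illustrative convention not stated above, that a backward vector v assigned to current-frame pixel x points to the source position x + v in the reference frame, and that the reference frame is sampled at that possibly fractional position by linear interpolation. The function names are hypothetical.

```python
import math


def sample_linear(frame, x):
    """Sample a 1-D frame at a possibly fractional position x using linear
    interpolation; positions outside the frame are clamped to the edge."""
    x = min(max(float(x), 0.0), len(frame) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(frame) - 1)
    frac = x - i0
    return (1.0 - frac) * frame[i0] + frac * frame[i1]


def motion_compensated_value(reference, x, vector):
    """Value at the estimated source point of current-frame pixel x.

    Assumed convention (hypothetical): a backward vector points from pixel x
    in the current frame to position x + vector in the reference frame.
    """
    return sample_linear(reference, x + vector)


reference = [0, 10, 20, 30, 40]
# Pixel 3 of the current frame, with a backward vector of -1.5 pixels,
# takes its estimated source value from position 1.5 of the reference frame.
print(motion_compensated_value(reference, 3, -1.5))  # prints 15.0
```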
An example of an algorithm that calculates motion vectors for pixels is given in WO 87/05769. The principle of this algorithm is summarised in FIG. 2. The current frame (201) and the previous frame (202) are applied to a phase correlation unit (203) which calculates a “menu” consisting of a number (three in this example) of candidate motion vectors (204). Each candidate vector controls a respective one of a set of shift units (205) which, for every pixel in the current frame, displaces the previous frame (202) by the respective candidate vector to produce a corresponding pixel in a set of displaced frames (206). Each displaced frame (206) is subtracted from the current frame, and the resulting difference is rectified and spatially filtered in a respective member of a set of error calculation units (207) to produce a set of errors (208). For each pixel, the errors associated with the candidate vectors are compared with each other in a comparison unit (209), which finds the minimum error and the index (210) of the associated candidate. This index is applied to a vector selection unit (211), which selects one of the candidate vectors (204) as the final ‘assigned’ output vector (212).
In the cited example, the error calculation units (207) rectify the difference between a pixel in the current frame and a displaced pixel in the previous frame. This difference is known as the “displaced frame difference” or “DFD”. The DFD is typically filtered, for example by a linear filter or by one of the improvements described in our co-pending UK patent applications numbered 1206065.3 (Publication No. 2502047) and 1306340.9.
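The candidate-selection scheme of FIG. 2 can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the method of WO 87/05769 itself: the candidate menu is passed in directly rather than computed by phase correlation, shifts use linear interpolation with the hypothetical convention that a backward vector v at pixel x points to x + v, and the spatial filter is a plain moving average standing in for the linear filter mentioned above. Names such as `assign_vectors` and `box_filter` are hypothetical.

```python
import math


def sample_linear(frame, x):
    """Linearly interpolate a 1-D frame at fractional position x (edge-clamped)."""
    x = min(max(float(x), 0.0), len(frame) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(frame) - 1)
    frac = x - i0
    return (1.0 - frac) * frame[i0] + frac * frame[i1]


def box_filter(values, radius):
    """Moving-average filter; near the edges only the available neighbours are used."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def assign_vectors(current, previous, candidates, radius=1):
    """For each pixel, choose the candidate with the smallest filtered, rectified DFD."""
    errors = []
    for v in candidates:
        # Shift unit: displace the previous frame by candidate v at every pixel.
        displaced = [sample_linear(previous, x + v) for x in range(len(current))]
        # Error calculation unit: rectified DFD, then spatial filtering.
        dfd = [abs(c - d) for c, d in zip(current, displaced)]
        errors.append(box_filter(dfd, radius))
    # Comparison and selection units: minimum error per pixel.
    # On ties, min() returns the first candidate in menu order.
    assigned = []
    for x in range(len(current)):
        best = min(range(len(candidates)), key=lambda k: errors[k][x])
        assigned.append(candidates[best])
    return assigned


previous = [0, 0, 0, 10, 20, 30, 20, 10, 0, 0, 0, 0]
current = [0, 0] + previous[:-2]  # same profile moved two pixels to the right
print(assign_vectors(current, previous, [0.0, -2.0, 1.0]))
# prints [0.0, 0.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, 0.0]
```

Around the moving detail the candidate -2.0 gives zero error and is selected. In the flat regions at either end every candidate fits equally well, so the tie resolves to the first entry of the menu, which illustrates why flat areas carry little information for vector selection.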
One shortcoming with the above algorithm is that small errors in the magnitude or direction of a candidate vector can lead to a disproportionately large error in the DFD, especially in detailed areas of the picture. Such errors can occur, for example, when the motion of an object includes a zoom or a rotation, and can cause the comparison and selection units (209, 211) to fail to find the best candidate motion vector.
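The sensitivity described above can be shown numerically in one dimension. In this hypothetical example the scene is static, so the true vector is zero, but the candidate is wrong by half a pixel, as might happen locally when a single candidate only approximates a zoom or rotation. The helper names are illustrative, and linear interpolation is assumed for the sub-pixel shift.

```python
import math


def sample_linear(frame, x):
    """Linearly interpolate a 1-D frame at fractional position x (edge-clamped)."""
    x = min(max(float(x), 0.0), len(frame) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(frame) - 1)
    frac = x - i0
    return (1.0 - frac) * frame[i0] + frac * frame[i1]


def mean_dfd(current, previous, vector):
    """Mean rectified DFD over pixels whose source point lies inside the frame."""
    diffs = []
    for x, c in enumerate(current):
        src = x + vector
        if 0 <= src <= len(previous) - 1:
            diffs.append(abs(c - sample_linear(previous, src)))
    return sum(diffs) / len(diffs)


detailed = [0.0, 100.0] * 8               # fine detail with a two-pixel pitch
smooth = [float(i) for i in range(16)]    # gentle ramp, one level per pixel

# Static scene: the correct vector is 0, the candidate is off by 0.5 pixel.
print(mean_dfd(detailed, detailed, 0.5))  # prints 50.0
print(mean_dfd(smooth, smooth, 0.5))      # prints 0.5
```

In this example the same half-pixel vector error yields a mean DFD one hundred times larger on the detailed pattern than on the ramp, so in detailed areas a slightly wrong candidate can score worse than a grossly wrong one.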
Another example of an algorithm that calculates motion vectors is given in B. K. P. Horn and B. G. Schunck, “Determining Optical Flow”, MIT Artificial Intelligence Memo no. 572, April 1980. This algorithm makes use of the known relationship between the spatial and temporal gradients of a sequence of images, working iteratively to estimate a smoothly varying motion vector field from measurements of those gradients. A “vector field” in the context of this description refers to a set of vectors with one vector for each pixel. The algorithm overcomes the problems that the DFD-based algorithm described above encounters in the presence of zooms and rotations. However, it has several other shortcomings. One is that it fails when the motion from one frame to the next is greater than the typical pitch of details present in the scene. Another is that it fails at boundaries between differently moving objects, where the gradient relationship mentioned above breaks down. The first shortcoming can be addressed by implementing a hierarchical scheme in which the pictures are first analysed at a low sampling rate and the results are passed from lower to higher sampling rates in several stages. However, the hierarchical filtering process leads to other problems by blurring the boundaries between moving objects. The second shortcoming can to some extent be addressed by the introduction of robust statistics, for example as described by M. J. Black and P. Anandan, “The robust estimation of multiple motions: parametric and piecewise-smooth flow fields”, Computer Vision and Image Understanding, vol. 63, no. 1, pp. 75-104, January 1996.
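The gradient-based iteration can be illustrated with a one-dimensional reduction of the Horn and Schunck update. This is a hedged sketch rather than a faithful reproduction of the memo: gradients are simple central differences, the smoothness term uses the mean of the two neighbouring estimates, and the hypothetical parameter `alpha` weights smoothness against the brightness-constancy constraint Ix*u + It = 0. Here u is the displacement from the previous frame to the current frame, which has the opposite sign to the backward vector of FIG. 1. The function name is illustrative.

```python
def horn_schunck_1d(prev, curr, alpha=1.0, iterations=200):
    """Estimate a smooth 1-D displacement field u (previous -> current) by
    iterating u = ubar - Ix * (Ix * ubar + It) / (alpha**2 + Ix**2)."""
    n = len(prev)
    if n < 2 or len(curr) != n:
        raise ValueError("frames must have equal length of at least 2")
    # Spatial gradient of the average of the two frames:
    # central difference inside, one-sided difference at the ends.
    avg = [(a + b) / 2.0 for a, b in zip(prev, curr)]
    ix = []
    for x in range(n):
        lo, hi = max(x - 1, 0), min(x + 1, n - 1)
        ix.append((avg[hi] - avg[lo]) / (hi - lo))
    # Temporal gradient: frame difference.
    it = [b - a for a, b in zip(prev, curr)]
    u = [0.0] * n
    for _ in range(iterations):
        # Local average of the current estimate (edge-clamped neighbours).
        ubar = [(u[max(x - 1, 0)] + u[min(x + 1, n - 1)]) / 2.0 for x in range(n)]
        # Jacobi-style update: pull ubar towards satisfying Ix*u + It = 0.
        u = [ubar[x] - ix[x] * (ix[x] * ubar[x] + it[x]) / (alpha ** 2 + ix[x] ** 2)
             for x in range(n)]
    return u


prev = [2.0 * x for x in range(10)]
curr = [2.0 * x - 1.0 for x in range(10)]  # ramp moved 0.5 pixel to the right
print([round(v, 3) for v in horn_schunck_1d(prev, curr)])  # prints a list of ten 0.5 values
```

The constraint Ix*u + It = 0 relies on a first-order (small displacement) approximation of the image, which is why such a scheme breaks down when the motion exceeds the pitch of the scene detail, as noted above.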
There is thus a need for improved motion vector processing that delivers more accurate vectors in the presence of complex motion and object boundaries.