This invention concerns the estimation of motion vectors between video frames and, in particular, the evaluation of the quality of motion vectors, or of a motion vector field, produced by a motion measurement process.
In motion compensated image processing, motion vectors are assigned to pixels, or blocks of pixels, in each frame and describe the estimated displacement of each pixel or block in a next frame or a previous frame in the sequence. In the following description, the motion estimation is considered to be “dense” meaning that a motion vector is calculated for every pixel. The definition of “dense” may be widened to cover the calculation of a motion vector for each small block in the picture or for each pixel in a subsampled version of the picture. The invention can be applied with trivial modification to these wider cases.
Motion estimation has application in many image and video processing tasks, including video compression, motion compensated temporal interpolation for standards conversion or slow-motion synthesis, motion compensated noise reduction, object tracking, image segmentation, and, in the form of displacement estimation, stereoscopic 3D analysis and view synthesis from multiple cameras.
Some of the terminology used in describing motion estimation systems will now be explained. FIG. 1 shows one-dimensional sections through two successive frames in a sequence of video frames, referred to as the previous or reference frame (101) and the current frame (102). A motion vector (104) is shown assigned to a pixel (103) in the current frame. The motion vector indicates a point (105) in the reference frame which is the estimated source of the current frame pixel (103). This example shows a backward vector. Forward vectors may also be measured, in which case the reference frame is the next frame in the sequence rather than the previous frame.
An example of an algorithm that calculates motion vectors for pixels is given in WO 87/05769. The principle of this algorithm is summarised in FIG. 2. The current frame (201) and the previous frame (202) are applied to a phase correlation unit (203) which calculates a “menu” consisting of a number (three in this example) of candidate motion vectors (204). Each candidate vector controls a respective one of a set of shift units (205) which, for every pixel in the current frame, displaces the previous frame (202) by the respective candidate vector to produce a corresponding pixel in a set of displaced frames (206). Each displaced frame (206) is subtracted from the current frame and the resulting difference is rectified and spatially filtered in a respective member of a set of error calculation units (207) to produce a set of errors (208). The errors associated with the candidate vectors are compared with each other in a comparison unit (209), which finds the minimum error and the index (210) of the associated candidate. This index is applied to a vector selection unit (211), which selects one of the candidate vectors (204) as the final ‘assigned’ output vector (212).
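By way of illustration only, the selection stage of FIG. 2 may be sketched as follows for one-dimensional sections such as those of FIG. 1. This is a simplified sketch and not the implementation of WO 87/05769: the phase correlation unit (203) is not modelled and the candidate menu is simply supplied as a list, the spatial filter is assumed to be a one-dimensional running average, samples displaced beyond the frame edge are assumed to be clamped to the nearest edge sample, and all function names are hypothetical.

```python
def displaced(prev, x, v):
    """Sample the previous frame at x + v (backward vector v).
    Clamping at the frame edges is an assumed border policy."""
    return prev[min(max(x + v, 0), len(prev) - 1)]


def running_average(values, radius):
    """1-D running average; near the edges only in-frame samples are used."""
    n = len(values)
    out = []
    for x in range(n):
        window = values[max(0, x - radius):min(n, x + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def assign_vectors(cur, prev, candidates, radius=2):
    """For every pixel, pick the candidate with the smallest filtered error."""
    # One rectified, filtered error signal per candidate (units 205 and 207).
    errors = []
    for v in candidates:
        dfd = [abs(cur[x] - displaced(prev, x, v)) for x in range(len(cur))]
        errors.append(running_average(dfd, radius))
    # Comparison (209) and selection (211): index of the minimum error per pixel.
    assigned = []
    for x in range(len(cur)):
        best = min(range(len(candidates)), key=lambda k: errors[k][x])
        assigned.append(candidates[best])
    return assigned


# Illustrative data: picture content that has moved two pixels to the right,
# so the source of current pixel x lies at x - 2 in the previous frame.
prev = [i * i for i in range(16)]
cur = [prev[max(x - 2, 0)] for x in range(16)]
assigned = assign_vectors(cur, prev, candidates=[0, -2, 3])
print(assigned)  # every pixel is assigned the backward vector -2
```

In this noise-free example the correct candidate gives a zero error at every pixel, so it is always selected. With real picture material the errors of competing candidates may be much closer together.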
In the cited example, the error calculation units (207) rectify the difference between a pixel in the current frame and a displaced pixel in the previous frame. This difference is known as the “displaced frame difference” or “DFD” for that pixel for the respective vector. The DFD is a measure of the ‘quality’ of the motion vector, or vector field, used to displace pixels. An important property of DFDs will now be described. If a candidate motion vector for a pixel describes the true motion of that pixel, then we would expect the DFD to be small, and non-zero only because of noise in the video sequence. If the candidate motion vector is incorrect, then the DFD may well be large, but it might be coincidentally small. For example, a rising waveform in one frame may match a falling waveform in the displaced frame at the point where they cross. Alternatively, a pixel may be in a plain area or on a one-dimensional edge, in which case several motion vectors would give rise to a small or even a zero DFD value. This inconvenient property of DFDs is sometimes referred to as the “aperture problem” and leads to the necessity of filtering the DFD in order to take information from nearby pixels into account.
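The two cases of a coincidentally small DFD described above may be illustrated by the following minimal sketch, which assumes one-dimensional frames, a backward vector expressed as an integer pixel offset, and clamping at the frame edges. The function name `dfd` and the sample values are hypothetical.

```python
def dfd(cur, prev, x, v):
    """Rectified displaced frame difference at pixel x for backward vector v."""
    src = min(max(x + v, 0), len(prev) - 1)  # assumed border policy: clamp
    return abs(cur[x] - prev[src])


# Case 1: a rising waveform and a falling waveform cross at x = 3.
rising = [0, 1, 2, 3, 4, 5, 6]
falling = [6, 5, 4, 3, 2, 1, 0]
at_crossing = dfd(falling, rising, 3, 0)    # 0, although the frames do not match
beside_crossing = dfd(falling, rising, 2, 0)  # 2, revealed only by a neighbour

# Case 2: a plain area, where every candidate vector gives a zero DFD.
plain = [5] * 7
plain_dfds = [dfd(plain, plain, 3, v) for v in (-2, -1, 0, 1, 2)]
```

In both cases the single-pixel DFD cannot distinguish a correct vector from an incorrect one, and only the DFDs of neighbouring pixels carry the distinguishing information.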
In the example of FIG. 2, vectors are applied to blocks of pixels and each of the error calculation units (207) filters the respective DFD with a two-dimensional filter, a typical example of which is a 5×5 running-average filter. It is this rectified and filtered error that is used for comparison of candidate motion vectors. FIG. 3 illustrates the samples involved in the running-average filter. The set of 5×5 samples (302) is centred on the current pixel position (301).
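A 5×5 running-average filter of the kind illustrated in FIG. 3 may be sketched as follows. The sketch assumes that the rectified DFD for one candidate vector is held as a list of rows, and that near the picture edges the average is taken over only those samples of the 5×5 set that fall inside the picture; the source does not specify the edge behaviour, and the function name is hypothetical.

```python
def running_average_2d(err, radius=2):
    """Average each sample of err over a (2*radius+1)-square window centred on it.
    radius=2 gives the 5x5 set of FIG. 3. Windows are truncated at the picture
    edges (an assumed border policy)."""
    height, width = len(err), len(err[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += err[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


# A single large rectified DFD value at the centre of a 7x7 picture.
err = [[0] * 7 for _ in range(7)]
err[3][3] = 25
filtered = running_average_2d(err)
print(filtered[3][3])  # 1.0: the isolated error is spread over 25 samples
print(filtered[0][0])  # 0.0: the corner window does not reach the centre
```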
Choosing the size of the two-dimensional error filter involves a trade-off between reliability and spatial accuracy of the resulting motion vector field created by selecting (allocating) vectors on the basis of the filtered errors. If, on the one hand, the filter is large, then the effect of noise on the filtered error is reduced and the filter is more likely to take into account nearby detail in the picture which might help to distinguish reliably between candidate motion vectors. However, a large filter is also more likely to take in pixel data from one or more objects whose motion is properly described by different motion vectors, in which case it will fail to give a low error for any candidate motion vector, even for one that is correct for the pixel in question.
If, on the other hand, the filter is small, it is more likely to involve pixels from only one object and so is more likely to return a low error for the correct motion vector. However, it will be less likely to reject wrong motion vectors and will be more susceptible to noise.
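The trade-off may be illustrated numerically by the following sketch, which uses two constructed one-dimensional scenes. The scenes, the window radii (1 for the small filter and 4 for the large filter), the clamped border handling and the function name are all assumptions chosen for illustration.

```python
def filtered_error(cur, prev, x, v, radius):
    """Mean rectified DFD for backward vector v over a window centred on x."""
    n = len(cur)
    window = range(max(0, x - radius), min(n, x + radius + 1))
    total = 0
    for i in window:
        src = min(max(i + v, 0), n - 1)  # assumed border policy: clamp
        total += abs(cur[i] - prev[src])
    return total / len(window)


# Scene A: a motion boundary at x = 8. Pixels to the left are static (vector 0);
# pixels to the right have moved two pixels to the right (vector -2).
prev_a = [i * i for i in range(16)]
cur_a = [prev_a[x] if x < 8 else prev_a[x - 2] for x in range(16)]
small_correct = filtered_error(cur_a, prev_a, 6, 0, radius=1)  # 0.0
large_correct = filtered_error(cur_a, prev_a, 6, 0, radius=4)  # non-zero

# Scene B: a static picture that is plain except for one detail at x = 8.
prev_b = [0] * 8 + [9] + [0] * 7
cur_b = list(prev_b)
small_wrong = filtered_error(cur_b, prev_b, 4, 1, radius=1)  # 0.0
large_wrong = filtered_error(cur_b, prev_b, 4, 1, radius=4)  # 2.0
```

In scene A the large filter takes in pixels from the moving object and so returns a non-zero error for the vector that is correct at x = 6, whereas the small filter returns zero. In scene B the small filter sees only the plain area and cannot reject the wrong vector, whereas the large filter reaches the nearby detail and returns a non-zero error.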
Our prior UK patent application GB 2502047 provides an improved displaced frame difference filter which combines the reliability and noise immunity of a large conventional filter with the sensitivity and spatial accuracy of a small filter, while avoiding the disadvantages of each. However, the choice of the size (extent) of the filter aperture remains problematic; for critical picture material, there is no choice of filter size which yields satisfactory performance in all aspects of reliability, noise immunity, spatial accuracy and sensitivity. There is thus a need for an improved DFD filter that provides a valid measure of the quality of a motion vector, or motion vector field, regardless of the character of the image being analysed.