In the estimation of motion vectors between video frames, motion vectors are assigned to pixels, or blocks of pixels, in each frame and describe the estimated displacement of each pixel or block in a next frame or a previous frame in the sequence of frames. In the following description, the motion estimation is considered to be “dense” meaning that a motion vector is calculated for every pixel. The definition of “dense” may be widened to cover the calculation of a motion vector for each small block in the picture, for each pixel in a subsampled version of the picture, or for each small region of arbitrary shape within which the motion is expected to be uniform. The invention can be applied with trivial modification to these wider cases.
Motion estimation has application in many image and video processing tasks, including video compression, motion-compensated temporal interpolation for standards conversion or slow-motion synthesis, motion-compensated noise reduction, object tracking, image segmentation, and, in the form of displacement estimation, stereoscopic 3D analysis and view synthesis from multiple cameras.
Most applications of motion estimation involve the “projection” (also described as “shifting”) of picture information forward or backward in time according to the motion vector that has been estimated. This is known as “motion-compensated” projection. The projection may be to the time instant of an existing frame or field, for example in compression, where a motion-compensated projection of a past or future frame to the current frame instant serves as a prediction of the current frame. Alternatively, the projection may be to a time instant not in the input sequence, for example in motion-compensated standards conversion, where information from a current frame is projected to an output time instant, where it will be used to build a motion-compensated interpolated output frame.
Some of the terminology used in describing motion estimation systems will now be described. FIG. 1 shows one-dimensional sections through two successive frames in a sequence of video frames. The horizontal axis of FIG. 1 represents time, and the vertical axis represents position. Of course, the skilled person will recognise that FIG. 1 is a simplification and that motion vectors used in image processing are generally two dimensional. The illustrated frames are: a previous or reference frame (101); and, the current frame (102). A motion vector (104) is shown assigned to a pixel (103) in the current frame. The motion vector indicates a point (105) in the reference frame which is the estimated source, in the reference frame, of the current frame pixel (103). This example shows a backward vector. Forward vectors may also be measured, in which case the reference frame is the next frame in the sequence rather than the previous frame.
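The relationship shown in FIG. 1 can be sketched in one dimension as follows. This is a minimal illustration, not the method of the source. It assumes a sign convention in which the backward vector v is added to the current pixel position x to reach the source point in the reference frame, and it assumes linear interpolation for non-integer positions. The function names are hypothetical.

```python
from typing import Sequence


def source_position(x: float, v: float) -> float:
    """Point in the reference frame indicated by backward vector v for the current pixel at x.

    Assumed convention: source = current position + vector.
    """
    return x + v


def sample_linear(reference: Sequence[float], pos: float) -> float:
    """Linearly interpolate a 1-D reference row at a fractional position, clamped to the row."""
    pos = min(max(pos, 0.0), len(reference) - 1.0)
    i = int(pos)
    if i >= len(reference) - 1:
        return float(reference[-1])
    frac = pos - i
    return (1.0 - frac) * reference[i] + frac * reference[i + 1]


reference_row = [0, 10, 20, 30, 40]
# Current pixel at x = 3 with backward vector -1.5 -> source point 1.5 in the reference row.
print(sample_linear(reference_row, source_position(3, -1.5)))  # prints 15.0
```

For a forward vector the same lookup would be made in the next frame rather than the previous one.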
The following descriptions assume that these frames are consecutive in the sequence, but the described processes are equally applicable in cases where there are intervening frames, for example in some compression algorithms. Temporal samples of an image will henceforth be referred to as fields, as would be the case when processing interlaced images. However, as the skilled person will appreciate, in non-interlaced image formats a temporal sample is represented by a frame; and, fields may be ‘de-interlaced’ to form frames within an image process. The spatial sampling of the image is not relevant to the discussion which follows.
An example of an algorithm that calculates motion vectors is disclosed in GB2188510. This algorithm is summarised in FIG. 2 and assigns a single vector to every pixel of a current field in a sequence of fields. The process of FIG. 2 is assumed to operate sequentially on the pixels of the current field; the pixel whose vector assignment is currently being determined will be referred to as the current pixel. The current field (202) and the previous field (201) are applied to a phase correlation unit (203), which calculates, for every pixel of the current field, a “menu” (204) consisting of a number (three in this example) of candidate motion vectors. Each candidate vector controls a respective one of a set of shift units (205). For every pixel in the current field, each shift unit displaces the previous field (201) by its candidate vector, producing, in the respective one of a set of displaced fields (206), a shifted pixel corresponding to the current pixel.
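The action of the shift units (205) can be sketched as below. This is a simplified illustration under stated assumptions: vectors are integer (dy, dx) displacements, one candidate is applied to the whole field (as if the menu were the same for every pixel), and source coordinates outside the picture are clamped to the edge. The names `shift_field` and `displaced_fields` are hypothetical.

```python
from typing import List, Tuple

Field = List[List[float]]
Vector = Tuple[int, int]  # (dy, dx), integer-pixel displacement (assumption)


def shift_field(prev: Field, v: Vector) -> Field:
    """Displace the previous field by candidate vector v.

    Each output sample at (y, x) is taken from prev at (y + dy, x + dx),
    with coordinates clamped to the picture edge (an assumed boundary rule).
    """
    dy, dx = v
    h, w = len(prev), len(prev[0])
    return [
        [prev[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)] for x in range(w)]
        for y in range(h)
    ]


def displaced_fields(prev: Field, menu: List[Vector]) -> List[Field]:
    """One displaced field per menu vector, as produced by the bank of shift units."""
    return [shift_field(prev, v) for v in menu]


prev = [[1, 2, 3], [4, 5, 6]]
menu = [(0, 0), (0, 1), (1, 0)]  # hypothetical three-vector menu
for v, d in zip(menu, displaced_fields(prev, menu)):
    print(v, d)
```

A practical system would also need sub-pixel interpolation for non-integer vectors; the sketch leaves that out for brevity.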
A set of error calculation units (207) produces a set of error values (208), one error value for every menu vector for every pixel of the current field. Each of the error calculation units (207) subtracts the respective one of the displaced fields (206) from the current field (202) and rectifies the result to produce a field of difference magnitudes, which are known as displaced field differences or “DFDs”. Each error calculation unit (207) then spatially filters its field of DFDs with a filter centred on the current pixel; this spatially filtered DFD is the error value for that pixel and menu vector. The set of three error values (208) for the current pixel is compared in a comparison unit (209), which finds the minimum error value. The comparison unit (209) outputs a candidate index (210) identifying the vector that gave rise to the minimum error value. The candidate index (210) is then applied to a vector selection unit (211), which selects the identified candidate from the menu of vectors (204) as the output assigned vector (212) for the current pixel.
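The error calculation, comparison and selection steps for a single current pixel might be sketched as follows. This is a hedged illustration: it takes already-displaced fields as input, uses the mean of rectified DFDs over a square window as the spatial filter, and skips window samples that fall outside the picture (an assumed edge rule). The function names are hypothetical; the comments map each step to the reference numerals of FIG. 2.

```python
from typing import List, Tuple

Field = List[List[float]]
Vector = Tuple[int, int]


def filtered_dfd(current: Field, displaced: Field, y: int, x: int, radius: int = 2) -> float:
    """Error value for one pixel: mean of rectified DFDs in a (2r+1)x(2r+1) window.

    Window samples falling outside the picture are skipped (an assumed edge rule).
    """
    h, w = len(current), len(current[0])
    total, count = 0.0, 0
    for yy in range(max(y - radius, 0), min(y + radius + 1, h)):
        for xx in range(max(x - radius, 0), min(x + radius + 1, w)):
            total += abs(current[yy][xx] - displaced[yy][xx])  # rectified DFD
            count += 1
    return total / count


def assign_vector(current: Field, displaced: List[Field], menu: List[Vector],
                  y: int, x: int, radius: int = 2) -> Vector:
    """Pick the menu vector whose filtered DFD is smallest at pixel (y, x)."""
    errors = [filtered_dfd(current, d, y, x, radius) for d in displaced]  # error values (208)
    best = min(range(len(errors)), key=errors.__getitem__)  # candidate index (210)
    return menu[best]  # assigned vector (212)


current = [[1, 2, 3], [4, 5, 6]]
menu = [(0, 0), (0, 1)]
displaced = [[[0, 0, 0], [0, 0, 0]], [[1, 2, 3], [4, 5, 7]]]  # hand-made displaced fields
print(assign_vector(current, displaced, menu, 0, 0))  # prints (0, 1)
```

Whether the filter output is a sum or a mean makes no difference to the comparison, since every candidate is filtered over the same window.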
An important property of DFDs will now be described. If a candidate motion vector for a pixel describes the true motion of that pixel, then we would expect the DFD to be small, and non-zero only because of noise in the video sequence. If the candidate motion vector is incorrect, then the DFD may well be large, but it might also be coincidentally small. For example, a rising waveform in one field may match a falling waveform in the displaced field at the point where they cross. Alternatively, a pixel may lie in a plain area or on a one-dimensional edge, in which case several motion vectors would give rise to a small or even a zero DFD value. This inconvenient property of DFDs is sometimes referred to as the “aperture problem”, and it makes it necessary to filter the DFDs spatially so that information from nearby pixels is taken into account in determining the error value for a pixel.
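Both cases of the aperture problem can be reproduced with unfiltered DFDs on tiny hand-made fields, as in the sketch below. The fields, the integer (dy, dx) vectors and the clamping of source coordinates at the picture edge are assumptions made for illustration; `dfd_at` is a hypothetical name.

```python
from typing import List, Tuple

Field = List[List[float]]


def dfd_at(current: Field, previous: Field, y: int, x: int, v: Tuple[int, int]) -> float:
    """Rectified, unfiltered DFD for pixel (y, x) and integer candidate v = (dy, dx).

    Source coordinates are clamped to the picture (assumed edge rule).
    """
    dy, dx = v
    h, w = len(previous), len(previous[0])
    py = min(max(y + dy, 0), h - 1)
    px = min(max(x + dx, 0), w - 1)
    return abs(current[y][x] - previous[py][px])


# Case 1: a stationary vertical edge. Vectors along the edge cannot be told apart.
edge = [[0, 0, 1, 1] for _ in range(4)]
print([dfd_at(edge, edge, 1, 1, v) for v in [(0, 0), (2, 0), (0, 1)]])  # prints [0, 0, 1]

# Case 2: a rising waveform in the current row crossing a falling one in the previous row.
rising, falling = [[0, 1, 2, 3, 4]], [[4, 3, 2, 1, 0]]
print([dfd_at(rising, falling, 0, x, (0, 0)) for x in range(5)])  # prints [4, 2, 0, 2, 4]
```

In the second case the DFD is zero at the crossing point although the vector is wrong, while the neighbouring pixels expose the mismatch; this is the information a spatial filter gathers.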
In the example of FIG. 2, each error calculation unit (207) filters the DFDs with a two-dimensional filter, a typical example of which is a 5×5 running-average filter. It is this rectified and filtered error that is used to compare candidate motion vectors. FIG. 3 illustrates the positions of the 25 samples involved in the running-average filter: they are the samples within the rectangular filter window (302), which is centred on the current pixel position (301).
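A 5×5 running-average filter over a whole field of rectified DFDs could be implemented as sketched below. The sketch swaps in a common technique, separable running sums, so that each output costs a constant number of additions whatever the window size; it is not necessarily how the source implements the filter. Samples beyond the picture edge are treated as zero (an assumption), and all function names are hypothetical.

```python
from typing import List, Sequence

Field = List[List[float]]


def running_sum_1d(row: Sequence[float], radius: int) -> List[float]:
    """Sum of each (2*radius+1)-sample window, updated incrementally as the window slides.

    Samples beyond the ends of the row count as zero (assumed edge rule).
    """
    n = len(row)
    acc = sum(row[0:radius])  # window contents just before position 0
    out = []
    for i in range(n):
        if i + radius < n:
            acc += row[i + radius]       # sample entering the window
        if i - radius - 1 >= 0:
            acc -= row[i - radius - 1]   # sample leaving the window
        out.append(acc)
    return out


def running_average_2d(field: Field, radius: int = 2) -> Field:
    """Separable (horizontal then vertical) running average; radius=2 gives a 5x5 window."""
    h, w = len(field), len(field[0])
    rows = [running_sum_1d(r, radius) for r in field]
    cols = [running_sum_1d([rows[y][x] for y in range(h)], radius) for x in range(w)]
    area = (2 * radius + 1) ** 2
    return [[cols[x][y] / area for x in range(w)] for y in range(h)]


def filtered_error(current: Field, displaced: Field, radius: int = 2) -> Field:
    """Rectify the DFDs, then apply the running-average filter over the whole field."""
    dfd = [[abs(c - d) for c, d in zip(rc, rd)] for rc, rd in zip(current, displaced)]
    return running_average_2d(dfd, radius)
```

Zero padding lowers the error values near the picture edge, but it does so equally for every candidate vector at a given pixel, so the comparison between candidates is unaffected.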
Choosing the size of the two-dimensional DFD filter involves a trade-off between reliability and spatial accuracy of the resulting assigned motion vector field. If, on the one hand, the filter is large, then the effect of noise on the filtered error value is reduced and the filter is more likely to take into account nearby detail in the picture which might help to distinguish reliably between candidate motion vectors. However, a large filter is also more likely to take in pixel data from one or more objects whose motion is properly described by different motion vectors, in which case it will fail to give a low error value for any candidate motion vector, even for one that is correct for the pixel in question.
If, on the other hand, the filter is small, it is more likely to involve pixels from only one object and so is more likely to return a low error value for the correct motion vector. However, it will be less likely to reject wrong motion vectors and will be more susceptible to noise.
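The spatial-accuracy side of this trade-off can be demonstrated with a one-dimensional, two-object scene. In the hypothetical example below, the left part of a row is stationary and the right part moves by two samples; for a pixel two samples from the boundary, a small window gives zero error for the correct vector, whereas a large window that straddles the boundary does not. The scene, the integer vectors, the edge clamping and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence


def window_error(current: Sequence[float], previous: Sequence[float],
                 x: int, v: int, radius: int) -> float:
    """Mean rectified DFD over a (2*radius+1)-sample window centred on x (1-D, integer v)."""
    n = len(current)
    lo, hi = max(x - radius, 0), min(x + radius + 1, n)
    total = 0.0
    for i in range(lo, hi):
        src = min(max(i + v, 0), n - 1)  # clamp the source position to the row
        total += abs(current[i] - previous[src])
    return total / (hi - lo)


def two_object_scene(previous: Sequence[float], boundary: int,
                     v_left: int, v_right: int) -> List[float]:
    """Current row whose left part moves by v_left and right part by v_right (hypothetical scene)."""
    n = len(previous)
    return [previous[min(max(i + (v_left if i < boundary else v_right), 0), n - 1)]
            for i in range(n)]


prev = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
cur = two_object_scene(prev, boundary=8, v_left=0, v_right=2)
for radius in (1, 3):
    # Errors at pixel x = 6 (true vector 0) for candidates 0 and 2.
    print(radius, [round(window_error(cur, prev, 6, v, radius), 3) for v in (0, 2)])
# prints "1 [0.0, 3.0]" then "3 [0.714, 2.857]"
```

In this particular scene the large window still ranks the correct vector first, but its error is no longer near zero; with less favourable picture content the contamination from the neighbouring object can outweigh the difference between candidates.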
The inventors have observed that, for critical picture material, there is no choice of filter size which yields satisfactory performance in all aspects of reliability, noise immunity, spatial accuracy and sensitivity. However, the inventors have recognised that it is possible to design an improved displaced field difference filter which combines the reliability and noise immunity of a large conventional filter with the sensitivity and spatial accuracy of a small filter, while avoiding the disadvantages of each.