The determination of motion from an image sequence is currently employed for a wide variety of tasks. For instance, in video coding the MPEG and H.261/2/3 standards employ motion information in order to compress image sequence data efficiently. The idea is that, in any interesting image sequence, image content generally does not change substantially between frames except for motion. Thus, if one frame were transmitted at the start of a scene and only the motion information were sent for subsequent frames instead of the actual picture material, all the subsequent frames in that scene could be built at the receiver. The various MPEG and H.26x standards exploit this idea and in practice stipulate a maximum allowable number of frames over which such motion compensated prediction is possible. It is because of video coding in particular that motion estimation has been widely studied and is of industrial importance.
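As a minimal sketch of the motion compensated prediction idea described above, the following Python function rebuilds a frame from a reference frame and one motion vector per block. The function name, the list-of-lists frame representation, the block size parameter and the clamping at the frame border are all assumptions made for illustration; they are not taken from any of the standards mentioned.

```python
def motion_compensate(ref, vectors, bsize):
    """Predict a frame by copying displaced blocks from a reference frame.

    ref      : reference frame as a list of rows of pixel values
    vectors  : vectors[j][i] is the (dx, dy) displacement of the block whose
               top-left pixel is (i * bsize, j * bsize)
    bsize    : block width and height in pixels (hypothetical parameter)
    """
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = vectors[y // bsize][x // bsize]
            # Clamp source coordinates at the frame border (an assumption;
            # real codecs define their own padding rules).
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            pred[y][x] = ref[sy][sx]
    return pred
```

In such a scheme a receiver holding the reference frame would need only the vectors, plus in practice a correction for whatever the prediction fails to capture, to reconstruct the next frame.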
Motion estimation is also useful for a number of video content retrieval tasks, e.g. shot cut detection [6] and event detection [7]. It is also vital to, and heavily used in, the reconstruction of missing images, deinterlacing, and sequence restoration tasks in general [20, 15].
The Block Matching motion estimation algorithm is perhaps the most popular estimator, and numerous variants have been proposed in the scientific [18, 3, 21, 10] and patent literature [19, 9, 17] from as early as 1988. The general idea is to assume that blocks of pixels (16×16 in the MPEG-2 standard, and optionally 8×8 in the MPEG-4 standard) contain a single object moving with a single, simple motion. An exhaustive search in the previous and/or next frames for the best matching block of pixels of the same size then yields the relevant motion vector.
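The exhaustive search described above can be sketched as follows. This is an illustrative example only: the text does not specify a matching criterion, so the sum of absolute differences (SAD) is assumed here, and the function names, the list-of-lists frame representation and the search range of ±4 pixels are hypothetical choices.

```python
def sad(cur, ref, bx, by, dx, dy, bsize):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_match(cur, ref, bx, by, bsize=16, search=4):
    """Exhaustively test every displacement within +/- search pixels and
    return ((dx, dy), cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference.
            if not (0 <= by + dy and by + dy + bsize <= height
                    and 0 <= bx + dx and bx + dx + bsize <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bsize)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

Note that the search returns exactly one vector per block, which is the single-motion assumption stated above expressed in code.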
Of course, motion in an image sequence does not necessarily obey the block matching assumption. Typically, at the boundaries of moving objects, blocks will contain two types of pixels. Some will be part of the moving object, while others will be part of another moving object or a stationary background. This situation is shown in FIG. 1. While this does not greatly affect the use of block matching for video coding, it does have implications for image manipulation, e.g. restoration, deinterlacing and enhancement. In those applications, processing blocks at motion boundaries without acknowledging the motion discontinuity causes poor image quality at the output, sometimes giving the effect of dragging or tearing at moving object boundaries. One method for solving this problem is to split blocks at such boundaries, e.g. as proposed in [8]. De Haan et al. [17] describe an invention that is one such variant of that idea.
As early as 1981 [14, 13, 12] it was recognised that having a motion vector for every pixel in an image might overcome this problem. Various schemes have since been proposed to do this, typically based on image gradient observations together with the notion that motion in a local area of the image should be smooth in some sense. These are typically iterative methods, and the result is a motion vector for every pixel in the image, yielding what is called the optical flow for an image. However, although estimating a vector for every pixel does overcome the problem somewhat, determining optical flow still involves no notion of whether a given pixel exists in future or previous frames, i.e. there is no understanding of occlusion.
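For illustration only, one widely used formulation of this kind (not necessarily the one used in the cited works) combines a linearised brightness constancy constraint with a smoothness penalty. The symbols are assumptions introduced here: I(x, y, t) is the image intensity, I_x, I_y and I_t are its partial derivatives, (u, v) is the motion vector at a pixel, and \alpha is a weight controlling how strongly smoothness is enforced.

```latex
% Gradient constraint: intensity is assumed to be preserved along the motion
I_x u + I_y v + I_t = 0

% Energy minimised iteratively over the whole image:
% a data term plus a smoothness term weighted by \alpha^2
E(u, v) = \iint \left[ \left( I_x u + I_y v + I_t \right)^2
        + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] \, dx \, dy
```

The gradient constraint gives one equation for two unknowns at each pixel, which is why a smoothness assumption is needed. Neither term accounts for a pixel that is absent from the neighbouring frame, which is consistent with the lack of any occlusion model noted above.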
In some sense, occlusion estimation is related to allowing for motion discontinuities at the boundaries of moving objects. Since 1993 (Black et al. [4]) occlusion has been approached in this way. Motion discontinuity information is now widely accepted to be vital for image manipulation tasks in general. For instance, [11] describes an invention that uses a block splitting method to aid deinterlacing of video by extracting motion discontinuity information.