A motion vector field is a pixel-by-pixel map of image motion from one image frame to the next image frame. Each pixel in the frame has a motion vector which defines a matching pixel in the next frame or in a previous frame. The combination of these motion vectors is the motion vector field.
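As a concrete illustration, a motion vector field can be stored as an array holding one two-component vector per pixel. The sketch below is a minimal example under several assumptions. It uses NumPy. The field is forward, pointing into the next frame, and the displacements are whole pixels. The `(H, W, 2)` layout with `(dy, dx)` ordering and the helper name `matching_pixel` are hypothetical choices made for illustration, not part of the techniques described here.

```python
import numpy as np

def matching_pixel(field, row, col):
    """Return the position in the next frame that pixel (row, col) maps to.

    field has shape (H, W, 2); field[r, c] = (dy, dx) is the motion vector of
    pixel (r, c). Integer (whole-pixel) displacements are assumed here.
    """
    dy, dx = field[row, col]
    return row + int(dy), col + int(dx)

H, W = 4, 6
field = np.zeros((H, W, 2), dtype=int)  # one 2-D motion vector per pixel
field[:, :, 1] = 1                      # every pixel moves one column to the right...
field[1, 2] = (1, -2)                   # ...except one, which moves down 1 and left 2

print(field.shape)                  # (4, 6, 2)
print(matching_pixel(field, 0, 0))  # (0, 1)
print(matching_pixel(field, 1, 2))  # (2, 0)
```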
Although the techniques described herein could easily be applied to image components other than frames, such as image fields or portions of image frames, the description below refers only to image frames so as to avoid confusion in terminology with the fields of motion vectors.
The estimation of motion vector fields is an important task in many areas, such as computer vision, motion-compensated coding of moving images, image noise reduction and image frame-rate conversion. The problem of estimating motion vector fields is inherently difficult because many different motion vector fields can be used to describe a single image sequence.
One simple approach is to assume that all pixels in a block move with the same motion, such as a constant translation or an affine motion. This kind of block matching approach frequently fails to produce a good estimate of motion because it disregards the motion of pixels outside the block. As a result, the motion model may not describe the true motion of the pixels within a block when the block is large, and it may be significantly affected by noise when the block is small.
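The block matching approach can be sketched as an exhaustive search that assigns one translational vector to every pixel of a block. This is a minimal illustration under stated assumptions: it uses NumPy and a sum-of-absolute-differences (SAD) matching criterion, and the parameters `block` and `search` are hypothetical. Practical block matchers differ in their criteria and search strategies.

```python
import numpy as np

def block_match(frame, next_frame, block=8, search=4):
    """Estimate one translational vector (dy, dx) per block by full search.

    Each block x block tile of `frame` is compared with every tile of
    `next_frame` displaced by |dy|, |dx| <= search. The displacement with
    the smallest SAD is kept and shared by all pixels of the block (the
    constant-translation assumption).
    """
    frame = frame.astype(float)
    next_frame = next_frame.astype(float)
    H, W = frame.shape
    field = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            tile = frame[y0:y0 + block, x0:x0 + block]
            best_sad, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if y1 < 0 or x1 < 0 or y1 + block > H or x1 + block > W:
                        continue  # candidate tile would fall outside the frame
                    sad = np.abs(tile - next_frame[y1:y1 + block, x1:x1 + block]).sum()
                    if sad < best_sad:
                        best_sad, best_v = sad, (dy, dx)
            field[by, bx] = best_v
    return field

rng = np.random.default_rng(0)
a = rng.integers(0, 256, (32, 32))
b = np.roll(a, shift=(1, 2), axis=(0, 1))  # content moves down 1, right 2
field = block_match(a, b)
print(field[0, 0])  # [1 2]
```

Blocks on the bottom and right edges cannot reach their true match inside the frame, so they receive whichever candidate happens to score best. This is a small instance of the fixed-block model failing.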
Conventional approaches to the problem of estimating motion vector fields typically require simultaneously solving equations with several thousand unknowns. Numerous techniques based on gradients, correlation, spatiotemporal energy functions and feature matching have been proposed. These techniques rely on local image features, such as the intensity of individual pixels, and on more global features, such as edges and object boundaries.
Recently, two processes have been proposed that successfully address two problems in motion vector estimation: motion vector discontinuity and occlusion. The first of these processes is the "line process" described in a paper by J. Konrad et al. entitled "Bayesian Estimation of Motion Vector Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 910-927, September 1992. The second is the "occlusion process" described in a paper by R. Depommier et al. entitled "Motion Estimation with Detection of Occlusion Areas," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. III-269-272, 1992. Although successful, these processes substantially increase the number of unknowns that must be estimated, and they also introduce additional parameters specific to the line and/or occlusion processes.
Global formulations over the complete motion field have been proposed to address the deficiency of block matching techniques noted above. One such formulation is proposed by B. Horn et al. in a paper entitled "Determining Optical Flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981. According to this proposal, motion vectors are estimated by minimizing, over the entire image, the error of the motion constraint equation together with the error of motion smoothness. The motion constraint equation is derived from the assumption that image intensity is constant along the motion trajectory. Any departure from smooth motion is measured as the square of the magnitude of the gradient of the motion vectors. While this approach improves the handling of general types of motion, such as elastic motion, it tends to blur the motion vector fields where the motion is not continuous (i.e., at motion boundaries).
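The Horn et al. formulation minimizes a sum over the whole image. The first term is the squared motion-constraint error (I_x u + I_y v + I_t)^2. The second is alpha^2 times the smoothness error |grad u|^2 + |grad v|^2, where (u, v) is the motion vector at each pixel. The sketch below is a simplified version of the familiar Jacobi-style iteration for this minimization and assumes NumPy. It uses `np.gradient` for the spatial derivatives and a plain four-neighbour average. The original paper instead uses its own derivative stencils and a weighted eight-neighbour average. The value of `alpha` and the iteration count are illustrative assumptions.

```python
import numpy as np

def neighbor_mean(a):
    """Average of the four nearest neighbours, replicating values at the border."""
    p = np.pad(a, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck(f1, f2, alpha=1.0, iters=200):
    """Iteratively minimise sum (Ix*u + Iy*v + It)^2 + alpha^2 * (|grad u|^2 + |grad v|^2)."""
    f1 = f1.astype(float)
    f2 = f2.astype(float)
    Iy, Ix = np.gradient((f1 + f2) / 2.0)  # derivative along rows (y), then columns (x)
    It = f2 - f1                            # temporal derivative between the two frames
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(iters):
        u_bar, v_bar = neighbor_mean(u), neighbor_mean(v)
        # Motion-constraint residual at the locally smoothed estimate.
        r = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * r
        v = v_bar - Iy * r
    return u, v

x = np.arange(40, dtype=float)
f1 = np.tile(np.sin(0.3 * x), (20, 1))          # pattern varying only horizontally
f2 = np.tile(np.sin(0.3 * (x - 0.5)), (20, 1))  # same pattern shifted 0.5 px right
u, v = horn_schunck(f1, f2, alpha=0.5, iters=500)
```

Because the test pattern varies only horizontally, `v` stays exactly zero and `u` settles near half a pixel. On images containing motion boundaries, the same quadratic smoothness term produces the blurring described above.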
A paper by E. Hildreth entitled "Computations Underlying the Measurement of Visual Motion," Artificial Intelligence, vol. 23, pp. 309-354, 1984, proposes a partial solution to the problem of handling motion boundaries. According to this proposal, the motion vector field is assumed to be smooth only along a contour, not across it. This overcomes the blurring problem. However, because motion vectors cannot be obtained at points that do not lie on contours, this technique cannot propagate motion information across contours that do not correspond to motion boundaries, such as those due to textures. Such contours are common in real-world images.
A technique that combines the line process with Markov random field modeling and stochastic relaxation was proposed by S. Geman et al. for restoring degraded images, in a paper entitled "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, November 1984. In this context, a line process is a boolean field that marks image intensity boundaries. Other researchers have adapted this idea to overcome the blurring of an estimated motion vector field by modifying the line process to indicate motion boundaries. An example of this technique is contained in the above-referenced paper by J. Konrad et al. One drawback of this method is that one additional unknown must be introduced for every pair of adjoining pixels in order to implement the line process. These additional unknowns greatly increase the computational overhead of any algorithm that employs this method.
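The line-process idea can be sketched with a generic "weak membrane" smoothness energy. Each pair of adjacent pixels contributes the squared difference of their motion vectors. If the boolean line variable between them is switched on, a fixed penalty `beta` is paid instead. This form is illustrative. It assumes NumPy and a hypothetical penalty `beta`, and it is not the exact energy used by Geman et al. or Konrad et al. The sketch also counts the extra unknowns: one per adjacent pair on a four-connected grid.

```python
import numpy as np

def line_process_unknowns(height, width):
    """Boolean line variables on a 4-connected grid: one per horizontally
    adjacent pixel pair plus one per vertically adjacent pair."""
    return height * (width - 1) + (height - 1) * width

def smoothness_energy(field, h_lines, v_lines, beta):
    """Smoothness energy of an (H, W, 2) motion field with a line process.

    h_lines: (H, W-1) booleans, True where a boundary separates (r, c) and (r, c+1).
    v_lines: (H-1, W) booleans, True where a boundary separates (r, c) and (r+1, c).
    beta:    hypothetical penalty paid for each boundary segment switched on.
    """
    dh = np.sum((field[:, 1:] - field[:, :-1]) ** 2, axis=-1)  # (H, W-1)
    dv = np.sum((field[1:, :] - field[:-1, :]) ** 2, axis=-1)  # (H-1, W)
    # (1 - l) * squared difference + l * beta, written with np.where.
    return float(np.sum(np.where(h_lines, beta, dh)) + np.sum(np.where(v_lines, beta, dv)))

field = np.zeros((4, 4, 2))
field[:, 2:, 1] = 3.0                    # right half moves 3 px horizontally
h_lines = np.zeros((4, 3), dtype=bool)
v_lines = np.zeros((3, 4), dtype=bool)
e_smooth = smoothness_energy(field, h_lines, v_lines, beta=2.0)  # boundary smoothed over
h_lines[:, 1] = True                     # mark a boundary between columns 1 and 2
e_lines = smoothness_energy(field, h_lines, v_lines, beta=2.0)   # four segments at 2.0 each
print(line_process_unknowns(480, 720), e_smooth, e_lines)  # 690000 36.0 8.0
```

For a 720 x 480 frame, this gives 690,000 line variables. That is about as many as the 691,200 components of the motion field itself, which illustrates the computational overhead noted above.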
Occlusion means that part of one image has no matching part in another image that corresponds to the same part of the scene; that part of the image was occluded from one image frame to the next. Occlusion appears quite often in real-world images when, for example, one object moves in front of another, an object moves toward the camera, or objects rotate. If only two frames are used, it is difficult to obtain a good estimate of motion in the presence of occlusion because, for at least some parts of one image, there is no corresponding part in the other image.
One simple solution to this problem is to use three image frames: a target frame and the frames occurring immediately before and immediately after it. In most real-world images, a matching portion for each part of the middle frame can be found in either the preceding or the succeeding frame. The above-referenced paper by Depommier et al. proposes combining the line process set forth in the Konrad et al. paper with an occlusion process that detects occlusion areas using three frames. One drawback of this combination, however, is that its model requires even more unknowns and parameters than the line process alone.
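A much-simplified version of the three-frame idea can be sketched with block matching. Each block of the middle frame is matched against both the previous and the next frame, and the better match is kept. This sketch assumes NumPy and SAD matching, and its helpers `best_match` and `three_frame_match` are hypothetical. It does not implement the line or occlusion processes of Depommier et al., which estimate occlusion jointly with the motion.

```python
import numpy as np

def best_match(tile, ref, y0, x0, search):
    """Lowest-SAD displacement of `tile` (taken at y0, x0) within `ref`."""
    H, W = ref.shape
    b = tile.shape[0]
    best = (np.inf, (0, 0))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y1, x1 = y0 + dy, x0 + dx
            if 0 <= y1 and 0 <= x1 and y1 + b <= H and x1 + b <= W:
                sad = np.abs(tile - ref[y1:y1 + b, x1:x1 + b]).sum()
                if sad < best[0]:
                    best = (sad, (dy, dx))
    return best

def three_frame_match(prev, mid, nxt, block=8, search=4):
    """Per block of `mid`, keep the better of the backward and forward matches.

    Returns (vectors, source); source is -1 where the previous frame matched
    better and +1 where the next frame matched better (ties go to the next frame).
    """
    prev, mid, nxt = (f.astype(float) for f in (prev, mid, nxt))
    H, W = mid.shape
    vecs = np.zeros((H // block, W // block, 2), dtype=int)
    source = np.zeros((H // block, W // block), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            tile = mid[y0:y0 + block, x0:x0 + block]
            sad_p, v_p = best_match(tile, prev, y0, x0, search)
            sad_n, v_n = best_match(tile, nxt, y0, x0, search)
            if sad_n <= sad_p:
                vecs[by, bx], source[by, bx] = v_n, 1
            else:
                vecs[by, bx], source[by, bx] = v_p, -1
    return vecs, source

rng = np.random.default_rng(1)
mid = rng.integers(0, 256, (32, 32))
prev = mid.copy()                              # static background
nxt = mid.copy()
nxt[0:8, 0:8] = rng.integers(0, 256, (8, 8))   # top-left block is covered in the next frame
vecs, source = three_frame_match(prev, mid, nxt)
print(source[0, 0], source[1, 1])  # -1 1
```

The top-left block is occluded in the next frame, so it is matched in the previous frame instead. Every other block still finds an exact match going forward.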