In motion estimation, also referred to as optical-flow estimation and displacement estimation, the correspondences between areas in different video frames, also referred to as images, in a video sequence may be determined. The motion of objects in the actual scene captured in the video sequence, in addition to camera motion, may result in moving visual patterns in the video frames. A goal of true motion estimation may be to estimate the two-dimensional (2D) motion of a visual pattern from one frame to another such that the estimated 2D motion may be the projection of the actual three-dimensional (3D) scene motion. Local motion estimation may refer to estimation of a motion vector for a small image area, for example, a single pixel or a small block of pixels. Exemplary small blocks may be 2×2, 4×4, 8×8 and other blocks containing a small number of pixels. A set of motion vectors for all pixels, or pixel blocks, across an entire image, or video frame, may be referred to as a motion field. The estimated motion field may be used in applications in many areas, for example, video processing, video coding, computer vision and other video and imaging areas. Exemplary applications may include motion-compensated video coding, motion-compensated video filtering and motion-compensated frame interpolation.
Gradient-based motion estimation may be one important class of motion estimation methods. In gradient-based motion estimation, local motion may be modeled as substantially constant in a neighborhood proximate to a pixel location where a motion vector may be estimated. The neighborhood may be referred to as a local analysis window, analysis window or window. Spatial and temporal derivative values, also referred to as spatio-temporal gradients, of the pixel data in the window may be determined and used to compute a motion vector, a displacement vector or other parameters corresponding to the associated motion.
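One possible form of such a computation may be sketched as follows. This is a minimal illustration only, assuming grayscale frames held as NumPy arrays, a square analysis window, and an ordinary least-squares fit of the linearized constant-motion model; the function name estimate_window_motion and its parameters are hypothetical and are not taken from the description above.

```python
import numpy as np

def estimate_window_motion(prev, curr, cy, cx, half=3):
    """Least-squares motion vector (vx, vy) for the window centred at (cy, cx).

    Assumes the window lies fully inside the frame and that the
    displacement is small (sub-pixel to about one pixel).
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # Spatial derivatives of the first frame (axis 0 is y, axis 1 is x).
    fy, fx = np.gradient(prev)
    # Temporal derivative approximated by the frame difference.
    ft = curr - prev
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    gx, gy, gt = fx[win].ravel(), fy[win].ravel(), ft[win].ravel()
    # Constant motion in the window: gx*vx + gy*vy + gt = 0 at every pixel.
    A = np.stack([gx, gy], axis=1)
    v, *_ = np.linalg.lstsq(A, -gt, rcond=None)
    return v  # array([vx, vy])
```

Because the model is linearized, a sketch of this kind may only be expected to behave well when the displacement is small relative to the scale of the image structure in the window.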
Another important class of motion estimation methods may be block matching. In block matching, a block of pixels in one frame may be matched to a block of pixels in another frame by searching, in a pre-defined region, among candidate blocks of pixels.
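An exhaustive block-matching search may be sketched as follows. The matching criterion is not specified above; the sum of absolute differences (SAD) used here is an assumption chosen for simplicity, and the function name match_block, the block size and the search range are hypothetical.

```python
import numpy as np

def match_block(prev, curr, top, left, block=8, search=4):
    """Return the (dy, dx) displacement minimising SAD for one block of `prev`.

    Candidates are all blocks of `curr` whose top-left corner lies within
    +/- `search` pixels of (top, left) and which fit inside the frame.
    """
    ref = prev[top:top + block, left:left + block].astype(np.int64)
    h, w = curr.shape
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the frame
            cand = curr[y:y + block, x:x + block].astype(np.int64)
            cost = np.abs(ref - cand).sum()  # SAD (assumed criterion)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

An exhaustive search of this kind evaluates every candidate in the pre-defined region, so its cost grows with the square of the search range.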
Assumptions used in most motion models may not hold at every image location, thereby making motion estimation a very challenging problem, both in theory and in practice. For example, a commonly made basic assumption that the color, the intensity or the brightness of a pixel, or block of pixels, is preserved from one video frame to the next may not hold due to the 3D nature of the actual objects in the scene and their associated illumination. Additionally, a reliable solution may not be obtainable when the data do not sufficiently constrain the motion model, for example, when the color or intensity function is locally very flat or one-dimensional, a problem referred to as the aperture problem. Furthermore, areas in one image may not appear in another image due to occlusions, for example, background areas that may be covered or uncovered by a moving foreground object.
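Whether the data in a window constrain the motion model may be examined, for example, through the eigenvalues of the 2×2 matrix of summed gradient products. The following is a minimal sketch under that assumption; the function name window_conditioning, the threshold eps and the three labels are hypothetical and serve only to illustrate the flat and one-dimensional cases.

```python
import numpy as np

def window_conditioning(patch, eps=1e-3):
    """Classify a window as 'flat', 'one-dimensional' or 'two-dimensional'."""
    patch = patch.astype(np.float64)
    fy, fx = np.gradient(patch)  # axis 0 is y, axis 1 is x
    # 2x2 matrix of summed gradient products over the window.
    G = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    lam_small, lam_large = np.linalg.eigvalsh(G)  # ascending order
    if lam_large < eps:
        return "flat"             # no gradient: motion is unconstrained
    if lam_small < eps * lam_large:
        return "one-dimensional"  # aperture problem: one component only
    return "two-dimensional"      # both motion components constrained
```

In the one-dimensional case, only the motion component along the gradient direction may be recovered from the window data.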
The potential presence of multiple objects within an analysis window may generate problems with a gradient-based motion estimation approach, wherein local motion may be modeled to be substantially constant in a neighborhood, because each of the multiple objects may be associated with a different motion within the captured scene. The presence of multiple motions within the analysis window may lead to inaccurate estimates of the motion vector, or other motion parameters.
Additionally, the data within an analysis window may comprise one or more noise components due to, for example, camera noise, compression noise or other noise. The noisy data within an analysis window may lead to inaccurate motion vector, or other motion parameter, estimates. This problem may be especially apparent when the analysis window is not sufficiently large to ensure accurate motion estimation.
Regularization may be applied to the estimation of motion-vector fields in an attempt to mitigate the above-described and other problems. Regularization may use an assumption, for example, that of spatial coherency, to constrain the estimation. The concept of spatial coherency states that real-world surfaces have a spatial extent and that areas on a single surface are likely to be moving with the same or very similar motion. The spatial coherency concept leads to the introduction of a motion smoothness constraint. However, the assumption of spatial coherency does not hold at motion boundaries, which often coincide with object boundaries, and may lead to motion fields that are too smooth, especially at motion boundaries.
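The over-smoothing effect may be illustrated with a simple quadratic smoothness penalty on a one-dimensional motion field. This is a sketch only, assuming a generic regularizer that minimizes the sum of squared deviations from the unregularized estimates plus a weight lam times the sum of squared differences between neighboring values; the function name regularize_1d and the weight lam are hypothetical and do not describe any particular regularization method referred to above.

```python
import numpy as np

def regularize_1d(d, lam=2.0):
    """Minimise sum((v - d)**2) + lam * sum((v[i+1] - v[i])**2) over v."""
    d = np.asarray(d, dtype=np.float64)
    n = len(d)
    # (n-1) x n forward-difference operator: row i is e[i+1] - e[i].
    D = np.diff(np.eye(n), axis=0)
    # Setting the gradient to zero gives (I + lam * D^T D) v = d.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, d)

# Two surfaces moving at 0 and 4 pixels per frame, with a sharp motion boundary.
field = np.array([0.0] * 8 + [4.0] * 8)
smoothed = regularize_1d(field)
# The values on either side of the boundary are pulled toward each other,
# so the sharp motion boundary is blurred.
```

Within each surface the smoothness term has little effect, whereas at the boundary it may replace the true step with a gradual transition.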
Computationally efficient systems and methods for motion estimation that improve spatial coherency of a motion field without over-smoothing the motion field at a motion boundary and that do not require explicit occlusion detection may be desirable.