Standard motion modeling for video coding applies a parametric model to a fixed region (a motion block) to estimate the motion. These approaches are limited in that a single model cannot represent multiple, different motions within the same motion block. This presents a problem.
A basic problem in motion estimation is the ability of the model to handle multiple motions and moving object boundaries. Standard motion models, such as the affine or perspective models, allow for smooth deformations of a region (i.e., the motion block) to capture a coherent motion (such as translation, zoom, or rotation) for all the pixels in the motion block. The region or block over which the motion is estimated cannot be chosen to be too small, for two reasons: (1) from a coding point of view, larger regions mean smaller motion overhead, and (2) from an estimation point of view, a larger region allows for better estimation of the motion parameters.
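As an illustration of what "a coherent motion for all the pixels in the motion block" means, the following is a minimal sketch of a six-parameter affine motion model. The parameter names a1 to a6, the helper function names, and the choice of the block origin as the center of zoom and rotation are assumptions made for this example only; they are not taken from any particular codec.

```python
import math

def affine_motion(params, x, y):
    """Displacement (vx, vy) of pixel (x, y) under a six-parameter affine model.

    Assumed parameterization (illustrative only):
        vx = a1 + a2*x + a3*y
        vy = a4 + a5*x + a6*y
    """
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y)

def translation(dx, dy):
    # Every pixel moves by the same vector (dx, dy).
    return (dx, 0.0, 0.0, dy, 0.0, 0.0)

def zoom(scale):
    # Pixel (x, y) moves to scale*(x, y), so its displacement is (scale-1)*(x, y).
    return (0.0, scale - 1.0, 0.0, 0.0, 0.0, scale - 1.0)

def rotation(theta):
    # Pixel (x, y) moves to (c*x - s*y, s*x + c*y); subtract (x, y) for displacement.
    c, s = math.cos(theta), math.sin(theta)
    return (0.0, c - 1.0, -s, 0.0, s, c - 1.0)

def motion_field(params, block_size):
    # One displacement vector per pixel of a square motion block.
    return [[affine_motion(params, x, y) for x in range(block_size)]
            for y in range(block_size)]
```

Whatever the parameter values, one set of six numbers yields a single smoothly varying motion field over the whole block, which is why such a model cannot by itself describe two different motions inside one block.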
A key problem that arises from this limitation of common motion models is the occurrence of multiple motions within the motion block. A moving object boundary within a motion region is an indication of two possibly very different motions (for example, the motion of the object and the motion of the background). Also, a moving object boundary implies that some pixels will be occluded (hidden) with respect to the past or future motion estimation. This occlusion effect can bias the motion estimate, lead to higher prediction error, and make it difficult to accurately extract the object boundary.
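The effect of two motions inside one block can be illustrated with a small numerical sketch. It assumes, purely for illustration, that a per-pixel motion vector is already known for every pixel of the block and that a single translational model is fitted by least squares, in which case the fitted vector is the mean of the per-pixel vectors. The function names, block size, and motion values are hypothetical.

```python
def two_motion_block(size, boundary_col, object_motion, background_motion):
    # Pixels left of boundary_col belong to the object, the rest to the background.
    return [object_motion if x < boundary_col else background_motion
            for y in range(size) for x in range(size)]

def fit_single_translation(vectors):
    # The least-squares fit of one translation to a set of vectors is their mean.
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)

def mean_squared_residual(vectors, model):
    # Average squared distance between each pixel's true motion and the model.
    return sum((vx - model[0]) ** 2 + (vy - model[1]) ** 2
               for vx, vy in vectors) / len(vectors)

# An 8x8 block split down the middle: object moves 4 pixels right, background is static.
block = two_motion_block(8, 4, (4.0, 0.0), (0.0, 0.0))
estimate = fit_single_translation(block)
error = mean_squared_residual(block, estimate)
print(estimate, error)  # prints (2.0, 0.0) 4.0
```

In this sketch the single fitted motion (2.0, 0.0) matches neither the object nor the background, and the residual is nonzero for every pixel, which is the bias and increased prediction error described above.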
Approaches in motion segmentation often rely on optical flow estimates or parametric (e.g., affine) motion models; these have the usual problems near object boundaries and with occlusion effects. Some degree of smoothness in the segmentation field, and hence in object boundaries, can be achieved with a prior probability term in MAP/Bayesian methods. This is more of a constraint on the connectivity of the segmentation field, without any explicit coupled model to account for the object boundary and the motion fields. A curvature evolution model may be used to capture the boundary of a moving object. However, this approach does not involve motion estimation or a motion field, and relies on a temporal difference operator in the model for the evolution of the object boundary.
Another approach, in the context of a level set method, implicitly models the contour of the object boundary and multiple affine motion fields. However, motion estimation is performed with respect to only one reference frame, i.e., the motion of frame n is determined from frame n−1. As discussed above, this has problems: some pixels close to the object boundary may be occluded, which in turn biases the estimation of the boundary, since the motion field is not reliable near the boundary due to occlusion.
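The occlusion issue with a single reference frame can be made concrete with a one-dimensional sketch. It assumes a static background and a rigid object of fixed width that moves along a row of pixels; the function names and the positions used are hypothetical and chosen only for illustration.

```python
def object_mask(length, start, width):
    # True where the object covers the pixel row.
    return [start <= x < start + width for x in range(length)]

def occluded_background(length, width, pos_prev, pos_cur, pos_next):
    """Background pixels of frame n that are hidden in frame n-1 or in frame n+1.

    Assumes a static background, so a background pixel of frame n is hidden
    in another frame exactly when the object covers that position there.
    """
    prev_mask = object_mask(length, pos_prev, width)
    cur_mask = object_mask(length, pos_cur, width)
    next_mask = object_mask(length, pos_next, width)
    hidden_in_past = [x for x in range(length) if not cur_mask[x] and prev_mask[x]]
    hidden_in_future = [x for x in range(length) if not cur_mask[x] and next_mask[x]]
    return hidden_in_past, hidden_in_future

# Object of width 4 moving 3 pixels per frame along a 16-pixel row.
past, future = occluded_background(16, 4, pos_prev=2, pos_cur=5, pos_next=8)
print(past, future)  # prints [2, 3, 4] [9, 10, 11]
```

In this example, background pixels 2 to 4 of frame n have no counterpart in frame n−1, so an estimate that uses only frame n−1 is unreliable for them, and they lie directly next to the object boundary.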
Thus, there are problems with the common motion models.