Motion estimation is a technique for finding the corresponding projections of the same physical point in space on a number of image frames. It is a fundamental problem for a wide variety of tasks in digital video processing and vision technologies, such as image alignment and registration, motion segmentation, video filtering, video enhancement, motion compensation for video compression, image-based rendering, and 3-D shape recovery.
After years of effort, it remains a challenging task, mainly because it is usually an under-constrained and ill-posed problem. Many important factors involved in image formation, such as illumination, object characteristics (shape and texture), and camera projection, are lost during the imaging process. It is therefore desirable to explore valid and consistent constraints, study their applicable domains, and use the right method for a specific task. The more valid and consistent the constraints, the less ambiguity there is in the solutions, and the better the approximation to the true motion field.
A large number of schemes have been proposed so far, varying mainly according to the available information, the underlying modeling assumptions, the performance and speed tradeoff, and the implementation details.
Under the assumption of temporally uniform illumination and Lambertian scene surfaces, the projections of the same spatial point on the image frames have the same appearance (intensity or color), and an optical flow equation can be derived for each point. The optical flow equation governs the relationship between the pointwise displacement and the spatial and temporal intensity gradients. This constraint has been applied to motion estimation between two images, and a large number of schemes have been proposed. Selected computational methods can be found in the following book and review papers: Optic Flow Computation: A Unified Perspective by A. Singh, IEEE Computer Society Press, 1992; “On the computation of motion from sequences of images—A review” by J. K. Aggarwal et al., Proceedings of the IEEE, 76:917–935, 1988; “Performance of Optical Flow Techniques” by J. L. Barron, et al., International Journal of Computer Vision (IJCV), 12(1): 43–77, 1994; and “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms” by D. Scharstein, et al., IJCV, 47(1/2/3): 7–42, 2002. However, the optical flow equation supplies only one equation per point for the two unknown displacement components (x and y). This gives rise to the aperture problem: only the component of the motion field normal to the local image contour (that is, along the intensity gradient) can be recovered.
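For reference, the optical flow equation described above can be written out as follows. This is a sketch of the standard textbook derivation, not a formula taken from any of the cited works; the symbols I (image intensity), (u, v) (displacement between frames at t and t + 1), and the subscripted partial derivatives are notational assumptions introduced here.

```latex
% Constant-brightness assumption: a point keeps its intensity as it moves
I(x + u,\; y + v,\; t + 1) = I(x,\, y,\, t)

% A first-order Taylor expansion of the left-hand side yields the
% optical flow equation: one linear equation in two unknowns (u, v)
I_x\, u + I_y\, v + I_t = 0

% Only the flow component along the gradient direction is determined
% (the normal flow); the component along the contour is unconstrained
u_n = -\frac{I_t}{\lVert \nabla I \rVert},
\qquad \nabla I = (I_x,\; I_y)^{\top}
```

Because a single linear equation cannot fix two unknowns, any flow vector that differs from the true one only along the contour satisfies the equation equally well, which is the aperture problem noted above.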
One way to overcome the under-constrained problem is to assume that the flow is spatially smooth. Various methods have been proposed, differing in the choice of support region, the motion model, and the way smoothness is enforced. “An iterative image registration technique with an application to stereo vision” by B. Lucas, et al., in Image Understanding Workshop, pp. 121–130, 1981, used a constant-brightness constraint over a local 2-D block. A global 2-D affine model (more general than 2-D block motion) was disclosed in “A three-frame algorithm for estimating two-component image motion” by J. Bergen, et al., IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 14: 886–896, 1992. Regularization was used to enforce a smoothness constraint in “On the estimation of optical flow: relations between different approaches and some new results” by H. Nagel, et al., Artificial Intelligence, 33: 299–324, 1987. Motion between two images was estimated by minimizing the least squares difference between the two images in “A computational framework and an algorithm for the measurement of structure from motion” by P. Anandan, IJCV, 2:283–310, 1989. However, the smoothness constraint may introduce motion blurring, and the underlying (usually 2-D) motion model is not always valid.
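As an illustration of the local-block approach, the following minimal sketch assumes that every pixel in a block shares one displacement (u, v) and solves the stacked optical flow equations in the least-squares sense. It is a simplified, single-iteration illustration in the spirit of the block-based method cited above, not a reproduction of it; the function name `block_flow` and its arguments are hypothetical, and the intensity derivatives are assumed to have been computed beforehand.

```python
def block_flow(ix, iy, it, eps=1e-9):
    """Least-squares (u, v) for one block (hypothetical helper).

    ix, iy, it: equal-length sequences holding the spatial (x, y) and
    temporal intensity derivatives at each pixel of the block.
    Returns (u, v), or None when the 2x2 system is singular.
    """
    # Entries of the 2x2 normal-equation matrix and right-hand side.
    sxx = sum(gx * gx for gx in ix)
    sxy = sum(gx * gy for gx, gy in zip(ix, iy))
    syy = sum(gy * gy for gy in iy)
    sxt = sum(gx * gt for gx, gt in zip(ix, it))
    syt = sum(gy * gt for gy, gt in zip(iy, it))

    det = sxx * syy - sxy * sxy
    if abs(det) < eps:
        # All gradients are parallel (or zero): the aperture problem.
        return None

    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic block whose temporal derivatives are consistent with
# a true displacement of (0.5, -0.25).
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, -1.0]
it = [-(gx * 0.5 + gy * -0.25) for gx, gy in zip(ix, iy)]
flow = block_flow(ix, iy, it)
```

When the block contains gradients in only one direction, the system is singular and the sketch returns `None`, which mirrors the observation that a smoothness assumption helps only where the support region has sufficient texture.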
An alternative is to recover the epipolar geometry between two frames and match each point to the corresponding point with the same intensity on its epipolar line. However, the point-line epipolar constraint is not tight enough to determine the point-point correspondence, especially when there is not enough texture available (e.g., in uniform regions).
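The looseness of the point-line constraint can be seen in a small sketch. It assumes the epipolar geometry is given as a 3x3 fundamental matrix F with the convention that a point x in the first frame maps to the line F x in the second frame; the helper names and the example matrix (a pure horizontal translation, for which epipolar lines are image rows) are illustrative assumptions rather than part of any cited method.

```python
import math


def epipolar_line(F, pt):
    """Line (a, b, c) in frame 2 for a point (x, y) in frame 1: l = F @ [x, y, 1]."""
    x, y = pt
    h = (x, y, 1.0)
    return tuple(sum(F[r][c] * h[c] for c in range(3)) for r in range(3))


def epipolar_distance(F, pt1, pt2):
    """Distance in pixels from pt2 (frame 2) to the epipolar line of pt1."""
    a, b, c = epipolar_line(F, pt1)
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line (pt1 is at the epipole)")
    return abs(a * pt2[0] + b * pt2[1] + c) / norm


# Assumed example: fundamental matrix of a pure horizontal translation,
# so the epipolar line of (x, y) is simply the row y in the second frame.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

d_near = epipolar_distance(F, (10.0, 5.0), (3.0, 5.0))    # same row
d_far = epipolar_distance(F, (10.0, 5.0), (200.0, 5.0))   # same row, far away
d_off = epipolar_distance(F, (10.0, 5.0), (3.0, 7.0))     # two rows off
```

Both `d_near` and `d_far` are zero: every point on the line satisfies the constraint, so in a uniform region, where intensity cannot discriminate between candidates, the correspondence along the line remains ambiguous.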
A number of patents have been awarded in this area. U.S. Pat. No. 5,241,608, “Method for estimating velocity vector fields from a time-varying image sequence” to S. V. Fogel, disclosed a regularization method for motion estimation, using the optical flow constraint and smoothness constraint. U.S. Pat. No. 5,680,487, “System and method for determining optical flow” to V. Markandey, disclosed a hierarchical multi-resolution flow estimation between two images based on the optical flow equation. U.S. Pat. No. 6,219,462, “Method and apparatus for performing global image alignment using any local match measure” to P. Anandan and M. Irani, disclosed a method for performing parametric image alignment by applying global estimation directly to the local match-measure data. U.S. Pat. No. 6,507,661, “Method for estimating optical flow” to S. Roy, disclosed a method to estimate the optical flow between a plurality of images. In addition, the plane-plus-parallax approach for 3-D motion estimation was disclosed in U.S. Pat. No. 5,963,664, “Method and system for image combination using a parallax-based technique” to R. Kumar et al., and U.S. Pat. No. 6,192,145, “Method and apparatus for three-dimensional scene processing using parallax geometry of pairs of points” to P. Anandan et al.
However, most of the prior art uses two image frames at a time and enforces the constant-brightness constraint together with the smooth flow assumption, which may introduce the aperture problem and motion blurring, depending on the image content. The estimated field is usually the optical (apparent) flow, not necessarily the true motion field. That is, the corresponding points have the same intensity or color on the image frames but do not necessarily correspond to the same spatial point. There is therefore a need to enforce a geometric constraint as well, and to use the cues of geometry and appearance together to find the true motion.
One fundamental geometric constraint is the trilinear constraint across three images. U.S. Pat. No. 5,821,943, “Apparatus and method for recreating and manipulating a 3D object based on a 2D projection thereof” to A. Shashua, disclosed a method to generate information regarding a 3D object from at least one 2D projection of the 3D object by the use of a trifocal tensor. Meanwhile, studies on the use of the trifocal representation were also published in scientific and engineering journals. In particular, the trilinear equations first appeared in “Algebraic functions for recognition” by A. Shashua, in IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:779–789, 1995. The notion of the trifocal tensor and other geometric relations across three images were presented in “Structure from motion using line correspondences” by M. E. Spetsakis, et al., IJCV, 4: 171–183, 1990, and “Lines and points in three views and the trifocal tensor” by R. I. Hartley, in IJCV, 22:125–140, 1997. More comprehensive material on this subject can be found in the book Multiple View Geometry in Computer Vision by R. Hartley and A. Zisserman, Cambridge University Press, 2001. The application of the trifocal model to motion estimation and compensation, and to video compression and manipulation, was studied in “Trifocal motion modeling for object-based video compression and manipulation” by Z. Sun and A. M. Tekalp, in IEEE Trans. on Circuits and Systems for Video Technology, 8:667–685, 1998.
What is needed is the use of multiple, but appropriate, constraints in a single optimization to solve for the motion fields between multiple images. In this connection, it is noteworthy that the trilinear constraint has not previously been used together with the constant-brightness constraint in a single optimization to solve for the motion fields between three images.