A challenging problem in computer vision is effectively modeling an object's appearance when the object undergoes rigid or non-rigid motion. Numerous methods have been developed to handle rigid object motion, and algorithms for non-rigid motion have been proposed more recently. Affine transformations have been applied successfully to model global rigid object motion. However, it is well known that affine transformations are not effective when the modeled object undergoes non-rigid motion, such as changes in facial expression or movements of the human body. Such motion often carries important information for vision applications. It is therefore important to develop an effective model for non-rigid object deformation, as well as an efficient algorithm for recovering deformation parameters directly from image sequences.
Conventional approaches to modeling non-rigid object motion from visual cues are based on contours, appearance, or optical flow. Contour-based methods mainly use edge information near or on the boundary of the target object and seek a transformation that matches object shapes by minimizing an energy function. Algorithms such as snakes, active shape models, active contours with condensation, and geodesic active contours have been applied to model non-rigid objects. These methods differ in their approach to shape representation, which includes energy functions defined in terms of edges and curvature, statistical distributions of points on a contour, curvature normals, and energy functions defined by level sets. Gradient descent algorithms are then applied to minimize the energy functions or to fit the model to the image. Alternatively, factored-sampling-based methods can be applied to find the model that best matches an image observation. A drawback shared by the above-mentioned methods is that their success depends on initializing points or edges on or near the object contour. Furthermore, since these methods use only the contour of an object for matching, they ignore the rich texture information that is available. Descriptions of these concepts can be found in T. Cootes, et al., Active Shape Models-Their Training and Application, Computer Vision and Image Understanding, 1995; M. Isard and A. Blake, Contour Tracking by Stochastic Propagation of Conditional Density, Proceedings of the Fourth European Conference on Computer Vision, LNCS 1064, Springer Verlag, 1996; M. Kass, et al., Snakes: Active Contour Models, International Journal of Computer Vision, 1(4), 1987; and N. Paragios and R. Deriche, Geodesic Active Contours and Level Sets for the Detection and Tracking of Moving Objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3), 2000, the contents of which are incorporated by reference herein in their entirety.
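As an illustration of the gradient-descent energy minimization used by contour-based methods, the following is a minimal sketch of a semi-implicit snake update in the style of Kass et al. It assumes a closed contour, an internal energy alpha|v_s|^2 + beta|v_ss|^2 discretized with cyclic finite differences, and a caller-supplied `external_force` callable that returns -grad E_ext (in practice often derived from image edge gradients). The names `evolve_snake` and `internal_matrix` are hypothetical, and the default weights are arbitrary.

```python
import numpy as np


def internal_matrix(n, alpha, beta):
    """Cyclic pentadiagonal matrix A = -alpha*D2 + beta*D2@D2 for a closed contour of n points."""
    eye = np.eye(n)
    # Cyclic second-difference operator: row i is [.., 1, -2, 1, ..] centred on column i.
    D2 = -2.0 * eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
    return -alpha * D2 + beta * (D2 @ D2)


def evolve_snake(points, external_force, alpha=0.1, beta=0.01, tau=1.0, iters=100):
    """Semi-implicit gradient descent on E_int + E_ext (hypothetical helper).

    points: (n, 2) contour vertices.
    external_force: callable mapping (n, 2) points to -grad E_ext, shape (n, 2).
    Each step solves (I + tau*A) x_new = x_old + tau * external_force(x_old).
    """
    x = np.asarray(points, dtype=float).copy()
    n = x.shape[0]
    M = np.eye(n) + tau * internal_matrix(n, alpha, beta)
    M_inv = np.linalg.inv(M)  # M is constant across iterations, so invert it once
    for _ in range(iters):
        x = M_inv @ (x + tau * external_force(x))
    return x
```

With zero external force only the internal energy acts, so a circular contour contracts while its centroid stays fixed; an image-derived force is what holds the contour at object boundaries, which is one reason such methods depend on initialization near the contour.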
Appearance-based methods use the texture information of an object to estimate non-rigid deformation. Existing algorithms rely on deformable templates with fiducial points, local parametric image patches, texture within a triangulated mesh, elastic bunch graph matching, or a combination of shape and texture. The image warp is obtained by minimizing the sum of squared differences of pixel values between the template and an observed image, using an affine transformation, graph matching, or an eight-parameter projective model. However, the templates are not updated over time, which causes problems when the imaging conditions (such as lighting and viewing angle) differ significantly from those of the stored template. Furthermore, these approaches require fiducial points or local patches to be labeled manually prior to shape matching. Descriptions of these concepts can be found in M. Black and Y. Yacoob, Tracking and Recognizing Rigid and Non-Rigid Facial Motions Using Local Parametric Models of Image Motion, Proceedings of the Fifth IEEE International Conference on Computer Vision, 1995; T. Cootes, et al., Active Appearance Models, Proceedings of the Fifth European Conference on Computer Vision, volume 2, 1998; T. Cootes, et al., Active Appearance Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001; F. De La Torre, et al., A Probabilistic Framework for Rigid and Non-Rigid Appearance Based Tracking and Recognition, Proceedings of the Fourth International Conference on Automatic Face and Gesture Recognition, 2000; P. L. Hallinan, et al., Two- and Three-Dimensional Patterns of the Face, A. K. Peters, 1998; A. Lanitis, et al., Automatic Interpretation and Coding of Face Images Using Flexible Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 1997; S. Sclaroff and J. Isidoro, Active Blobs, Proceedings of the Sixth IEEE International Conference on Computer Vision, 1998; and L. Wiskott, et al., Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 1997, the contents of which are incorporated by reference herein in their entirety.
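The sum-of-squared-differences objective used by these template-based methods can be sketched for the affine case as follows. This is a minimal illustration under stated assumptions: the helper names `bilinear` and `ssd_affine` are hypothetical, the warp maps template pixel (x, y) to image location A[x, y]^T + t, bilinear interpolation is used for sub-pixel sampling, samples falling outside the image are not handled specially, and only the evaluation of the cost (not its minimization) is shown.

```python
import numpy as np


def bilinear(image, x, y):
    """Sample `image` at float coordinates (x = column, y = row) with bilinear interpolation."""
    h, w = image.shape
    # Clamp the top-left neighbour so that (x0 + 1, y0 + 1) stays inside the image.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = x - x0
    fy = y - y0
    top = (1.0 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bottom = (1.0 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1.0 - fy) * top + fy * bottom


def ssd_affine(template, image, A, t):
    """Sum of squared differences between T(x) and I(A x + t) over all template pixels.

    A is a 2x2 linear part and t a length-2 translation, both in (x, y) order.
    Hypothetical helper for illustration; it evaluates the cost but does not minimize it.
    """
    h, w = template.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([xs.ravel(), ys.ravel()])    # shape (2, h*w), rows are x and y
    warped = A @ coords + np.asarray(t)[:, None]   # affine-warped sample locations
    samples = bilinear(image, warped[0], warped[1]).reshape(h, w)
    return float(np.sum((template - samples) ** 2))
```

For example, if `template` is the crop `image[10:30, 20:40]`, the cost is zero for `A = np.eye(2)` and `t = (20, 10)` and typically positive for nearby translations; a template-matching method searches for the (A, t) that minimizes it, for instance with Gauss-Newton iterations.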
Weiss developed an Expectation-Maximization (EM)-based motion segmentation method that fits a mixture of smooth flow fields to spatio-temporal image data. By exploiting low-rank constraints on the optical flow matrix, Bregler et al. proposed an algorithm for non-rigid object tracking. Although these methods have successfully modeled non-rigid motion using feature points, optical flow estimation is usually sensitive to illumination changes, occlusion, and noise, which limits the effectiveness of these methods for non-rigid object tracking. Descriptions of these concepts can be found in C. Bregler, et al., Recovering Non-rigid 3D Shape from Image Streams, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000; M. Irani, Multi-frame Optical Flow Estimation Using Subspace Constraints, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999; and Y. Weiss, Smoothness in Layers: Motion Segmentation Using Nonparametric Mixture Estimation, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1997, the contents of which are incorporated by reference herein in their entirety.
The use of thin plate spline (TPS) warps for mapping points between two image frames based on their correspondences was first advocated by Bookstein. TPS warps have been used in image alignment and shape matching. Given a set of n corresponding 2D points, a TPS warp is described by 2(n+3) parameters: 6 global affine motion parameters and 2n coefficients associated with the control-point correspondences. These parameters are computed by solving a linear system. One attractive feature of the TPS warp is that it decomposes into affine and non-affine warping components, which allows it to capture both global rigid and local non-rigid motion. Consequently, TPS has been applied to shape matching, for example in medical imaging, to estimate non-rigid motion. A description of this can be found in F. Bookstein, Principal Warps: Thin-plate Splines and the Decomposition of Deformations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6), 1989, the contents of which are incorporated by reference herein in their entirety.
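The linear system that yields the 2(n+3) TPS parameters can be sketched with NumPy as follows. This is an illustrative sketch rather than Bookstein's implementation: it assumes the radial basis U = r^2 log r^2, exact interpolation (no regularization term), and at least three distinct, non-collinear control points so that the system is nonsingular. The names `tps_kernel`, `fit_tps`, and `apply_tps` are hypothetical.

```python
import numpy as np


def tps_kernel(r2):
    """Radial basis U = r^2 log r^2 evaluated from squared distances, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 == 0.0, 0.0, u)


def fit_tps(src, dst):
    """Solve the TPS linear system mapping control points src (n, 2) onto dst (n, 2).

    Returns w (n, 2), the non-affine coefficients, and a (3, 2), the affine part,
    i.e. 2n + 6 = 2(n + 3) parameters in total.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = tps_kernel(d2)                          # (n, n) radial basis matrix
    P = np.hstack([np.ones((n, 1)), src])       # (n, 3) rows [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T                             # side conditions P^T w = 0
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(L, rhs)            # solves the x and y columns at once
    return params[:n], params[n:]


def apply_tps(points, src, w, a):
    """Warp points: f(p) = a0 + a1*x + a2*y + sum_i w_i U(|p - src_i|)."""
    points = np.asarray(points, dtype=float)
    src = np.asarray(src, dtype=float)
    d2 = np.sum((points[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return P @ a + tps_kernel(d2) @ w
```

If the target points are an exact affine image of the source points, the non-affine coefficients `w` come out as zero and `a` holds the translation and linear part, which illustrates the affine/non-affine decomposition; moving a single control point off that affine map produces nonzero `w`.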
Chui and Rangarajan presented a method that simultaneously estimates point correspondences and a non-rigid TPS warp using the EM algorithm. A description of this can be found in H. Chui and A. Rangarajan, A New Algorithm for Non-Rigid Point Matching, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, volume 2, 2000, the contents of which are incorporated by reference herein in their entirety.
Conventional methods of recovering the motion of a non-rigid object undergoing shape deformation and pose variation from its appearance thus rely on a sparse set of point correspondences and the calculation of TPS parameters. A significant drawback of such methods is that their computational expense is prohibitive for real-time applications such as robotics. Thus, there is a need for a more efficient method of modeling the motion of non-rigid objects.