1. Field of the Invention
The present invention relates generally to the field of computer vision. More specifically, the present invention relates to a system and method for detecting features, e.g., facial features, in images, and tracking such features over time in a series of images.
2. Related Art
A subject of interest in the computer vision field relates to the identification and/or tracking of deformable shapes or objects across a series of images. This subject has many applications in biometrics, facial expression analysis, and synthesis. Accurate reconstruction and tracking of deformable objects in images require well-defined delineation of the object boundaries across multiple viewpoints.
Landmark-based deformable models, such as Active Shape Models (ASMs), allow for object shape detection and delineation in 2-dimensional (2D) images. ASMs are statistical models of the shapes of objects which are iteratively deformed to fit an example of an object in an image. Deformation of the models is limited to shapes consistent with those provided in a training set of examples. The shape of the object is represented by a set of points, and the goal of the ASM algorithm is to match the model to an input image. ASMs detect features in an image by combining shape and appearance information from the observed image data, and use a learned statistical shape distribution for a given class of objects to align the contours of shapes to the detected features in the observed images.
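By way of illustration only, the statistical shape constraint described above can be sketched as follows. This is a minimal sketch of a conventional linear point distribution model and is not taken from the present disclosure. It assumes that the training shapes are already aligned and flattened into vectors of landmark coordinates, that the shape distribution is modeled with Principal Component Analysis, and that each shape parameter is limited to three standard deviations. The function names `fit_shape_model` and `constrain_shape`, and the parameters `var_kept` and `limit`, are hypothetical.

```python
import numpy as np


def fit_shape_model(shapes, var_kept=0.98):
    """Build a linear (PCA) shape model from aligned training shapes.

    shapes: array of shape (n_samples, 2 * n_landmarks); each row is one
    flattened set of (x, y) landmark coordinates (assumed pre-aligned).
    Returns the mean shape, the retained modes (as columns), and the
    variance along each retained mode.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data yields the principal modes of variation.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / max(len(shapes) - 1, 1)
    # Keep the fewest modes that explain the requested variance fraction.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(eigvals))
    return mean, vt[:k].T, eigvals[:k]


def constrain_shape(shape, mean, modes, eigvals, limit=3.0):
    """Project a candidate shape onto the model and clamp its parameters.

    Each shape parameter is clipped to +/- limit standard deviations
    (an assumed bound), so the result stays near the training shapes.
    """
    b = modes.T @ (np.asarray(shape, dtype=float) - mean)
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)
    return mean + modes @ b
```

In an iterative fitting loop of this kind, a candidate shape suggested by local image evidence would be passed through `constrain_shape` at each iteration, so that only shapes resembling the training examples are produced. Because the model is linear, it is subject to the limitations discussed below.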
A major limitation of landmark-based deformable models is that they ignore the non-linear geometry of shape manifolds of objects in images viewed from multiple viewpoints, which severely limits the ability of such models to detect and track features across a series of images where there is substantial movement of the features. Such changes can result from movement (e.g., rotation) of the subject and/or the imaging apparatus, which can cause aspect changes in the images. Movement of 3-dimensional (3D) objects causes shapes to vary non-linearly on a hyper-spherical manifold. As a result, during tracking, the shape change is mostly smooth, but in certain cases there may be discontinuities. For example, during rotation of a subject's head to the full facial profile, some of the facial features may be occluded, causing drastic changes in shapes. In addition to shape changes, the correspondences between local 2D structures in an image and the 3D object structures change for the landmark-based deformable models. The local grey level profiles at these landmarks also exhibit dramatic variations. Further, face shape variations across multiple aspects differ across human subjects. For example, a 30 degree head rotation can produce more distinct variations for faces with raised facial features (e.g., eyes and nose) than for faces with leveled features.
There have been several efforts in the past to represent non-linear shape variations using kernel Principal Component Analysis (PCA) and multi-layer perceptrons. The results from non-linear approaches largely depend on whether all of the shape variations have been adequately represented in the training data. Discontinuities in the shape space may cause these models to generate implausible shapes. Kernel methods suffer from a major drawback in that they must learn pre-image functions for mapping shapes in the feature space to the original space, which is time-consuming. Other techniques, such as Active Appearance Models (AAMs) and non-linear projections into eigenspaces, cannot adequately track features in images where the features move across a series of images.
Accordingly, what would be desirable, but has not yet been provided, is a system and method for detecting and tracking features in images, wherein moving features can be accurately detected in the images and tracked over time in a series of images.