A three-dimensional object can be represented in two dimensions. In fact, representing a three-dimensional object by two-dimensional views has advantages in object modeling and synthesis. In such two-dimensional representations, the three-dimensional features of the object need not be explicitly recovered, which avoids the difficulties of three-dimensional methods. Instead, it is conventional to use view-based models that represent the object with multiple two-dimensional view projections. When an object is represented with more than one two-dimensional view, a pixel-wise correspondence map is usually required between the two-dimensional views. Alternatively, a sparse correspondence map between a small set of feature points, or edges of features, on the object can be used between the views. The correspondence map can be computed and applied to separate the shape of the object from its texture. As a result, both the shape and the texture of the object, as seen from a particular viewpoint, can be modeled in a linear subspace.
When representing an object with more than one two-dimensional view, it may be advantageous to establish points that represent features on the object (feature points). A correspondence based on feature points is advantageous in some applications because it is more robust to variations in light intensity and color, and can require less computation than establishing a dense representation of the object. Accordingly, to accurately model an image class of an object, two problems need to be solved. The first is locating feature points on the features of the object by using a training set of two-dimensional views. Once the feature points are located, the second is establishing the underlying correspondence between two or more sets of feature points from a corresponding number of two-dimensional views.
Locating features on a complex, non-rigid object using multiple two-dimensional views adds a further degree of difficulty. This difficulty can be addressed by using prior knowledge about the object itself, such as a deformable model of the object. The deformable model constrains the locations of features on the object. These constraints, which can be derived from image data of the object, can be used to address problems such as segmenting the object or detecting its features. In particular, the Active Shape Model (ASM) proposed by Cootes et al. (T. F. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models—their training and their applications," Computer Vision and Image Understanding, 61(1):38–59, January 1995) has the advantage that instances of the model can be deformed only in the ways learned from the training set from which the model was derived. That is, the model can accommodate considerable variability in segmenting the object or detecting its features, while remaining specific to the class of object it represents. ASM uses Principal Component Analysis (PCA) to model both the two-dimensional shape variations of the object and its local grey-level structures. In particular, ASM interrogates two-dimensional images of the object and approximates the shape of each feature on the object with a set of points (feature points) that represent that feature.
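The shape half of the PCA model described above can be sketched as a point distribution model. This is a minimal sketch in Python with NumPy, assuming the training shapes are already aligned (the Procrustes alignment step is omitted) and leaving out the grey-level half of ASM entirely. The function names `train_shape_model` and `constrain_shape`, the 98% retained-variance threshold, and the ±3√λ limit on each mode weight are illustrative assumptions rather than details taken from this document.

```python
import numpy as np


def train_shape_model(shapes, variance_kept=0.98):
    """Build a PCA point distribution model from training shapes.

    shapes: array of shape (n_examples, n_points, 2). The shapes are
    assumed to be aligned already (e.g. by Procrustes analysis).
    Returns the mean shape vector, the retained eigenvectors (columns)
    and their eigenvalues.
    """
    X = shapes.reshape(len(shapes), -1)        # rows are (x1, y1, x2, y2, ...)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)       # covariance of point coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop round-off negatives
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :t], eigvals[:t]


def constrain_shape(shape, mean, P, lam, k=3.0):
    """Project a candidate shape onto the learned modes and clamp each
    mode weight to +/- k * sqrt(eigenvalue)."""
    b = P.T @ (shape.reshape(-1) - mean)       # mode weights of the candidate
    limit = k * np.sqrt(lam)
    b = np.clip(b, -limit, limit)              # forbid unlearned extremes
    return (mean + P @ b).reshape(shape.shape)
```

Clamping the mode weights `b` is what restricts an instance to deformations learned from the training set: a candidate pushed far along a learned mode is pulled back to the limit, and a displacement that lies outside every learned mode is projected away.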
When two or more different two-dimensional views of the same object are given, the features of the object in each view can be matched to a model of the object by using ASM. After the model has been matched in each view, it would be desirable to find the correspondence between the feature points detected in the different views, for example by implication across the views. The detected feature points, however, may not be geometrically consistent across the views, because ASM considers only a single view rather than the correspondence between views. Matching the model to each view could benefit from multi-view geometry. Although this can be done, it requires that all key feature points of each feature on the object remain visible in every two-dimensional view.
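The geometric inconsistency between independently fitted views can be made concrete with the standard two-view epipolar constraint. Assuming the fundamental matrix `F` relating two views is known (for example from calibration, or estimated from reliable matches), each pair of detected feature points can be scored by its distance to the epipolar line induced by its partner. This is an illustrative sketch, not a method from this document; the function names `epipolar_distances` and `inconsistent_points` and the 2-pixel tolerance are hypothetical.

```python
import numpy as np


def epipolar_distances(F, pts1, pts2):
    """Mean point-to-epipolar-line distance for each matched pair.

    F maps a homogeneous point x1 in view 1 to its epipolar line F @ x1
    in view 2, so a consistent match satisfies x2^T F x1 = 0.
    pts1, pts2: arrays of shape (n, 2) holding matched feature points.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])               # homogeneous coordinates
    x2 = np.hstack([pts2, ones])
    lines2 = x1 @ F.T                          # row i is F @ x1[i]
    lines1 = x2 @ F                            # row i is F.T @ x2[i]
    r = np.sum(x2 * lines2, axis=1)            # algebraic residual x2^T F x1
    d2 = np.abs(r) / np.hypot(lines2[:, 0], lines2[:, 1])
    d1 = np.abs(r) / np.hypot(lines1[:, 0], lines1[:, 1])
    return 0.5 * (d1 + d2)


def inconsistent_points(F, pts1, pts2, tol=2.0):
    """Indices of matches farther than tol pixels from their epipolar lines."""
    return np.flatnonzero(epipolar_distances(F, pts1, pts2) > tol)
```

For a rectified pair whose views differ only by a horizontal translation, `F` is (up to scale) `[[0, 0, 0], [0, 0, -1], [0, 1, 0]]`, and the score reduces to the vertical offset between matched points. The check is only possible for feature points visible in both views.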
One technique for using ASM starts from a set of training examples consisting of views of one face from different viewpoints. In these views, a set of feature points of facial features on the face is labeled manually. This manual labeling represents a deformation of the face into its different facial features and the points that make up those features. For instance, the features can include the nose, eyes and mouth, and the feature points are the points that mark these features on the face. ASM then uses the training data representing the deformation of the face to analyze the facial features of a different face from views of that face. This conventional ASM technique, however, locates facial features inaccurately in the views of the face being analyzed. Moreover, it can deform the face being analyzed only in the ways that the face in the training data had been deformed. One partial solution to this inherent inaccuracy is to use a larger training database. The solution is only partial because it does not take into account the local grey-level model fitting for the different views of the face. Local grey-level model fitting tends to move facial features toward the strongest photometric edge, which is not necessarily the actual edge of a facial feature, thus introducing further inaccuracy. Moreover, a larger training database may further decrease accuracy, because the additional data tends to extend the range of acceptable facial feature shapes into a range that is inaccurate for the face being analyzed.
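The bias toward the strongest photometric edge can be illustrated with a deliberately simplified search along a one-dimensional grey-level profile sampled normal to a feature boundary. This is a hypothetical stand-in: ASM as described by Cootes et al. matches each profile against a learned grey-level model rather than simply taking the largest intensity step, but a fit dominated by edge strength can behave similarly. The function name `strongest_edge_offset` and the profile values are invented for illustration.

```python
def strongest_edge_offset(profile):
    """Offset, in samples from the profile centre, of the largest
    absolute intensity step along a 1-D grey-level profile."""
    steps = [abs(b - a) for a, b in zip(profile, profile[1:])]
    best = max(range(len(steps)), key=steps.__getitem__)
    centre = (len(profile) - 1) / 2
    return best + 0.5 - centre                 # a step sits between two samples


# Hypothetical profile across a lip boundary: the true edge is a weak
# step (100 -> 120) just past the centre, while a cast shadow produces a
# much stronger step (120 -> 40) two samples farther out.
profile = [100, 100, 100, 100, 100, 120, 120, 40, 40]
print(strongest_edge_offset(profile))         # 2.5, not the true 0.5
```

In this toy case the landmark is pulled 2.5 samples outward to the shadow instead of 0.5 samples to the lip edge, and the shape model can only accept or reject that local choice afterward.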
In addition to the foregoing problems, ASM is not consistent in finding the same facial features in two slightly different views of the same face. Features identified in training data for a training object do not always yield similar features when ASM searches two slightly different views of another object. This inconsistency can be attributed to changes in illumination on the object as it is rotated between two-dimensional views, or to different initial parameters. Two negative results can occur when using conventional ASM in this setting: the model may wrongly identify features on an object, or it may inaccurately locate the feature points for features on the object. Either way, different two-dimensional views of the same object yield features that do not match up between the views. Consequently, the correspondence between the features identified in the different views is inaccurate when conventional ASM is used.
It would be an advance in the art to develop a technique that accurately and consistently identifies the same features in different views of the same object.