The appearance of an object can be represented by statistical models trained using a set of annotated image examples. Such a model is therefore highly dependent on the way in which it is trained. A new image can be interpreted by finding the best plausible match of the model to the image data. While a great deal of computer vision literature details methods for handling statistical models of human faces, some problems remain for which solutions are desired. For example, statistical models of human faces are sensitive to illumination changes, especially if the lighting in the test image differs significantly from the conditions learned from the training set. The appearance of a face can change dramatically as lighting conditions change. Because the face is a 3D object, a direct light source can cast strong shadows and shading that affect certain facial features. Variations due to illumination changes can be even greater than variations between the faces of two different individuals.
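The idea of a "best plausible match" can be illustrated with a minimal sketch, assuming a plain PCA appearance model and the convention, common in the active shape/appearance model literature, of limiting each model parameter to ±3 standard deviations of its mode. The function names `build_model` and `plausible_match` and the synthetic training data are hypothetical and are not taken from this document.

```python
import numpy as np


def build_model(train, n_modes):
    """PCA appearance model: mean vector, orthonormal modes (columns), mode variances."""
    mean = train.mean(axis=0)
    _, s, vt = np.linalg.svd(train - mean, full_matrices=False)
    eigvals = s[:n_modes] ** 2 / (len(train) - 1)
    return mean, vt[:n_modes].T, eigvals


def plausible_match(x, mean, modes, eigvals, k=3.0):
    """Project x onto the model, then clamp each parameter to +/- k standard
    deviations so the reconstructed appearance stays within the trained range."""
    b = modes.T @ (x - mean)
    limit = k * np.sqrt(eigvals)
    b = np.clip(b, -limit, limit)
    return b, mean + modes @ b


rng = np.random.default_rng(0)
# Hypothetical training set: 100 appearance vectors of length 50 driven by 5 latent factors.
train = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 50))
mean, modes, eigvals = build_model(train, n_modes=5)

# A typical example (0.5 standard deviations along every mode) is matched exactly.
typical = mean + modes @ (0.5 * np.sqrt(eigvals))
b_typ, recon_typ = plausible_match(typical, mean, modes, eigvals)

# An implausible example (10 standard deviations along mode 0) is pulled back to the limit.
outlier = mean + modes[:, 0] * 10 * np.sqrt(eigvals[0])
b_out, _ = plausible_match(outlier, mean, modes, eigvals)
print("mode-0 parameter in std devs:", b_out[0] / np.sqrt(eigvals[0]))
```

Clamping keeps the fitted parameters inside the region seen during training, which is why the quality of the match depends so strongly on how the model was trained.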
Various methods have been proposed to overcome this challenge. A feature-based approach seeks to use features that are invariant to lighting variations. In C. Hu, R. Feris, and M. Turk, “Active wavelet networks for face alignment,” in Proc. of the British Machine Vision Conference, East Anglia, Norwich, UK, 2003, incorporated by reference, it is proposed to replace the active appearance model (AAM) texture with an active wavelet network for face alignment, while in S. Le Gallou, G. Breton, C. Garcia, and R. Séguier, “Distance maps: A robust illumination preprocessing for active appearance models,” in VISAPP '06, First International Conference on Computer Vision Theory and Applications, Setúbal, Portugal, 2006, vol. 2, pp. 35-40, incorporated by reference, the texture is replaced by distance maps that are robust against lighting variations.
Other methods rely on removing illumination components using lighting models. The linear subspace approaches of S. Z. Li, R. Xiao, Z. Y. Li, and H. J. Zhang, “Nonlinear mapping of multi-view face patterns to a Gaussian distribution in a low dimensional space,” in RATFG-RTS '01: Proceedings of the IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001, p. 47, and M. Bichsel, “Illumination invariant object recognition,” in ICIP '95: Proceedings of the 1995 International Conference on Image Processing, Vol. 3, 1995, p. 3620, and P. N. Belhumeur, J. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997, which are each incorporated by reference, approximate the human face surface with a Lambertian surface and compute a basis for a 3D illumination subspace, using images acquired under different lighting conditions.
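Why a Lambertian surface yields a 3D illumination subspace can be checked numerically. Ignoring shadows, the intensity at each pixel is albedo × (n · s), which is linear in the light vector s, so every image is a linear combination of three basis images. The following is a minimal sketch on synthetic normals and albedos (all names and data are hypothetical); it estimates the basis from training images by SVD and reconstructs an image lit from a new direction.

```python
import numpy as np


def lambertian_images(normals, albedo, lights):
    """Render images with the shadow-free Lambertian model I = albedo * (n . s).

    normals: (P, 3) unit surface normals, albedo: (P,), lights: (K, 3).
    Returns a (P, K) matrix whose columns are images. Negative values are kept,
    i.e. attached shadows are deliberately ignored in this sketch.
    """
    return albedo[:, None] * (normals @ lights.T)


def illumination_basis(images, dim=3):
    """Orthonormal basis of the illumination subspace via SVD of training images."""
    u, s, _ = np.linalg.svd(images, full_matrices=False)
    return u[:, :dim], s


rng = np.random.default_rng(0)
P = 500
normals = rng.normal(size=(P, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=P)

# Ten training images under random lighting directions.
train = lambertian_images(normals, albedo, rng.normal(size=(10, 3)))
basis, sing = illumination_basis(train)

# An image under a lighting direction not used for training.
new_img = lambertian_images(normals, albedo, np.array([[0.3, -0.5, 0.8]]))[:, 0]
recon = basis @ (basis.T @ new_img)
rel_err = np.linalg.norm(new_img - recon) / np.linalg.norm(new_img)
print("leading singular values:", sing[:5])
print("relative reconstruction error:", rel_err)
```

Only the first three singular values are non-negligible, and the novel image is reconstructed to machine precision from the three basis images.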
The illumination convex cone approach extends this model further by taking into account shadows and multiple light sources, as in P. N. Belhumeur and D. J. Kriegman, “What is the set of images of an object under all possible lighting conditions?,” in CVPR '96: Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition, 1996, p. 270, and A. S. Georghiades, D. J. Kriegman, and P. N. Belhumeur, “Illumination cones for recognition under variable lighting: Faces,” in CVPR '98: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1998, p. 52, and A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Generative models for recognition under variable pose and illumination,” in FG, 2000, pp. 277-284, which are each incorporated by reference.
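The need for a richer model once shadows are considered can be seen with a short sketch, again on hypothetical synthetic data. Modeling attached shadows as albedo × max(0, n · s) makes the image nonlinear in s, so the images no longer fit in a 3D subspace. Superposition over light sources still holds, however, so an image under several distant sources is a non-negative sum of single-source images; that non-negativity is what gives the set its convex-cone structure.

```python
import numpy as np


def energy_outside(images, dim=3):
    """Fraction (in norm) of the image matrix not captured by its best rank-`dim` approximation."""
    s = np.linalg.svd(images, compute_uv=False)
    return np.sqrt(np.sum(s[dim:] ** 2) / np.sum(s ** 2))


rng = np.random.default_rng(1)
P = 500
normals = rng.normal(size=(P, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=P)
lights = rng.normal(size=(20, 3))

shading = normals @ lights.T                                   # (P, 20) values of n . s
no_shadow = albedo[:, None] * shading                          # linear Lambertian model
with_shadow = albedo[:, None] * np.maximum(0.0, shading)       # attached shadows clamp to zero

e_free = energy_outside(no_shadow)
e_shadow = energy_outside(with_shadow)
print(f"outside 3D subspace: shadow-free {e_free:.2e}, with shadows {e_shadow:.2e}")
```

The shadow-free images lie in a 3D subspace, while the shadowed images leave a substantial residual outside any 3D subspace.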
More complex models have also been proposed, such as the geodesic illumination basis model of R. Ishiyama and S. Sakamoto, “Geodesic illumination basis: Compensating for illumination variations in any pose for face recognition,” in ICPR (4), 2002, pp. 297-301, incorporated by reference, or the 3D linear subspace model that segments the image into regions whose surface normals have similar directions, as in A. U. Batur and M. H. Hayes, “Linear subspaces for illumination robust face recognition,” in CVPR (2), 2001, pp. 296-301, incorporated by reference.
The canonical form approach offers an alternative, in which variations in appearance are normalized either by image transformations or by synthesizing, from the given image, a new image in a normalized form. Recognition is then performed using this canonical form, as in W. Zhao, Robust image based 3d face recognition, Ph.D. thesis, 1999, Chair-Rama Chellappa, and W. Gao, S. Shan, X. Chai, and X. Fu, “Virtual face image generation for illumination and pose insensitive face recognition,” ICME, vol. 3, pp. 149-152, 2003, which are each incorporated by reference.
In T. Shakunaga and K. Shigenari, “Decomposed eigenface for face recognition under various lighting conditions,” CVPR, vol. 01, pp. 864, 2001, and T. Shakunaga, F. Sakaue, and K. Shigenari, “Robust face recognition by combining projection-based image correction and decomposed eigenface,” 2004, pp. 241-247, which are incorporated by reference, an eigenface is decomposed into two orthogonal eigenspaces to realize general face recognition under lighting changes. A somewhat similar approach is used for face tracking in J. M. Buenaposada, E. Munoz, and L. Baumela, “Efficiently estimating facial expression and illumination in appearance-based tracking,” 2006, p. 1:57, incorporated by reference, where the face is represented as the sum of two approximately independent subspaces describing facial expression and illumination, respectively.
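The additive two-subspace representation can be sketched as follows, assuming the expression and illumination bases are already known. Because the two bases need not be orthogonal to each other, the coefficients are estimated jointly by least squares over the concatenated basis rather than by projecting onto each basis separately. The bases here are random placeholders and `fit_additive_subspaces` is a hypothetical helper; this only shows the representation, not the cited tracker.

```python
import numpy as np


def fit_additive_subspaces(x, mean, B_expr, B_illum):
    """Jointly estimate coefficients of x ~ mean + B_expr @ c_expr + B_illum @ c_illum.

    The two bases need not be mutually orthogonal, so the coefficients are
    solved together by least squares over the concatenated basis."""
    B = np.hstack([B_expr, B_illum])
    c, *_ = np.linalg.lstsq(B, x - mean, rcond=None)
    k = B_expr.shape[1]
    return c[:k], c[k:]


rng = np.random.default_rng(2)
P = 400
mean = rng.normal(size=P)
B_expr = rng.normal(size=(P, 5))   # hypothetical expression basis
B_illum = rng.normal(size=(P, 3))  # hypothetical illumination basis

true_expr = rng.normal(size=5)
true_illum = rng.normal(size=3)
x = mean + B_expr @ true_expr + B_illum @ true_illum

c_expr, c_illum = fit_additive_subspaces(x, mean, B_expr, B_illum)
# Keep the expression but discard the illumination component of this frame.
expression_only = mean + B_expr @ c_expr
```

Once the coefficients are separated, the illumination term can be dropped or replaced without disturbing the expression term.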
In N. Costen, T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Automatic extraction of the face identity-subspace,” in BMVC, 1999, and N. Costen, T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Simultaneous extraction of functional face subspaces,” CVPR, vol. 01, pp. 1492, 1999, which are incorporated by reference, facial appearance models of shape and texture are employed, and non-orthogonal texture subspaces for lighting, pose, identity, and expression are extracted using appropriate image sets. An iterative expectation-maximization algorithm is then applied to maximize the efficiency of the facial representation over the combined subspaces, with the projections onto each subspace used to recalculate the subspaces. This approach is shown to improve identity recognition results. It is still desired to have an algorithm that handles illumination changes with less complexity and that yields a general and robust facial appearance model.
PCA-based models generally do not decouple different types of variation. AAM techniques use PCA and thus inherit this limitation: they are practically incapable of differentiating among the various causes of face variability, in both shape and texture. An important drawback of a non-decoupled PCA-based model is that it can introduce invalid regions in the model space, allowing the generation of unrealistic shape/texture configurations. Moreover, the parameters of the global model can be ambiguous to interpret, as there is no clear distinction of the kind of variation each one represents. It is recognized by the inventors that it would be desirable to obtain specialized subspaces, such as an identity subspace and/or a directional lighting subspace.
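The ambiguity of PCA parameters can be shown with a small synthetic example. Suppose two independent causes of variation, labeled here as hypothetical "identity" and "lighting" directions, are not orthogonal (60° apart, equal variance). PCA returns orthogonal eigenvectors, so its first component points between the two causes (about 30° from each, |cos| ≈ 0.87) and its parameter mixes identity and lighting rather than isolating either.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 50

# Hypothetical directions of two different causes of variation, 60 degrees apart.
u_identity = np.zeros(dim)
u_identity[0] = 1.0
v_lighting = np.zeros(dim)
v_lighting[0] = 0.5
v_lighting[1] = np.sqrt(0.75)

n = 5000
a = rng.normal(size=n)  # identity factor for each sample
b = rng.normal(size=n)  # lighting factor for each sample
data = np.outer(a, u_identity) + np.outer(b, v_lighting)

centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]

cos_identity = abs(pc1 @ u_identity)
cos_lighting = abs(pc1 @ v_lighting)
print(f"|cos(PC1, identity)| = {cos_identity:.3f}, |cos(PC1, lighting)| = {cos_lighting:.3f}")
```

Neither cosine is close to 1, so varying the first PCA parameter changes identity and lighting at the same time.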
Changes in lighting or illumination represent one of the most complex and difficult-to-analyze sources of face variability. It is therefore desired to decouple variations in identity from those caused by directional lighting. It is further desired to split the shape model by decoupling identity from pose or expression. It is recognized by the inventors that decoupling pose variations from the global shape model can be realized by using a proper training set, in which the individuals are presented in several poses, normally covering a head-tilt range within 30°-40°.
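How a suitably structured training set enables such decoupling can be sketched as follows. Assume, hypothetically, that every subject is imaged under the same set of lighting conditions. PCA over each subject's mean image then recovers an identity subspace, and PCA over the within-subject deviations from that mean recovers a lighting subspace. The data, dimensions, and helper names are illustrative assumptions, not the method described in this document.

```python
import numpy as np


def pca_basis(data, dim):
    """Top `dim` principal directions of the rows of `data`, as orthonormal columns."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T


def decoupled_subspaces(images, n_id, n_light):
    """images: (S, K, P) array, S subjects each seen under the same K lightings.

    Identity subspace: PCA over per-subject mean images.
    Lighting subspace: PCA over within-subject deviations from that mean."""
    subject_means = images.mean(axis=1)  # (S, P)
    residuals = (images - subject_means[:, None, :]).reshape(-1, images.shape[2])
    return pca_basis(subject_means, n_id), pca_basis(residuals, n_light)


def subspace_gap(Q, B):
    """Relative part of B's columns outside span(Q); 0 means span(B) lies in span(Q)."""
    return np.linalg.norm(B - Q @ (Q.T @ B)) / np.linalg.norm(B)


rng = np.random.default_rng(4)
P, S, K = 200, 30, 12
mean_face = rng.normal(size=P)
B_id = rng.normal(size=(P, 4))       # hypothetical true identity basis
B_light = rng.normal(size=(P, 3))    # hypothetical true lighting basis
id_coeffs = rng.normal(size=(S, 4))
light_coeffs = rng.normal(size=(K, 3))
images = (mean_face
          + (id_coeffs @ B_id.T)[:, None, :]
          + (light_coeffs @ B_light.T)[None, :, :])

Q_id, Q_light = decoupled_subspaces(images, n_id=4, n_light=3)
gap_id = subspace_gap(Q_id, B_id)
gap_light = subspace_gap(Q_light, B_light)
print(f"identity gap {gap_id:.1e}, lighting gap {gap_light:.1e}")
```

Because every subject shares the same lighting set, the lighting contribution to each subject mean is a common offset that centering removes, and the within-subject deviations contain no identity information; with real images these properties hold only approximately.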