Face recognition is typically performed by comparing a ‘probe’ image of an unknown face with a ‘gallery’ of images of known faces. Reliably recognizing a face in a 2D image is difficult when pose, camera viewpoint, and illumination vary between the probe and gallery images.
One prior art face recognition method uses multi-linear analysis, applying a high-order singular value decomposition (SVD) to 2D facial images that vary under multiple factors, such as identity, expression, pose, and illumination; see M. A. O. Vasilescu and D. Terzopoulos, “Multilinear Subspace Analysis of Image Ensembles,” Proceedings of Computer Vision and Pattern Recognition, 2003. Because that method does not consider 3D shape information of faces, its reliability is reduced when there are variations in pose and in illumination directions that cast shadows, because the face is not entirely spherical and convex, but also includes numerous concavities and protuberances.
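As a rough illustration of the high-order SVD mentioned above, the following Python sketch decomposes a small tensor into a core tensor and one orthonormal factor matrix per mode. The function names, the use of NumPy, and the suggested mode layout (identity, pose, illumination, pixels) are assumptions made for illustration; they are not taken from the cited work.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along one mode."""
    moved = np.moveaxis(tensor, mode, 0)
    product = matrix @ moved.reshape(moved.shape[0], -1)
    result = product.reshape((matrix.shape[0],) + moved.shape[1:])
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return (core, factors) so that the tensor equals the core
    multiplied by every factor matrix along its own mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Invert hosvd by applying each factor matrix along its mode."""
    tensor = core
    for mode, u in enumerate(factors):
        tensor = mode_product(tensor, u, mode)
    return tensor
```

In such a layout, each factor matrix would capture the variation along one factor (for example, identity or illumination), while the core tensor describes how the factors interact.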
Three-dimensional information about the shape of a face can be used to reduce this problem. The 3D shape information can be obtained directly from a range scanner or estimated from one or more images. Shape information can also be used to generate a synthetic image that is invariant to pose and illumination. Alternatively, the 3D shape can be used to derive an analytic illumination subspace of a Lambertian object with spherical harmonics.
Various methods that use shape information are described by R. Basri and D. Jacobs, “Lambertian Reflectance and Linear Subspaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218-233, 2003; C. Beumier and M. Acheroy, “Automatic 3D Face Authentication,” Image and Vision Computing, vol. 18, no. 4, pp. 315-321, 2000; V. Blanz and T. Vetter, “Face Recognition based on fitting a 3D morphable model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1063-1074, 2003; Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel, “Expression-Invariant 3D Face Recognition,” Proc. of the 4th Int. Conf. on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 62-69; Kyong I. Chang, Kevin Bowyer, and Patrick Flynn, “Face Recognition Using 2D and 3D Facial Data,” in Multimodal User Authentication Workshop, 2003; A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001; K. Lee, J. Ho, and D. Kriegman, “Nine Points of Light: Acquiring Subspaces for Face Recognition under Variable Lighting,” Proceedings of Computer Vision and Pattern Recognition, 2001, vol. 1, pp. 519-526; S. Romdhani, V. Blanz, and T. Vetter, “Face Identification by Fitting a 3D Morphable Model using Linear Shape and Texture Error Functions,” European Conference on Computer Vision, 2002, pp. 3-19; J. Huang, B. Heisele, and V. Blanz, “Component-based Face Recognition with 3D Morphable Models,” Proc. of the 4th Int. Conf. on Audio- and Video-Based Biometric Person Authentication, 2003; and Lei Zhang and Dimitris Samaras, “Face Recognition Under Variable Lighting using Harmonic Image Exemplars,” Proceedings of Computer Vision and Pattern Recognition, pp. 19-25, 2003.
Three-dimensional shape information can be used directly as a pose- and illumination-independent model. The direct method uses a morphable model to obtain the 3D shape and 2D texture of a face from a single image. The model of the probe image is then compared with the models of the gallery images based on principal component analysis (PCA) coefficients. However, the direct method requires manual initialization of fiducial landmarks on the face and uses a non-linear, iterative fitting procedure, which can take minutes to converge, if it converges at all, and then only to a local minimum. Thus, for several reasons, the direct method is not suited for real-time applications.
Other face recognition methods are related to the recognition paradigm of ‘distance-from-a-subspace’, which is derived from 2D appearance-based modeling. Although those methods can also use 3D morphable models, the 3D models are essentially a post-processing tool for subsequent invariant modeling and subspace generation, as opposed to a model that is itself used for face recognition, as in the direct method.
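The ‘distance-from-a-subspace’ paradigm can be sketched as follows: each gallery identity is represented by a set of basis images, and the probe is assigned to the identity whose subspace leaves the smallest residual after orthogonal projection. This is a minimal sketch under assumed conventions; the function names, the dictionary layout of the gallery, and the use of NumPy are hypothetical and not taken from any of the cited methods.

```python
import numpy as np

def distance_from_subspace(probe, basis):
    """Euclidean distance from a probe vector to the span of the basis columns.

    probe: (d,) vectorized image.
    basis: (d, k) matrix whose columns span one identity's subspace.
    """
    # Orthonormalize the basis so that projection is a simple product.
    q, _ = np.linalg.qr(basis)
    residual = probe - q @ (q.T @ probe)
    return float(np.linalg.norm(residual))

def recognize(probe, gallery):
    """Return the gallery key whose subspace is closest to the probe.

    gallery: dict mapping an identity label to its (d, k) basis matrix.
    """
    return min(gallery, key=lambda name: distance_from_subspace(probe, gallery[name]))
```

For example, with a gallery of two identities, `recognize(probe, {"a": basis_a, "b": basis_b})` returns the label of the subspace that best explains the probe.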
Several methods are known for generating a linear subspace that represents variations in illumination of a face. One method uses photometric stereo to reconstruct the 3D face shape and 2D albedo from seven frontal images under different illuminations. The estimated 3D shape is then used to render synthetic images for various poses and illuminations to train a person-specific illumination cone; see A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001. It is desired to recognize a face from a single image, and to eliminate the rendering and training steps.
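To make the photometric stereo step more concrete, the sketch below shows the classical Lambertian formulation with known light directions: each pixel intensity is modeled as the dot product of a light direction and the albedo-scaled surface normal, and the scaled normals are recovered by least squares. This is an illustrative assumption, not necessarily the procedure of the cited work; the function name and argument layout are hypothetical, and shadowed or saturated pixels are ignored.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Estimate per-pixel albedo and unit normals under a Lambertian model.

    images: (m, p) array, one row of p pixel intensities per light source.
    lights: (m, 3) array of light directions, with m >= 3 and rank 3.
    Returns (albedo, normals) with shapes (p,) and (p, 3).
    """
    # Solve lights @ g = images for g, the (3, p) albedo-scaled normals.
    scaled_normals, *_ = np.linalg.lstsq(lights, images, rcond=None)
    albedo = np.linalg.norm(scaled_normals, axis=0)
    # Guard against division by zero for pixels with no signal.
    normals = (scaled_normals / np.maximum(albedo, 1e-12)).T
    return albedo, normals
```

With seven images, as in the method described above, the system is overdetermined, so the least-squares fit averages out some measurement noise.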
Basri et al. approximate the arbitrary illumination of a convex Lambertian 3D object by a low-dimensional linear subspace spanned by nine harmonic images. The nine harmonic images can be determined analytically given the surface normals and the albedo. However, that method assumes analytic Lambertian illumination with spherical harmonics, an assumption that is known to be inaccurate for faces.
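The analytic construction of the nine harmonic images can be sketched as below: the real spherical harmonics of orders 0, 1, and 2 are evaluated at each surface normal and scaled by the albedo. The ordering of the nine images and the function name are assumptions for illustration; normalization conventions vary between publications, and the per-order attenuation of the Lambertian kernel is omitted here because it does not change the spanned subspace.

```python
import numpy as np

def harmonic_images(normals, albedo):
    """Evaluate nine harmonic images from unit normals and albedo.

    normals: (p, 3) unit surface normals (nx, ny, nz).
    albedo: (p,) per-pixel albedo.
    Returns a (p, 9) array whose columns are the harmonic images.
    """
    nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
    c0 = 1.0 / np.sqrt(4.0 * np.pi)           # order 0
    c1 = np.sqrt(3.0 / (4.0 * np.pi))         # order 1
    c2 = 0.5 * np.sqrt(5.0 / (4.0 * np.pi))   # order 2, zonal term
    c3 = 3.0 * np.sqrt(5.0 / (12.0 * np.pi))  # order 2, remaining terms
    basis = np.stack([
        c0 * np.ones_like(nz),
        c1 * nz,
        c1 * nx,
        c1 * ny,
        c2 * (2.0 * nz**2 - nx**2 - ny**2),
        c3 * nx * nz,
        c3 * ny * nz,
        0.5 * c3 * (nx**2 - ny**2),
        c3 * nx * ny,
    ], axis=1)
    return albedo[:, None] * basis
```

Under the convex Lambertian assumption, an image of the object under arbitrary distant lighting is then approximated by a linear combination of these nine columns, for example by fitting coefficients with `np.linalg.lstsq`.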
Another method finds nine directions of a point light source with which to approximate the span of the nine harmonic images. That method does not require the 3D shape, i.e., surface normals and albedo. However, it is not always practical to acquire nine images of every face to be recognized; see K. Lee, J. Ho, and D. Kriegman, “Nine Points of Light: Acquiring Subspaces for Face Recognition under Variable Lighting,” Proceedings of Computer Vision and Pattern Recognition, 2001, vol. 1, pp. 519-526.
It is desired to perform face recognition with a single probe image.
Another method estimates the nine harmonic images from a single image; see Lei Zhang and Dimitris Samaras, “Face Recognition Under Variable Lighting using Harmonic Image Exemplars,” Proceedings of Computer Vision and Pattern Recognition, pp. 19-25, 2003. That method uses a 3D bootstrap set obtained from a 3D face database, Sudeep Sarkar, “USF HumanID 3-D Database,” University of South Florida, Tampa, Fla. That method is also based on an analytic illumination subspace of a Lambertian object with spherical harmonics.
However, any method based on spherical harmonics has an inherent limitation, because faces are not entirely convex. Moreover, faces do not have exact Lambertian reflectance, which makes it difficult to deal with specularities, cast shadows, inter-reflections, and subsurface scattering in the epidermal and dermal layers of the skin.
Therefore, it is desired to generate a bi-linear illumination model of a face directly from a single 2D image. Furthermore, it is desired to obtain a generic model for all types of faces, so faces can be recognized reliably. In addition, such a model would enable the rendering of synthetic facial images for arbitrary viewpoints and illumination, e.g., canonical basis images that are object-centered for greater flexibility. Furthermore, it is desired to have a model that is compact in storage and can be used in real time.