The performance of a face recognition system is adversely affected by changes in facial appearance due to lighting and pose variation. One prevalent trend is to exploit 3D shape information of human faces to overcome the limitations of traditional 2D images. The 3D shape information can be obtained directly from a range scanner or estimated from one or more images. Although the cost of acquiring 3D geometric data is decreasing, most existing face databases only include single 2D images. Therefore, it is more practical to obtain 3D shape from a single 2D image than from multiple images or range data.
Currently, there are three different techniques that use 3D shape information for face recognition. The first uses 3D shape directly as a pose/illumination-independent signature. The second uses 3D data to generate synthetic imagery under various viewpoints and lighting conditions, from which a pose/illumination-invariant representation is built in 2D image space. The third uses 3D shape to derive an analytic illumination subspace of a Lambertian object with spherical harmonics.
For example, the first approach is typified by Morphable Models, V. Blanz and T. Vetter, “Face recognition based on fitting a 3D morphable model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1063-1074, 2003. They obtain the 3D shape and 2D texture of a face from a single image to construct a model. The models for a probe image and a gallery image are matched directly based on their respective principal component analysis (PCA) coefficients. That technique handles variable pose and lighting. However, it requires careful manual initialization of facial landmarks and uses an iterative non-linear optimization technique for fitting, which can take several minutes to converge, if it converges at all, and then only to a local minimum. Thus, it is not certain whether that face capture/modeling approach can be used for real-time face recognition.
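As a rough illustration of matching two faces directly by their PCA coefficients, the sketch below compares coefficient vectors with a cosine similarity and returns the closest gallery identity. The function names, the optional per-component standard-deviation scaling, and the choice of cosine similarity are assumptions for illustration only; the similarity measure Blanz and Vetter actually use may differ.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float],
                      stddev: Optional[Sequence[float]] = None) -> float:
    """Cosine of the angle between two PCA coefficient vectors.

    If stddev is given (hypothetical per-component standard deviations),
    each coefficient is divided by it first so all components have unit variance.
    """
    if stddev is not None:
        a = [x / s for x, s in zip(a, stddev)]
        b = [y / s for y, s in zip(b, stddev)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("coefficient vector has zero length")
    return dot / (norm_a * norm_b)


def match_by_coefficients(probe: Sequence[float],
                          gallery: Dict[str, Sequence[float]],
                          stddev: Optional[Sequence[float]] = None) -> str:
    """Return the gallery identity whose coefficients are most similar to the probe's."""
    return max(gallery, key=lambda name: cosine_similarity(probe, gallery[name], stddev))


# Hypothetical coefficient vectors, not output of a real morphable model.
gallery = {"alice": [1.0, 0.2, -0.5], "bob": [-0.8, 0.9, 0.1]}
print(match_by_coefficients([0.9, 0.25, -0.4], gallery))  # alice
```

Because the probe and gallery models live in the same coefficient space, recognition reduces to a nearest-neighbor search there; no image-space comparison is needed once fitting is done.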
The second and third techniques are qualitatively different from the first, and are related to the popular “distance-from-a-subspace” recognition paradigm, which dates back to early work on 2D appearance-based modeling. Although those two approaches can also use 3D morphable models, they do so mostly as a tool for subsequent invariant modeling and subspace generation, rather than as the final representation for recognition.
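The “distance-from-a-subspace” paradigm can be sketched as follows, assuming each gallery subject is represented by a few basis images flattened into vectors: orthonormalize the basis, remove the probe's projection onto it, and assign the probe to the subject whose subspace leaves the smallest residual. The helper names and the use of modified Gram-Schmidt are illustrative choices, not taken from any of the cited works.

```python
import math
from typing import Dict, List, Sequence


def orthonormalize(basis: Sequence[Sequence[float]], tol: float = 1e-10) -> List[List[float]]:
    """Modified Gram-Schmidt; drops vectors that are numerically dependent."""
    ortho: List[List[float]] = []
    for v in basis:
        w = list(v)
        for q in ortho:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            ortho.append([wi / norm for wi in w])
    return ortho


def distance_from_subspace(x: Sequence[float], basis: Sequence[Sequence[float]]) -> float:
    """Euclidean distance from image vector x to the linear span of basis."""
    residual = list(x)
    for q in orthonormalize(basis):
        coeff = sum(ri * qi for ri, qi in zip(residual, q))
        residual = [ri - coeff * qi for ri, qi in zip(residual, q)]
    return math.sqrt(sum(r * r for r in residual))


def classify(probe: Sequence[float],
             subspaces: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Assign the probe to the subject whose subspace is closest."""
    return min(subspaces, key=lambda name: distance_from_subspace(probe, subspaces[name]))


# Toy 3-pixel "images"; subject a spans the first two axes, subject b the third.
subspaces = {"a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "b": [[0.0, 0.0, 1.0]]}
print(classify([3.0, 4.0, 0.5], subspaces))  # a
```

In practice a numerically robust QR or SVD routine would replace the hand-written Gram-Schmidt; the decision rule is the same.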
Several methods are known for generating a linear subspace that captures the illumination variations of a face. One method uses photometric stereo images to reconstruct 3D face geometry and albedo from seven frontal images under different illuminations, A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643-660, 2001. The estimated 3D face can then be used to render synthetic images under various poses and lighting conditions to train a person-specific illumination cone.
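To make the photometric-stereo step concrete, the sketch below recovers the albedo and unit surface normal of a single pixel from several images, assuming a Lambertian model I_k = albedo * dot(n, s_k) with known, calibrated light vectors s_k and no shadowed measurements. Georghiades et al. deal with harder conditions (for example, lighting that is not calibrated, and shadows), so this is a simplified stand-in rather than their procedure; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate")
    result = []
    for col in range(3):
        m = [list(row) for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        result.append(_det3(m) / d)
    return result


def photometric_stereo_pixel(intensities: Sequence[float],
                             lights: Sequence[Vec3]) -> Tuple[float, Vec3]:
    """Least-squares albedo and unit normal for one pixel from >= 3 lit images."""
    # Normal equations (S^T S) b = S^T I, where b = albedo * n.
    sts = [[sum(s[i] * s[j] for s in lights) for j in range(3)] for i in range(3)]
    sti = [sum(s[i] * val for s, val in zip(lights, intensities)) for i in range(3)]
    b = solve3(sts, sti)
    albedo = math.sqrt(sum(c * c for c in b))
    if albedo == 0.0:
        raise ValueError("zero albedo: normal is undefined")
    normal = (b[0] / albedo, b[1] / albedo, b[2] / albedo)
    return albedo, normal


# Four hypothetical unit light directions; pixel with normal (0.6, 0, 0.8), albedo 0.7.
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8), (-0.6, 0.0, 0.8)]
albedo, normal = photometric_stereo_pixel([0.56, 0.7, 0.448, 0.196], lights)
print(round(albedo, 3))  # 0.7
```

With more than three images the system is overdetermined, which is why a least-squares solve is used instead of inverting a single 3x3 light matrix.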
Another method uses a similar “short-cut”, R. Basri and D. Jacobs, “Lambertian reflectance and linear subspaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):218-233, 2003. They show that the images of a convex Lambertian 3D object under arbitrary illumination can be approximated well by a low-dimensional linear subspace spanned by nine harmonic images. The nine harmonic images can be determined analytically given the surface normals and the albedo.
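The analytic construction can be sketched per pixel: given an albedo value and a unit surface normal, each of the nine harmonic images is the albedo multiplied by a low-order polynomial in the normal's components. The normalization constants below follow the real spherical harmonics as commonly written for this construction; treat them as an assumption to check against Basri and Jacobs. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def harmonic_images(albedo: Sequence[float], normals: Sequence[Vec3]) -> List[List[float]]:
    """Nine harmonic images, one value per pixel, from albedo and unit normals."""
    c0 = 1.0 / math.sqrt(4.0 * math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 3.0 * math.sqrt(5.0 / (12.0 * math.pi))
    c20 = 0.5 * math.sqrt(5.0 / (4.0 * math.pi))
    images: List[List[float]] = [[] for _ in range(9)]
    for lam, (nx, ny, nz) in zip(albedo, normals):
        values = (
            c0 * lam,                                          # order 0
            c1 * lam * nz,                                     # order 1
            c1 * lam * nx,
            c1 * lam * ny,
            c20 * lam * (2.0 * nz * nz - nx * nx - ny * ny),   # order 2
            c2 * lam * nx * nz,
            c2 * lam * ny * nz,
            0.5 * c2 * lam * (nx * nx - ny * ny),
            c2 * lam * nx * ny,
        )
        for img, v in zip(images, values):
            img.append(v)
    return images


# One frontal-facing pixel with unit albedo.
imgs = harmonic_images([1.0], [(0.0, 0.0, 1.0)])
print(len(imgs))  # 9
```

Because rescaling a basis image does not change the span, an error in one of these constants would not alter the nine-dimensional subspace used for recognition, only the coordinates within it.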
A more practical variation is described by K. Lee, J. Ho, and D. Kriegman, “Nine points of light: Acquiring subspaces for face recognition under variable lighting,” Proc. of Computer Vision & Pattern Recognition, volume 1, pages 519-526, 2001. They empirically determine nine directions of a point source with which to approximate the span of the nine harmonic images. These nine images are adequate for face recognition and do not require 3D shape, e.g., surface normals and albedo. However, it is not always practical to acquire nine images of every face in a real operational setting.
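The images in the nine-points-of-light approach are ordinary images under single distant point sources. To experiment with the idea synthetically, one can render such images from albedo and normals with a Lambertian model in which attached shadows (surfaces facing away from the light) are clamped to zero, then stack one image per light direction as the basis. Lee et al. capture real photographs instead and choose the nine directions empirically; the function names and the light directions in the usage lines are hypothetical placeholders, not theirs.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def render_point_source(albedo: Sequence[float], normals: Sequence[Vec3],
                        light: Vec3) -> List[float]:
    """Lambertian image under one distant point source; attached shadows clamp to 0.

    Cast shadows are not modeled: each pixel only sees its own normal.
    """
    lx, ly, lz = light
    return [lam * max(0.0, nx * lx + ny * ly + nz * lz)
            for lam, (nx, ny, nz) in zip(albedo, normals)]


def point_light_basis(albedo: Sequence[float], normals: Sequence[Vec3],
                      lights: Sequence[Vec3]) -> List[List[float]]:
    """One rendered image per light direction, usable as subspace basis vectors."""
    return [render_point_source(albedo, normals, s) for s in lights]


# Two-pixel toy surface and two hypothetical light directions.
albedo = [2.0, 1.0]
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
basis = point_light_basis(albedo, normals, [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)])
print(basis)  # [[2.0, 0.0], [0.0, 0.0]]
```

The clamp to zero is what makes point-source images differ from the purely linear harmonic images; a well-chosen set of nine directions keeps the span of these clamped images close to the harmonic subspace.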
Another method estimates the nine harmonic images from a single image, L. Zhang and D. Samaras, “Face recognition under variable lighting using harmonic image exemplars,” Proc. Computer Vision & Pattern Recognition, pages I:19-25, 2003. However, a face is neither exactly Lambertian nor entirely convex. Therefore, spherical harmonic methods have an inherent limitation, especially when dealing with specularities, cast shadows, inter-reflections, and subsurface scattering. The method of Zhang and Samaras also requires a ‘bootstrap’ dataset.