It is very difficult to design computer vision systems, and face identification systems in particular, that are invariant to arbitrary lighting. Indeed, large independent U.S. government tests have concluded that the identification of faces in images acquired with arbitrary lighting fails to achieve the success rate of the identification of faces in images acquired with controlled lighting, see Phillips, “Face Recognition Vendor Test (FRVT) 2002 report,” Technical report, National Institute of Standards and Technology, March 2003, and Phillips et al., “The FERET evaluation methodology for face-recognition algorithms,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(10), pp. 1090-1104, 2000.
The present invention provides a solution for the difficult but routine problem of facial identification as applied to access control and surveillance applications.
In such applications, an image is acquired of a face of a possibly unknown individual in an arbitrary scene under arbitrary illumination. A camera at a fixed pose, e.g., frontal, acquires the image. The camera is uncalibrated and has unknown intrinsic parameters. The image can be obtained from a video, archival photography, web imagery, family photo albums, identification photographs, and the like.
Without any 3D measurement of the individual or the scene, the problem is to match the face in the single image to images of known individuals stored in a database. The stored images were acquired under fixed lighting, e.g., diffuse or frontal.
To solve this problem, all images need to be normalized geometrically and photometrically to provide a single fixed illumination template suitable for robust pattern matching and illumination invariant face identification. Naturally, the canonical choice of illumination would include non-directional or diffuse, or at least frontal, lighting that maximizes visibility of all key facial features.
Because the focus is on illumination invariance, it is assumed that the geometric normalization is performed in a preprocessing step. The preprocessing can include detecting the location of the face in the image, detecting facial features, such as the eyes, and applying rigid transforms, i.e., scale, rotation and translation, to align the detected features. It is also assumed that some simple photometric normalization may have already taken place, e.g., a non-spatial global transform, which is only a function of intensity, e.g., gain, contrast, and brightness.
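The geometric alignment step described above can be sketched as follows. This is a minimal illustration only, assuming that two eye centers have already been detected and that they are to be mapped onto fixed template positions. The function names and the template coordinates (30, 40) and (70, 40) are hypothetical and do not come from the cited work.

```python
import math

def eye_alignment_transform(left_eye, right_eye,
                            target_left=(30.0, 40.0),
                            target_right=(70.0, 40.0)):
    """Return (scale, angle, translation) mapping the detected eyes onto
    the target eye positions. The target coordinates are assumed values."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tdx = target_right[0] - target_left[0]
    tdy = target_right[1] - target_left[1]
    # Scale is the ratio of inter-eye distances.
    scale = math.hypot(tdx, tdy) / math.hypot(dx, dy)
    # Rotation is the difference between the two inter-eye directions.
    angle = math.atan2(tdy, tdx) - math.atan2(dy, dx)
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    # Translation is chosen so that the left eye lands on its target.
    tx = target_left[0] - (c * left_eye[0] - s * left_eye[1])
    ty = target_left[1] - (s * left_eye[0] + c * left_eye[1])
    return scale, angle, (tx, ty)

def apply_transform(point, scale, angle, translation):
    """Apply scale, rotation and translation to a single (x, y) point."""
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    x, y = point
    return (c * x - s * y + translation[0],
            s * x + c * y + translation[1])
```

With the transform in hand, every pixel coordinate of the input image can be mapped into the template frame. Resampling of the pixel values is omitted from this sketch.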
Much of the prior art on modeling lighting has focused on finding a compact low-dimensional subspace to model all lighting variations. Under a theoretical Lambertian assumption, the image set of an object under all possible lighting conditions forms a polyhedral ‘illumination cone’ in the image space, Belhumeur et al., “What is the set of images of an object under all possible lighting conditions,” Int'l J. Computer Vision, volume 28, pp. 245-260, 1998.
Subsequent work that applies the above theory to face recognition is described by Basri et al., “Lambertian reflectance and linear subspaces,” Int'l Conf. on Computer Vision, volume 2, pages 383-390, 2001. Basri et al. represent lighting using a spherical harmonic basis wherein the low dimensional linear subspace is shown to be effective for face recognition.
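As a rough illustration of a spherical harmonic lighting basis, the following sketch evaluates real spherical harmonics at a surface normal and forms per-pixel basis images. It makes several assumptions that are not stated in the text above: the basis is restricted to the nine functions of orders 0 to 2, per-pixel albedo and unit normals are known, the ordering of the functions is one common convention, and the Lambertian kernel scale factors are omitted because they do not change the spanned subspace. The function names are hypothetical.

```python
def sh9_basis(normal):
    """Evaluate nine real spherical harmonics (orders 0 to 2) at a unit
    normal (x, y, z). The ordering is an assumed convention."""
    x, y, z = normal
    return [
        0.282095,                        # order 0
        0.488603 * y,                    # order 1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                # order 2
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]

def harmonic_images(albedo, normals):
    """Return nine basis images, each a flat list with one value per pixel,
    formed as albedo times the basis function at that pixel's normal."""
    per_pixel = [[a * b for b in sh9_basis(n)]
                 for a, n in zip(albedo, normals)]
    # Transpose from per-pixel rows to per-basis images.
    return [list(column) for column in zip(*per_pixel)]
```

Under these assumptions, an image of the object under distant lighting would be approximated by a linear combination of the nine basis images.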
One method analytically determines the low dimensional subspace with spherical harmonics, Ramamoorthi, “Analytic PCA construction for theoretical analysis of lighting variability in images of a Lambertian object,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 24, Oct. 2002. Another method arranges lighting to best generate equivalent basis images for recognition, Lee et al., “Nine points of light: Acquiring subspaces for face recognition under variable lighting,” Proc. IEEE Conf. on Computer Vision & Pattern Recognition, pages 519-526, 2001.
A complementary approach is to generate a lighting invariant ‘signature’ image. Although that technique cannot deal with large illumination changes, it does have the advantage that only one image per object is required in the database.
Other prior art normalization techniques generate invariant templates by using histogram equalization or linear ramp subtraction, Rowley et al., “Neural network-based face detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(1), pp. 23-38, 1998.
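The two normalizations named above can be sketched as follows. This is a simplified illustration, not the implementation of Rowley et al. It assumes an 8-bit grayscale image stored as a list of rows, and it fits the linear ramp over the full rectangular window rather than over a masked region. The function names are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Remap integer intensities through the normalized cumulative histogram."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        # A constant image has nothing to equalize.
        return [row[:] for row in image]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]

def subtract_linear_ramp(image):
    """Subtract the least-squares plane a*x + b*y + c from the image.
    On a full rectangular grid the centered x and y coordinates are
    orthogonal, so each coefficient can be fitted independently."""
    h, w = len(image), len(image[0])
    xm, ym = (w - 1) / 2.0, (h - 1) / 2.0
    mean = sum(sum(row) for row in image) / (h * w)
    sxx = h * sum((x - xm) ** 2 for x in range(w))
    syy = w * sum((y - ym) ** 2 for y in range(h))
    sxi = sum((x - xm) * image[y][x] for y in range(h) for x in range(w))
    syi = sum((y - ym) * image[y][x] for y in range(h) for x in range(w))
    a = sxi / sxx if sxx else 0.0
    b = syi / syy if syy else 0.0
    return [[image[y][x] - (mean + a * (x - xm) + b * (y - ym))
             for x in range(w)] for y in range(h)]
```

Ramp subtraction removes a first-order brightness trend, such as one caused by side lighting, while histogram equalization spreads the remaining intensities over the available range.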
It is known that the image gradient is illumination-insensitive and can be used in a probabilistic framework to determine the likelihood that two images were acquired from the same object, Chen et al., “In search of illumination invariants,” Proc. IEEE Conf. on Computer Vision & Pattern Recognition, pages 1-8, 2000.
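A small sketch can illustrate why the gradient is a natural candidate. It is not the probabilistic framework of Chen et al. It only shows, under the assumption of a purely global gain and offset change, that the direction of a central-difference gradient is unchanged. The function name is hypothetical.

```python
import math

def gradient_directions(image):
    """Return the central-difference gradient direction, in radians,
    at every interior pixel of a grayscale image stored as rows."""
    h, w = len(image), len(image[0])
    directions = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            row.append(math.atan2(gy, gx))
        directions.append(row)
    return directions

# Example: a positive global gain and an offset leave directions unchanged.
original = [[1, 2, 4, 7], [2, 5, 3, 8], [6, 1, 9, 2], [3, 7, 5, 4]]
relit = [[2.0 * p + 10.0 for p in row] for row in original]
same = all(
    abs(a - b) < 1e-12
    for row_a, row_b in zip(gradient_directions(original),
                            gradient_directions(relit))
    for a, b in zip(row_a, row_b)
)
```

Real illumination changes vary spatially, so the gradient direction is only insensitive to them, not strictly invariant, which is why a probabilistic treatment is used in the cited work.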
The near symmetry of faces can be used to determine an illumination invariant prototype image for an individual without recovering albedos, Zhao et al., “Symmetric shape-from-shading using self-ratio image,” Int'l J. Computer Vision, 45(1), pp. 55-75, 2001.
Another method assumes that different faces have a common shape but different texture and determines an albedo ratio as an illumination-invariant signature, Shashua et al., “The quotient image: Class-based rerendering and recognition with varying illuminations,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(2), pp. 129-139, 2001.
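The identity behind an albedo ratio can be illustrated with a short sketch. It is not the method of Shashua et al., which estimates lighting from a set of example faces. The sketch assumes a Lambertian model, identical surface normals for both faces, and both faces imaged under the same distant light. The function names and numeric values are hypothetical.

```python
def render_lambertian(albedo, normals, light):
    """Render per-pixel intensity as albedo * max(0, normal . light)."""
    return [a * max(0.0, sum(ni * li for ni, li in zip(n, light)))
            for a, n in zip(albedo, normals)]

def ratio_image(image_a, image_b, eps=1e-12):
    """Return the per-pixel ratio, or NaN where the denominator is unlit."""
    return [a / b if b > eps else float("nan")
            for a, b in zip(image_a, image_b)]

# Two faces with a shared shape (same normals) and different albedos.
normals = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
albedo_a = [0.8, 0.5, 0.4]
albedo_b = [0.4, 0.5, 0.8]

ratios = []
for light in [(0.0, 0.0, 1.0), (0.3, 0.2, 0.9)]:
    image_a = render_lambertian(albedo_a, normals, light)
    image_b = render_lambertian(albedo_b, normals, light)
    ratios.append(ratio_image(image_a, image_b))
```

Because the shading term is common to both images, it cancels in the ratio, and both entries of `ratios` agree with the albedo ratio wherever the surface is lit.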
Object relighting methods have also been described for computer graphic applications. One application uses corneal imaging for embedding realistic virtual objects, e.g., faces, into a scene, resulting in synthetic faces that are properly ‘relit’ in accordance with estimated environmental lighting, Nishino et al., “Eyes for relighting,” Proceedings of SIGGRAPH, 2004.
Another method uses a radiance environment map, Wen et al., “Face relighting with radiance environment maps,” Proc. IEEE Conf. on Computer Vision & Pattern Recognition, 2003. That method renders relatively high quality faces using spherical harmonics, Ramamoorthi et al., “A signal processing framework for inverse rendering,” Proceedings of SIGGRAPH, 2001.
However, for face identification there is no need for high-quality rendering or photorealism. In fact, most known 2D face identification systems operate at low to moderate resolutions, e.g., approximately 100 pixels across the face.