The most visually distinguishing feature of a person is the face. Therefore, face recognition in still and moving images (videos) is an important technology for many applications where it is desired to identify a person from images. Face recognition presents an extremely difficult challenge for computer vision technology.
For example, in facial images acquired by surveillance cameras, the lighting of a scene is often poor and uncontrolled, and the cameras are generally of low quality and usually distant from potentially important parts of the scene. The location and orientation of the faces in the scene usually cannot be controlled. Some facial features, such as the hairline, eyebrows, and chin, are easily altered. Other features, such as the mouth, are highly variable, particularly in a video.
Face recognition compares an image of an unidentified face (a probe image) with a set of images of identified faces (gallery images). The gallery can include multiple images of the same face. The comparison permits two possible outcomes: the probe and gallery images are of the same face, or the probe and gallery images are of different faces.
Probabilistically, these two outcomes can be expressed as P(SAME|D) and P(DIFFERENT|D), where D represents the datum, a particular sample pair from the probe/gallery distribution. Using Bayes' law, the conditional probability P(SAME|D) can be expressed as:
\[
P(\mathrm{SAME} \mid D) = \frac{P(D \mid \mathrm{SAME})\, P(\mathrm{SAME})}{P(D \mid \mathrm{SAME})\, P(\mathrm{SAME}) + P(D \mid \mathrm{DIFFERENT})\, P(\mathrm{DIFFERENT})}.
\]
The conditional probability P(DIFFERENT|D) can be expressed similarly, or as 1−P(SAME|D); see Duda et al., “Pattern classification and scene analysis,” Wiley, New York, 1973.
Then, the quantities P(SAME|D) and P(DIFFERENT|D) can be compared to determine whether the probe image is the same as one of the gallery images, or not. To recognize a face from among a large number of faces, one maximizes P(SAME|D) over all the gallery images.
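The decision rule above can be illustrated with a minimal sketch. It assumes that the likelihoods P(D|SAME) and P(D|DIFFERENT) have already been evaluated for each probe/gallery pair by some model. The function names and the default prior of 0.5 are illustrative assumptions and are not taken from any cited reference.

```python
def posterior_same(lik_same, lik_diff, prior_same=0.5):
    """Return P(SAME|D) by Bayes' law.

    lik_same   -- P(D|SAME) for one probe/gallery pair
    lik_diff   -- P(D|DIFFERENT) for the same pair
    prior_same -- P(SAME); P(DIFFERENT) is taken as 1 - P(SAME)
    """
    prior_diff = 1.0 - prior_same
    numerator = lik_same * prior_same
    denominator = numerator + lik_diff * prior_diff
    return numerator / denominator


def best_match(likelihoods, prior_same=0.5):
    """Pick the gallery image that maximizes P(SAME|D).

    likelihoods -- list of (P(D|SAME), P(D|DIFFERENT)) pairs,
                   one pair per gallery image
    Returns (index of best gallery image, its posterior).
    """
    posteriors = [posterior_same(s, d, prior_same) for s, d in likelihoods]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors[best]
```

A probe is declared to match the selected gallery image only if the returned posterior exceeds P(DIFFERENT|D), that is, if it is greater than 0.5.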
Some face recognition systems are based on principal component analysis (PCA) or the Karhunen-Loeve expansion. U.S. Pat. No. 5,164,992, “Face Recognition System” issued to M. A. Turk et al. on Nov. 17, 1992 describes a system where a matrix of training vectors is extracted from images and reduced by PCA into a set of orthonormal eigenvectors and associated eigenvalues, which describes the distribution of the images. The vectors are projected onto a subspace. Faces are recognized by measuring the Euclidean distance between projected vectors. The problem with the PCA approach is that variations in the appearance of specific features, such as the mouth, cannot be modeled.
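The general PCA procedure described above can be sketched as follows. This is an illustrative sketch using NumPy, not the implementation of the cited patent; it assumes each image has been flattened into a row vector, and the function names and the choice of k retained eigenvectors are hypothetical.

```python
import numpy as np


def fit_pca(train, k):
    """Reduce a matrix of training vectors to k orthonormal eigenvectors.

    train -- array of shape (n_images, n_pixels), one flattened image per row
    Returns (mean image, basis of shape (n_pixels, k), top-k eigenvalues).
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by decreasing eigenvalue.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = (s ** 2) / (train.shape[0] - 1)
    return mean, vt[:k].T, eigvals[:k]


def project(x, mean, basis):
    """Project one image (or a stack of images) onto the PCA subspace."""
    return (x - mean) @ basis


def nearest_gallery(probe, gallery, mean, basis):
    """Return the index of the gallery image closest to the probe,
    measured by Euclidean distance between projected vectors."""
    p = project(probe, mean, basis)
    g = project(gallery, mean, basis)
    dists = np.linalg.norm(g - p, axis=1)
    return int(np.argmin(dists)), dists
```

Because every pixel contributes to every projection coefficient, a local change such as an open mouth shifts the whole projected vector, which is consistent with the limitation noted above.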
Costen et al. in “Automatic Face Recognition: What Representation?,” Technical Report of The Institute of Electronics, Information and Communication Engineers (IEICE), pages 95–32, January 1996, describe how the recognition accuracy can be raised by using the Mahalanobis distance. A modified Mahalanobis distance method is described by Kato et al. in “A Handwritten Character Recognition System Using Modified Mahalanobis distance,” Transaction of IEICE, Vol. J79-D-II, No. 1, pages 45–52, January 1996. Kato et al. modify the distance by adding a bias value to each eigenvalue.
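The effect of biasing the eigenvalues can be illustrated with a short sketch. It assumes that both vectors are already expressed in the eigenvector basis, so that the Mahalanobis distance reduces to a per-coordinate division by the eigenvalues. The function name and the single scalar `bias` parameter are assumptions made for illustration; the exact formulation in the cited references is not reproduced here.

```python
import numpy as np


def mahalanobis_sq(y1, y2, eigvals, bias=0.0):
    """Squared Mahalanobis distance between two projected vectors.

    y1, y2  -- coordinates in the eigenvector basis
    eigvals -- eigenvalue (variance) associated with each coordinate
    bias    -- value added to every eigenvalue; 0 gives the ordinary
               Mahalanobis distance
    """
    diff = np.asarray(y1, dtype=float) - np.asarray(y2, dtype=float)
    scaled = diff ** 2 / (np.asarray(eigvals, dtype=float) + bias)
    return float(np.sum(scaled))
```

With a positive bias, coordinates whose eigenvalues are very small no longer dominate the distance, because the denominator cannot fall below the bias value.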
Moghaddam et al. describe probabilistic face recognition in U.S. Pat. No. 5,710,833, “Detection, recognition and coding of complex objects using probabilistic eigenspace analysis,” issued on Jan. 20, 1998, and in Moghaddam et al., “Beyond eigenfaces: Probabilistic matching for face recognition,” Proc. of Int'l Conf. on Automatic Face and Gesture Recognition, pages 30–35, April 1998.
They describe a system for recognizing instances of a selected object or object feature, e.g., faces, in a digitally represented scene. They subtract the probe image from each gallery image to obtain a difference image. The distributions of difference images, P(D|SAME) and P(D|DIFFERENT), are then modeled as Gaussian probability density functions.
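A simplified sketch of Gaussian modeling of difference images is given below. It assumes a diagonal covariance for each class, which is a simplification chosen for brevity and is not the eigenspace formulation of the cited references. The function names, the variance floor `eps`, and the default prior are hypothetical.

```python
import numpy as np


def fit_diag_gaussian(diffs, eps=1e-6):
    """Fit a diagonal Gaussian to a set of difference images.

    diffs -- array of shape (n_pairs, n_pixels); each row is a flattened
             gallery-minus-probe difference image
    Returns (mean, variance); eps keeps every variance strictly positive.
    """
    diffs = np.asarray(diffs, dtype=float)
    return diffs.mean(axis=0), diffs.var(axis=0) + eps


def log_likelihood(d, mean, var):
    """Log density of difference image d under a diagonal Gaussian."""
    d = np.asarray(d, dtype=float)
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (d - mean) ** 2 / var))


def posterior_same_from_difference(d, same_model, diff_model, prior_same=0.5):
    """P(SAME|D) from the two Gaussian models, computed in the log domain."""
    a = log_likelihood(d, *same_model) + np.log(prior_same)
    b = log_likelihood(d, *diff_model) + np.log(1.0 - prior_same)
    m = max(a, b)  # subtract the larger term to avoid underflow in exp
    return float(np.exp(a - m) / (np.exp(a - m) + np.exp(b - m)))
```

Here `same_model` would be fitted to differences between image pairs of the same face, and `diff_model` to differences between image pairs of different faces.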
The key weakness of that method is that the Gaussian models of difference images are very restrictive. In practice, two images of the same face can vary with lighting and facial expression, e.g., frowning or smiling. To get useful difference images, the probe and gallery images must be very similar, e.g., a frontal probe image cannot be compared with a profile gallery image of the same face. In addition, their method does not accommodate motion of facial features, such as the mouth, and thus is not well suited for use on videos.
Another face recognition technique uses a deformable mapping. Each gallery image is pre-processed to map it to an elastic graph of nodes. Each node is at a given position on the face, e.g., the corners of the mouth, and is connected to nearby nodes. A set of local image measurements (Gabor filter responses) is made at each node, and the measurements are associated with that node. The probe and gallery images are compared by placing the elastic graph from each gallery image on the probe image.
Because facial features often move as a person smiles or frowns, the best position for a node on the probe image is often different from its position on the gallery image. An advantage of the elastic graph is that it explicitly handles this facial feature motion. However, it is assumed that the features have the same appearance in all images. The disadvantage of that approach is that it provides no statistical model of which variations are allowed between images of the same face and which indicate different faces.
Viola and Jones, in “Rapid Object Detection using a Boosted Cascade of Simple Features,” Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, 2001, describe a new framework for detecting objects such as faces in images. They present three new insights: a set of image features which are both extremely efficient and effective for face detection, a feature selection process based on Adaboost, and a cascaded architecture for learning and detecting faces. Adaboost provides an effective learning algorithm and strong bounds on generalization performance; see Freund et al., “A decision-theoretic generalization of on-line learning and an application to boosting,” Computational Learning Theory, Eurocolt '95, pages 23–37, Springer-Verlag, 1995; Schapire et al., “Boosting the margin: A new explanation for the effectiveness of voting methods,” Proceedings of the Fourteenth International Conference on Machine Learning, 1997; and Tieu et al., “Boosting image retrieval,” International Conference on Computer Vision, 2000. The Viola and Jones approach provides an extremely efficient technique for face detection but does not address the problem of face recognition, which is a more complex process.
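The way Adaboost can serve as a feature selection process may be illustrated with a generic sketch of discrete Adaboost over single-feature threshold classifiers. This is not the Viola and Jones implementation; it assumes that the feature values have already been computed into a matrix, and all function names are hypothetical. In each round, the single feature and threshold with the lowest weighted error are selected, and the misclassified examples are then given more weight.

```python
import numpy as np


def train_adaboost(X, y, rounds):
    """Discrete Adaboost with one-feature threshold classifiers.

    X -- array (n_examples, n_features) of precomputed feature values
    y -- labels in {-1, +1}
    Returns a list of (feature index, threshold, polarity, alpha).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # start with uniform example weights
    stumps = []
    for _ in range(rounds):
        best = None
        # Exhaustively search for the feature/threshold/polarity with the
        # lowest weighted error: this is the feature selection step.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the log finite
        alpha = 0.5 * np.log((1.0 - err) / err)
        # Increase the weight of misclassified examples, then renormalize.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append((j, thr, pol, alpha))
    return stumps


def predict(stumps, X):
    """Sign of the alpha-weighted vote of the selected classifiers."""
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in stumps:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

The exhaustive search is quadratic in practice and is kept only for clarity; the sketch omits the cascade architecture entirely.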
Therefore, there is a need for a face recognition system that improves upon the prior art.