1. Technical Field
This invention is directed toward a face detection system and process for detecting the presence of faces of people depicted in an input image, and more particularly to such a face detection system and process that can detect faces at various orientations in real-time.
2. Background Art
Face detection systems essentially operate by scanning an image for regions having attributes which would indicate that a region contains a person's face. These systems operate by comparing some type of training images depicting people's faces (or representations thereof) to an image or representation of a person's face extracted from an input image. Face detection is also the first step towards automated face recognition, and it has remained a challenging problem, especially for non-frontal view faces. The challenge is due firstly to the large amount of variation and complexity brought about by changes in facial appearance, lighting and expression [2, 28]. Changes in facial view (head pose) further complicate the situation because the distribution of non-frontal faces in the image space is much more dispersed and more complicated than that of frontal faces. Learning-based methods have so far been the most effective ones for face detection. Most face detection systems learn to classify between face and non-face by template matching. They treat face detection as an intrinsically two-dimensional (2-D) problem, taking advantage of the fact that faces are highly correlated. It is assumed that human faces can be described by some low-dimensional features which may be derived from a set of prototype or training face images. From a pattern recognition viewpoint, two issues are essential in face detection: (i) feature selection, and (ii) classifier design in view of the selected features.
A procedure developed by Freund and Schapire [8], referred to as AdaBoost, has been an effective learning method for many pattern classification problems, including face detection. AdaBoost is a sequential forward search procedure using the greedy selection strategy. Its heuristic assumption is monotonicity, i.e., that when a new feature is added to the current set, the value of the performance criterion does not decrease. The premise of this sequential procedure breaks down when the assumption is violated, i.e., when the performance criterion function is non-monotonic. In that case AdaBoost, like any sequential search algorithm, can become trapped in local optima.
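The greedy, sequential character of AdaBoost described above can be illustrated with a minimal sketch of the discrete form of the algorithm. The sketch assumes single-feature threshold classifiers ("stumps") as the weak learners, which is an illustrative choice and is not specified in this section; the function names train_adaboost, stump and predict are likewise hypothetical. In each round the stump with the lowest weighted error is selected and is never revisited, which is the forward-search behavior referred to above.

```python
import math


def stump(x, feature, threshold, polarity):
    """Weak classifier (assumed form): returns +1 or -1 from one feature."""
    return polarity if x[feature] >= threshold else -polarity


def train_adaboost(samples, labels, n_rounds):
    """Discrete AdaBoost. samples: list of feature vectors; labels: +1 / -1.

    Returns a list of (alpha, feature, threshold, polarity) tuples.
    """
    n = len(samples)
    n_features = len(samples[0])
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        # Greedy step: exhaustively pick the stump with the lowest
        # weighted error under the current sample weights.
        best = None
        for f in range(n_features):
            for thr in sorted({x[f] for x in samples}):
                for pol in (1, -1):
                    err = sum(
                        w
                        for x, y, w in zip(samples, labels, weights)
                        if stump(x, f, thr, pol) != y
                    )
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Re-weight: misclassified samples gain weight, others lose it.
        weights = [
            w * math.exp(-alpha * y * stump(x, f, thr, pol))
            for x, y, w in zip(samples, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(a * stump(x, f, thr, pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

Because each selected stump is kept permanently, an early choice that later proves poor cannot be undone by this procedure, which is the limitation noted above for a non-monotonic criterion.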
Another issue is real-time multi-view face detection. Previous face detection systems, especially those that can detect faces from multiple viewpoints, are too slow to be used for real-time applications. Most existing works in face detection, including Viola et al. [33], deal with frontal faces. Sung and Poggio [31] partition the frontal face and non-face image spaces each into several probability clusters, derive feature vectors in these subspaces, and then train neural networks to classify between face and non-face. Rowley et al. [23] trained retinally connected neural networks using preprocessed image pixel values directly. Osuna et al. [18] apply the support vector machines algorithm to train a neural network to classify face and non-face patterns. Roth et al. [22] use a SNoW learning architecture specifically tailored for learning in the presence of a very large number of features for the face and non-face classification.
In Viola et al. [33], simple Haar-like features, used earlier in Papageorgiou [19] for pedestrian detection, are extracted; face/non-face classification is done by using a cascade of successively more complex classifiers which are trained by using the (discrete) AdaBoost learning algorithm. This resulted in the first real-time frontal face detection system, which runs at about 14 frames per second for a 320×240 image [33]. However, the ability to deal with non-frontal faces is important for many real applications because, for example, statistics show that approximately 75% of the faces in home photos are non-frontal [15]. A reasonable treatment for multi-view face detection is the view-based method taught by Pentland et al. [20], in which several face models are built, each describing faces in a certain view. In this way, explicit 3D modeling is avoided. Feraud et al. [6] adopt the view-based representation for face detection, and use an array of five detectors with each detector responsible for one view. Wiskott et al. [34] build elastic bunch graph templates for multi-view face detection and recognition. Gong and colleagues [10] study the trajectories of faces in linear Principal Component Analysis (PCA) feature spaces as they rotate, and use kernel support vector machines (SVMs) for multi-pose face detection and pose estimation [17, 16]. Huang et al. [11] use SVMs to estimate facial poses.
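The Haar-like features and classifier cascade mentioned at the start of the preceding paragraph can be illustrated with a short sketch. It assumes that rectangle sums are computed from an integral image (summed-area table), a technique associated with the Viola et al. work but not described in this section, and it uses a single two-rectangle feature. All function names, the stage representation and the thresholds are hypothetical and chosen only for illustration.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows < y and columns < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Haar-like feature: left half minus right half (width assumed even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


def cascade_accepts(stages, ii, top, left):
    """Run stages in order; reject as soon as one stage scores too low.

    stages: list of (score_fn, threshold), where score_fn(ii, top, left)
    returns a number. Early stages are meant to be the cheapest ones.
    """
    for score_fn, threshold in stages:
        if score_fn(ii, top, left) < threshold:
            return False
    return True
```

For example, with `ii = integral_image([[1, 2, 3, 4], [5, 6, 7, 8]])`, the call `two_rect_feature(ii, 0, 0, 2, 4)` compares the left two columns with the right two. The early-exit loop in cascade_accepts is what allows most non-face windows to be discarded after only a few feature evaluations.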
The system of Schneiderman and Kanade [26] is claimed to be the first algorithm for multi-view face detection. Their algorithm consists of an array of five face detectors in the view-based framework, each constructed using statistics of products of histograms computed from examples of the respective view. However, it is very slow, taking one minute to process a 320×240 pixel image over only four octaves of candidate size [26].
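The view-based framework referred to above, in which an array of detectors is used with each detector responsible for one view, can be sketched as follows. This is only a schematic illustration and does not reproduce any of the cited systems: the per-view scoring functions, the view names and the acceptance threshold are all hypothetical.

```python
def detect_view(view_detectors, window, threshold=0.0):
    """Score a window with every per-view detector and keep the best view.

    view_detectors: dict mapping a view name to a function that returns a
    confidence score for the window (higher means more face-like).
    Returns (view_name, score), or None if no view reaches the threshold.
    """
    scores = {name: fn(window) for name, fn in view_detectors.items()}
    best_view = max(scores, key=scores.get)
    if scores[best_view] < threshold:
        return None
    return best_view, scores[best_view]
```

Since every detector in the array is evaluated on every candidate window, the cost of this arrangement grows with the number of views, which is one reason such systems tend to be slow.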
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.