Automatic human face detection has a variety of useful applications such as in security systems, face identification systems, photo editing systems, and so on. Face detection is a challenging task because images vary in background, view, illumination, articulation, and facial expression. Although face detection has recently become practical for certain limited applications, it generally is not practical for detecting the faces in real-life home photographs, in part because it is difficult to distinguish a face from background clutter in home photographs. Although frontal face detectors have met with some success, a significant percentage of faces in home photographs are non-frontal.
Many non-frontal face detectors use a view-based method, in which several face models are built, each describing faces in a given range of view. A “view” refers to the angle of rotation of a face. Out-of-plane rotation refers to the angle as the face looks to the left or right (e.g., a profile view), and in-plane rotation refers to the angle as the head tilts to the left or right. Multiple face models are used to avoid explicit three-dimensional modeling. In one conventional system, the views of a face are partitioned into five channels, and a multi-view detector is developed by training separate detectors for each view. There have also been studies of trajectories of faces in linear principal component analysis (“PCA”) feature spaces as they rotate, and support vector machines (“SVMs”) have been used for multi-view face detection and pose estimation.
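By way of illustration only, the view-based partitioning described above can be sketched as follows. The five yaw ranges, the sign convention for left and right, and the names `VIEW_CHANNELS`, `view_channel`, and `detect_multi_view` are assumptions made for this sketch and are not taken from any particular conventional system; the per-view detectors are represented by placeholder scoring functions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical yaw ranges in degrees for five view channels. The boundaries
# and the left/right sign convention are assumptions for illustration only.
VIEW_CHANNELS: List[Tuple[str, float, float]] = [
    ("left_profile", -90.0, -54.0),
    ("left_half_profile", -54.0, -18.0),
    ("frontal", -18.0, 18.0),
    ("right_half_profile", 18.0, 54.0),
    ("right_profile", 54.0, 90.0),
]


def view_channel(yaw_degrees: float) -> str:
    """Return the name of the channel whose range covers the yaw angle."""
    if not -90.0 <= yaw_degrees <= 90.0:
        raise ValueError("yaw must be within [-90, 90] degrees")
    for name, low, high in VIEW_CHANNELS:
        if low <= yaw_degrees < high:
            return name
    # Only yaw == 90.0 reaches this point; assign it to the last channel.
    return VIEW_CHANNELS[-1][0]


def detect_multi_view(
    window: object,
    detectors: Dict[str, Callable[[object], float]],
    threshold: float = 0.5,
) -> List[str]:
    """Run every per-view detector on one window.

    Returns the names of the views whose detector score meets the threshold.
    Each detector is trained separately, so no information is shared here.
    """
    return [
        name
        for name, score_fn in detectors.items()
        if score_fn(window) >= threshold
    ]
```

In this sketch, training examples would be grouped by `view_channel` so that each channel yields its own detector, and at detection time every detector is applied to each candidate window. That repeated evaluation is one reason view-based detectors can be costly.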
Other conventional face detection systems have used multi-resolution information from different levels of a wavelet transform, with an array of face detectors implemented in a view-based framework. For example, each detector can be constructed using statistics of products of histograms computed from examples of the respective view. Although this type of system may achieve acceptable detection accuracy in some applications, it is often very slow due to computational complexity.
To address the problem of slow detection speed, a coarse-to-fine, simple-to-complex pyramid approach combines the ideas of a boosting cascade and view-based methods. Although this approach improves the detection speed, it still has several problems. For example, since each boosting classifier works separately, the useful information between adjacent layers is discarded, which hampers the convergence of the training procedure. Furthermore, during the training process, this approach requires the classifiers in the earlier stages of the detector pyramid to cover a wide range of pose variations. Such a requirement increases the complexity of the learning process and results in a low detection rate.
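As a rough illustration of the cascade idea referred to above, the following sketch evaluates a candidate window against a sequence of stages and rejects it at the first stage whose score falls below that stage's threshold. The `Stage` type, the `cascade_accepts` function, and the toy scoring functions in the usage note are hypothetical; a real stage would be a trained boosted classifier. Each stage computes its score independently, which reflects the limitation noted above that information from the preceding layer is discarded.

```python
from typing import Callable, Sequence, Tuple

# A stage pairs a scoring function with an acceptance threshold.
# Both are placeholders standing in for a trained boosted classifier.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_accepts(
    window: Sequence[float], stages: Sequence[Stage]
) -> Tuple[bool, int]:
    """Pass a window through the stages in order, rejecting early.

    Returns (accepted, number_of_stages_evaluated). Each stage's score is
    computed from scratch; the previous stage's score is not reused.
    """
    evaluated = 0
    for score_fn, threshold in stages:
        evaluated += 1
        if score_fn(window) < threshold:
            return False, evaluated  # early rejection saves later stages
    return True, evaluated
```

For example, with a cheap first stage such as `(lambda w: sum(w) / len(w), 0.3)` followed by a stricter second stage such as `(lambda w: max(w), 0.8)`, most non-face windows would be rejected after a single evaluation, which is where the speed improvement of a cascade comes from.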