The detection of human faces in natural images and videos is a key component of a variety of applications in human-computer interaction, search and indexing, and security and surveillance. As a result, approaches to face detection, and learning-based approaches in particular, abound. These include real-time methods such as those described by P. Viola and M. Jones in a paper entitled “Rapid Object Detection Using a Boosted Cascade of Simple Features”, which appeared in Proceedings, IEEE Conf. on Computer Vision and Pattern Recognition, pp. 511-518, in 2001. Approaches based on convolutional networks have also been explored and described in various publications, including a paper by R. Vaillant, C. Monrocq and Y. LeCun entitled “Original Approach For the Localisation of Objects in Images”, which appeared in IEEE Proc. on Vision, Image, and Signal Processing, vol. 141(4), pp. 245-250, in August 1994, and one by C. Garcia and M. Delakis entitled “A Neural Architecture for Fast and Robust Face Detection”, which appeared in IEEE-IAPR Int. Conference on Pattern Recognition, pp. 40-43, in 2002.
An alternative, view-based approach involves building separate detectors for different views and either: 1) applying them in parallel (see, e.g., A. Pentland, B. Moghaddam, and T. Starner, “View-Based and Modular Eigenspaces for Face Recognition”, CVPR, 1994; K. Sung and T. Poggio, “Example-Based Learning of View-Based Human Face Detection”, PAMI, Vol. 20, pp. 39-51, 1998; H. Schneiderman and T. Kanade, “A Statistical Method for 3D Object Detection Applied to Faces and Cars”, Computer Vision and Pattern Recognition, 2000; and S. Z. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum, “Statistical Learning of Multi-View Face Detection”, Proceedings of the 7th European Conference on Computer Vision, Part IV, 2002); or 2) using a pose estimator to select a detector, as was described in “Fast Multi-View Face Detection”, written by M. Jones and P. Viola, Technical Report TR2003-96, Mitsubishi Electric Research Laboratories, 2003.
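By way of illustration only, the two view-based strategies may be sketched as follows. This is a minimal sketch and not the implementation of any cited work: the function names `detect_parallel` and `detect_with_pose_estimator`, the view labels, and the representation of a detector as a callable returning a score are all assumptions introduced here.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical types: a detector maps an image window to a face score,
# and a pose estimator maps an image window to a view label.
Window = Sequence[float]
Detector = Callable[[Window], float]
PoseEstimator = Callable[[Window], str]


def detect_parallel(
    window: Window, detectors: Dict[str, Detector], threshold: float
) -> Tuple[bool, str]:
    """Strategy 1: run every view-specific detector and keep the best score."""
    scores = {view: det(window) for view, det in detectors.items()}
    best_view = max(scores, key=scores.get)
    return scores[best_view] >= threshold, best_view


def detect_with_pose_estimator(
    window: Window,
    detectors: Dict[str, Detector],
    estimate_pose: PoseEstimator,
    threshold: float,
) -> Tuple[bool, str]:
    """Strategy 2: estimate the pose first, then run only the matching detector."""
    view = estimate_pose(window)
    return detectors[view](window) >= threshold, view
```

In the first strategy the cost grows with the number of views, since every detector is evaluated on every window. In the second strategy only one detector is evaluated, but an error by the pose estimator selects the wrong detector.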
Yet another approach to human face detection, described by H. A. Rowley, S. Baluja, and T. Kanade in a paper entitled “Rotation Invariant Neural Network-Based Face Detection”, which appeared in Computer Vision and Pattern Recognition in 2000, estimates and corrects in-plane rotations before applying a single pose-specific detector.
Finally, in still another approach, a number of Support Vector Regressors are trained to approximate smooth functions, each of which has a maximum for a face at a particular pose. Such an approach was disclosed in a paper entitled “Support Vector Regression and Classification Based Multi-View Face Detection and Recognition”, authored by Y. Li, S. Gong and H. Liddell and published in Face and Gesture in 2000. This approach requires a second machine trained to convert the resulting values into pose estimates and a third machine trained to convert the values into a face/non-face score. As can be appreciated, an approach that chains three trained machines in this way is very slow.
Given the limited success experienced by prior-art approaches, new systems and methods that facilitate real-time, simultaneous multi-view face detection and facial pose estimation would represent a great technological step forward. Such a system and method are the subject of the present invention.