Digital images are being used in an increasing number of applications. In many of those applications, automated analysis of digital images can be performed to provide either or both of face detection and face recognition. In face detection, an image region depicting a face is identified. In face recognition, a detected face is associated with a known individual. Face detection and face recognition can be used for a wide variety of tasks, including image enhancement, content-based retrieval, automatic identification, and image database management. For instance, in image processing applications, face detection can be used to automatically perform enhancements, such as red-eye correction and contrast adjustment. Further, face recognition can be used in conjunction with search applications to retrieve images that depict a particular individual.
Numerous approaches to face detection have been developed, including neural networks, skin-tone techniques, eigenface/eigenvector analysis, and clustering. A neural network can be trained to locate faces in digital images using one or more training sets that include face images and non-face images; the performance of such a network depends on its training. Skin-tone techniques use color-based analysis to identify regions of a digital image that consist primarily of colors falling within a predetermined skin-tone color space. Eigenface or eigenvector processing derives a standardized face from the statistical analysis of a large number of face images and then analyzes digital images with respect to that eigenface. Clustering identifies clusters of face data with low intra-cluster variance and treats regions that vary strongly from the face data as non-face data. These approaches require significant processing and thus typically do not provide real-time face detection.
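As an illustration of the skin-tone approach, the following sketch thresholds each pixel's chrominance in the YCbCr color space. The Cb/Cr bounds used here are a commonly cited heuristic range for skin tones, not values from any particular system; a practical detector would tune such a color space empirically.

```python
import numpy as np

def skin_mask(rgb):
    """Classify each pixel of an RGB image (H x W x 3, values 0-255)
    as skin or non-skin by thresholding its chrominance.

    The Cb/Cr bounds below are a widely used heuristic skin-tone
    range in the YCbCr color space (assumed for illustration)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # BT.601 RGB -> YCbCr chrominance conversion (full range).
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # A pixel is "skin" only if both chrominance components fall in range.
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```

Connected regions of the resulting binary mask would then serve as candidate face regions for further analysis.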
Real-time face detection algorithms also have been developed, such as the Viola-Jones algorithm. The Viola-Jones algorithm searches pixel sub-windows of a candidate window using multiple classifiers, with each classifier configured to select particular visual features from a set of possible visual features. The features used by the Viola-Jones detection framework are characterized as rectangular areas. Different types of features can be defined, including both horizontal and vertical features. The value of a feature is computed by subtracting the sum of the image pixels in one or more feature regions from the sum of the image pixels in one or more other feature regions. The sum of the image pixels within a feature region (or contrast region) can be computed based on an integral image, which is computed from an original image being evaluated. Further, the classifiers are grouped in stages, which are cascaded. As a result, only sub-windows that pass the classifiers of the current stage are submitted for further analysis to the classifiers of a subsequent stage. Thus, at least some of the sub-windows that do not represent face data can be discarded early in the analysis.
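The mechanics described above can be sketched as follows: an integral image allows any rectangle sum to be computed with four array lookups, a two-rectangle feature value is the difference of two such sums, and a cascade rejects a sub-window as soon as any stage fails. The stage format in `cascade_pass` is hypothetical, chosen only to show the early-rejection control flow, not the actual Viola-Jones classifier parameterization.

```python
import numpy as np

def integral_image(img):
    """Integral image with one row/column of zero padding: ii[y, x]
    holds the sum of img[:y, :x], so any rectangle sum needs only
    four array lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature over a 2w-by-h window:
    sum of the left half minus sum of the right half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

def cascade_pass(ii, x, y, stages):
    """Run the sub-window at (x, y) through cascaded stages. Each
    stage is a ((dx, dy, w, h), threshold) pair -- a hypothetical
    format for illustration. The sub-window is rejected as soon as
    any stage's feature value falls below its threshold, so later
    stages never run for most non-face regions."""
    for (dx, dy, w, h), threshold in stages:
        if two_rect_feature(ii, x + dx, y + dy, w, h) < threshold:
            return False  # early rejection
    return True
```

Because most sub-windows of a typical image contain no face, the cascade spends nearly all of its time in the cheap early stages, which is what makes real-time operation feasible.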