1. Field of the Invention
The present invention relates to image processing apparatuses and image processing methods for detecting and recognizing a face image included in a captured image, and, more particularly, to an image processing apparatus and an image processing method for detecting the positions of parts of a face such as the centers, inner corners, and outer corners of eyes, a nose, the lower end and sides of the nose, a mouth, the ends of the mouth, eyebrows, and the inner corners and outer corners of the eyebrows from a face image detected from an input image.
More specifically, the present invention relates to image processing apparatuses and image processing methods for detecting the positions of parts of a face from a face image using a detector employing a statistical learning algorithm such as Adaboost, and, more particularly, to an image processing apparatus and an image processing method for detecting the positions of parts of a face such as eyes from a face image detected by face detection with smaller amounts of computation and memory.
2. Description of the Related Art
Face recognition techniques can be applied widely to man-machine interfaces, such as personal identification systems that impose no burden on users and sex determination systems. In recent years, face recognition techniques have also been used in digital cameras to automate camera operations based on subject detection or subject recognition, for example automatic focusing (AF), automatic exposure (AE), automatic setting of the angle of view, and automatic photographing.
In a face recognition system, for example, three processes are performed: face detection, which detects the position of a face image and extracts it as a detected face; face part detection, which detects the main parts of the face from the detected face; and face recognition, which recognizes the detected face (that is, identifies the person). In the face detection, the size and position of a face image are detected from an input image, and the detected face image is extracted as a detected face. In the face part detection, face parts are detected from the detected face. The face parts include the centers, inner corners, and outer corners of the eyes, the nose, the lower end and sides of the nose, the mouth, the ends of the mouth, the eyebrows, and the inner and outer corners of the eyebrows. In the face recognition, position adjustment and rotation compensation are first performed on the basis of the detected positions of the face parts, and the detected face is then recognized (that is, the person is identified).
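The position adjustment and rotation compensation step can be illustrated with a minimal sketch that assumes the two eye centers are the face parts used for alignment. The function `align_to_eyes` and its `eye_distance` parameter (a canonical eye spacing, here 40 pixels) are hypothetical names chosen for illustration. The sketch maps an image point into a frame centered on the eye midpoint, with the eye line horizontal and the eyes a fixed distance apart.

```python
import math

def align_to_eyes(point, left_eye, right_eye, eye_distance=40.0):
    """Map an (x, y) image point into a canonical face frame: origin at the eye
    midpoint, eye line horizontal, eyes eye_distance pixels apart.
    eye_distance is a hypothetical canonical spacing, not a value from the text."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.atan2(dy, dx)                 # in-plane tilt of the eye line
    scale = eye_distance / math.hypot(dx, dy)  # size normalization
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Position adjustment: translate so that the eye midpoint is the origin.
    x, y = point[0] - cx, point[1] - cy
    # Rotation compensation: rotate by -angle so that the eye line is horizontal.
    c, s = math.cos(-angle), math.sin(-angle)
    xr, yr = x * c - y * s, x * s + y * c
    return (xr * scale, yr * scale)
```

For eyes detected at (10, 10) and (20, 20), the right eye maps to approximately (20, 0) and the left eye to approximately (-20, 0). A complete system would apply the same transform (or its inverse, when resampling) to the whole face region before recognition.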
Many methods have been proposed for detecting a face in a complex image scene using only the density (intensity) pattern of an image signal. For example, a detector employing a statistical learning algorithm such as Adaboost can be used for the face detection described above.
Adaboost was proposed by Freund et al. in 1996 as a theory in which a “strong classifier” can be obtained by combining many “weak classifiers” (also called weak learners) that perform only slightly better than random guessing. Each weak classifier may be a filter such as a Haar basis function, and the weak classifiers are generated one after another so that greater weight is assigned to the samples that previously generated weak classifiers classified poorly. The reliability of each weak classifier is obtained, and a majority vote weighted by these reliabilities is performed.
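The boosting procedure described above can be sketched as discrete AdaBoost with decision stumps. The stumps stand in for the Haar-like filters mentioned in the text and are an assumption, as are all function names. In each round, the stump with the lowest weighted error is chosen, its reliability `alpha` is computed from that error, and the weights of misclassified samples are raised; the final decision is the sign of the reliability-weighted vote.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.
    Labels in y are +1 / -1; w are the current sample weights."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(x[f] for x in X)):
            for polarity in (1, -1):
                err = 0.0
                for xi, yi, wi in zip(X, y, w):
                    pred = polarity if xi[f] >= thr else -polarity
                    if pred != yi:
                        err += wi
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best

def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] >= thr else -polarity

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                 # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)    # guard against log(1/0) for a perfect stump
        if err >= 0.5:
            break                     # no weak classifier better than chance
        alpha = 0.5 * math.log((1 - err) / err)  # reliability of this classifier
        ensemble.append((alpha, stump))
        # Raise weights of misclassified samples, lower weights of correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def strong_classify(ensemble, x):
    """Reliability-weighted majority vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On the toy data `[[0, 0], [1, 0], [0, 1], [1, 1]]` with labels `[-1, -1, -1, 1]`, no single stump classifies every sample correctly, but three boosting rounds produce a weighted vote that does.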
Here, it can be assumed that faces of various sizes are included in an input image (see FIG. 9). Accordingly, search windows of various sizes must be cut out, and each cutout search window must be checked to determine whether it contains a face.
Two approaches are known for handling the relationship between the resolution of an image and the size of a face to be detected: fixing the resolution of the image (that is, preparing separate face detectors for the various face sizes that may appear in an input image), and fixing the size of the face to be detected (that is, repeatedly reducing the resolution of the input image and performing detection with a single face detector having a fixed detectable face size). The latter is more practical than the former. In the latter approach, a window of the same size as the learning samples (hereinafter also referred to as a “search window”) is usually cut out from each of the images obtained by converting the scale of the input image, which is equivalent to searching with search windows of different sizes. That is, since the size of a face included in the input image is not known in advance, the face detector must scan the input image each time its resolution is changed. From each image at a given resolution, only faces of a size close to the fixed detectable face size of the face detector can be detected (see FIG. 10).
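The fixed-face-size approach can be sketched as an image pyramid scanned by a fixed-size search window. The sketch assumes a grayscale image stored as a list of rows; the window size of 24 pixels, the scale step of 1.25, the stride, and the nearest-neighbour downscaling are illustrative assumptions, and the function names are hypothetical. Each yielded tuple gives the position and size of a search window in original-image coordinates, which is where a detector with a fixed input size would be evaluated.

```python
def downscale(image, factor):
    """Nearest-neighbour reduction of a 2-D list of pixels by factor (>= 1)."""
    h, w = len(image), len(image[0])
    new_h, new_w = int(h / factor), int(w / factor)
    return [[image[int(r * factor)][int(c * factor)] for c in range(new_w)]
            for r in range(new_h)]

def pyramid_windows(image, window=24, scale_step=1.25, stride=4):
    """Yield (x, y, size) for every search window, in original-image coordinates.
    The detector itself always sees a window x window patch of a reduced image."""
    factor = 1.0
    current = image
    while len(current) >= window and len(current[0]) >= window:
        h, w = len(current), len(current[0])
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                # A fixed-size face detector would be applied to
                # current[y:y + window], columns x:x + window, at this point.
                yield (int(x * factor), int(y * factor), int(window * factor))
        factor *= scale_step
        current = downscale(image, factor)  # reduce from the original each time
```

Each pass over a reduced image covers only faces whose size is near `window * factor` in the original image, which is why the scan has to be repeated at every resolution.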
However, an input image of 320×240 pixels, for example, contains search windows of approximately 5,000 sizes, and it takes the weak discriminators a long time to perform arithmetic operations for all of these window sizes. Accordingly, several methods of speeding up the arithmetic operations of the weak discriminators have been proposed.
For example, a method of rapidly calculating weak hypotheses using rectangle features and images called integral images is known (see, for example, United States Unexamined Patent Application Publication No. 2002/0102024 and Paul Viola and Michael Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features,” CVPR 2001).
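The integral-image technique can be sketched as below. After a single pass over the image, the sum of any axis-aligned rectangle takes four table lookups, so a two-rectangle (Haar-like) feature costs a constant number of lookups regardless of its size. The function names, the zero-padded table layout, and the threshold-and-polarity form of `weak_hypothesis` are assumptions for illustration, not the exact formulation of the cited works.

```python
def integral_image(img):
    """ii[y][x] = sum of img[r][c] for r < y and c < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half sum minus right half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def weak_hypothesis(ii, x, y, w, h, threshold, polarity=1):
    """Weak classifier on one feature: +1 if polarity * f < polarity * threshold."""
    f = two_rect_feature(ii, x, y, w, h)
    return 1 if polarity * f < polarity * threshold else -1
```

Because the table is built once per (reduced) image, every search window and every weak classifier reuses it, which is where the speed-up comes from.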
Furthermore, an object detection device has been disclosed in which, during the majority vote, a window image can be determined to be a non-object from the calculation results obtained so far, without waiting until all the weak discriminators have output their results, and the remaining calculation is then canceled. In this object detection device, the threshold value used to cancel the calculation is learned in a learning session (see, for example, Japanese Unexamined Patent Application Publication No. 2005-157679). As a result, the amount of computation required to detect an object can be reduced markedly.
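Early cancellation of the majority vote can be sketched as follows. The threshold-learning rule used here (at each step, the smallest partial score observed on any positive training window, so that no training positive is rejected early) is one possible choice and an assumption; the cited publication may learn its thresholds differently. The function names and the final decision threshold of 0 are likewise hypothetical.

```python
import math

def learn_abort_thresholds(positive_windows, weak_classifiers, alphas):
    """Assumed rule: per step t, the minimum partial vote over all positive samples."""
    thresholds = [math.inf] * len(weak_classifiers)
    for window in positive_windows:
        score = 0.0
        for t, (h, alpha) in enumerate(zip(weak_classifiers, alphas)):
            score += alpha * h(window)
            thresholds[t] = min(thresholds[t], score)
    return thresholds

def classify_window(window, weak_classifiers, alphas, abort_thresholds,
                    final_threshold=0.0):
    """Return (is_object, number_of_weak_classifiers_evaluated)."""
    score = 0.0
    for t, (h, alpha, abort_at) in enumerate(
            zip(weak_classifiers, alphas, abort_thresholds)):
        score += alpha * h(window)
        if score < abort_at:
            # Partial vote is already too low: judge a non-object, skip the rest.
            return False, t + 1
    return score >= final_threshold, len(weak_classifiers)
```

With toy weak classifiers `lambda v: 1 if v > 0 else -1` and `lambda v: 1 if v > 5 else -1`, reliabilities of 1.0 each, and positive samples `[3, 7]`, the learned thresholds are `[1.0, 0.0]`; the window `-4` is rejected after the first weak classifier, and the window `3` is accepted after both. Since most search windows in a real image contain no face, rejecting them after a few weak classifiers is what saves most of the computation.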