1. Field of the Invention
The present invention relates to a learning method of a face classification apparatus for classifying whether a digital image is a facial image. The present invention also relates to a face classification method and apparatus and a program for the face classification apparatus.
2. Description of the Related Art
Conventionally, when snapshots are taken with a digital camera, the skin colors of persons in photographs are corrected by checking the color distribution in the facial regions of the persons in the images. Further, when a digital video image is captured by a digital video camera in a monitoring system, a person in the digital image is recognized. In these cases, it is necessary to detect a facial region, which corresponds to the face of a person, in the digital image. Therefore, various methods have been proposed for classifying whether an image represents a face.
For example, in the method proposed in Henry A. Rowley et al., “Neural Network-Based Face Detection”, vol. 20, No. 1, pp. 23-38, January 1998, luminance values, which are used as feature values in face detection, are normalized. Then, judgment is made as to whether an image is a facial image with reference to a result of learning about faces, which is obtained by using a technique of neural network learning. Further, in the method proposed in Rainer Lienhart and Jochen Maydt, “An Extended Set of Haar-like Features for Rapid Object Detection”, IEEE ICIP 2002, Vol. 1, pp. 900-903, September 2002, high frequency components, such as edges, in an image are obtained as feature values for detecting an object. Then, the feature values are normalized. Further, judgment is made as to whether an image is an image representing the object with reference to a result of learning about the feature values, which is obtained by using a machine-learning method called “Boosting”. In both methods, the feature values which are used for detecting an object, such as a face, are normalized. Therefore, it is possible to accurately classify whether an image is an image representing the object.
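To make the role of normalization concrete, the following is a minimal sketch of one common normalization: rescaling the luminance values of an image window to zero mean and unit variance, so that a brightness offset or a contrast change does not alter the feature values. This is an assumption for illustration only; the cited papers each describe their own normalization procedures, and the function name `normalize_luminance` is hypothetical.

```python
import math
from typing import List, Sequence


def normalize_luminance(window: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Rescale a window's luminance values to zero mean and unit variance.

    Hypothetical helper: one common normalization, not necessarily the
    exact procedure of the cited papers.
    """
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std < eps:
        # A flat window carries no contrast information; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in window]


# The same pattern under a brightness offset and under a contrast change
# yields (up to floating-point rounding) the same normalized values.
base = normalize_luminance([10, 20, 30])
brighter = normalize_luminance([110, 120, 130])
higher_contrast = normalize_luminance([20, 40, 60])
```

Because the normalized values depend only on the relative pattern of luminance, a classifier trained on them does not need to learn every lighting condition separately.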
Further, methods have also been proposed for classifying classification object images into images which represent a predetermined object and images which do not represent the predetermined object. In these methods, a plurality of classifiers is obtained in advance by learning a feature value calculated in each of a multiplicity of sets of sample images by using a machine-learning method. Each of the multiplicity of sets of sample images includes a plurality of sample images which are recognized as images representing the predetermined object and a plurality of sample images which are recognized as images which do not represent the predetermined object. The plurality of classifiers outputs standard values for classifying the classification object images based on received feature values. If the weighted sum of the standard values output from the plurality of classifiers exceeds a predetermined threshold value, the classification object image is classified as an image representing the predetermined object (please refer to U.S. Patent Application Publication No. 20050100195).
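The decision rule described above, a weighted sum of the classifiers' outputs compared with a threshold, can be sketched as follows. The function name, the stump-style classifiers, the weights, and the threshold in the example are hypothetical values chosen for illustration, not values taken from the cited publication.

```python
from typing import Callable, Sequence

# A classifier maps one feature value to a score (the "standard value").
Classifier = Callable[[float], float]


def classify_weighted(
    features: Sequence[float],
    classifiers: Sequence[Classifier],
    weights: Sequence[float],
    threshold: float,
) -> bool:
    """Return True if the weighted sum of classifier scores exceeds threshold."""
    if not (len(features) == len(classifiers) == len(weights)):
        raise ValueError("features, classifiers and weights must align")
    total = sum(w * clf(x) for clf, w, x in zip(classifiers, weights, features))
    return total > threshold


# Hypothetical stump classifiers voting +1 (object) or -1 (non-object).
stumps = [
    lambda x: 1.0 if x > 0.5 else -1.0,
    lambda x: 1.0 if x < 0.2 else -1.0,
]
weights = [0.7, 0.3]
print(classify_weighted([0.8, 0.1], stumps, weights, threshold=0.0))  # True
print(classify_weighted([0.3, 0.1], stumps, weights, threshold=0.0))  # False
```

In the second call the more heavily weighted classifier votes against the object, so the sum is 0.7 × (−1) + 0.3 × 1 = −0.4, which does not exceed the threshold.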
Further, another method for classifying whether a classification object image is an image representing a face has been proposed. In this method, a plurality of weak classifiers is provided, each of which classifies, based on a received feature value, whether the classification object image is an image representing a face. The plurality of weak classifiers is obtained in advance by learning a feature value calculated in each of a multiplicity of sets of sample images by using a machine-learning method. Each of the multiplicity of sets of sample images includes a plurality of sample images which are recognized as images representing faces and a plurality of sample images which are recognized as images which do not represent faces. In this method, the plurality of weak classifiers is connected in series to form a cascade structure. Only if the classification object image is classified as an image representing a face by every one of the weak classifiers is it classified as an image representing a face (please refer to Shihong LAO, et al., “Fast Omni-Directional Face Detection”, Meeting of Image Recognition and Understanding (MIRU2004), pp. II271-II276, July 2004).
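The cascade rule can be sketched in the same way. In this minimal sketch, each stage is a hypothetical weak classifier that receives its own feature value; the window is rejected as soon as any stage rejects it, so later stages are never evaluated for windows that are clearly not faces. The names `classify_cascade` and `WeakClassifier` are illustrative assumptions.

```python
from typing import Callable, Sequence

# A weak classifier accepts (True) or rejects (False) one feature value.
WeakClassifier = Callable[[float], bool]


def classify_cascade(features: Sequence[float], stages: Sequence[WeakClassifier]) -> bool:
    """Classify a window as a face only if every stage accepts it."""
    if len(features) != len(stages):
        raise ValueError("need exactly one feature value per stage")
    for stage, x in zip(stages, features):
        if not stage(x):
            return False  # rejected here; the remaining stages are skipped
    return True


# Hypothetical two-stage cascade.
stages = [lambda x: x > 0.0, lambda x: x > 1.0]
print(classify_cascade([0.5, 2.0], stages))   # True
print(classify_cascade([0.5, 0.5], stages))   # False (second stage rejects)
print(classify_cascade([-1.0, 2.0], stages))  # False (first stage rejects)
```

The early exit is what makes a cascade fast in practice: most candidate windows in an image are not faces and are discarded by the first few stages.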
If facial images are used as the sample images and learning is performed by using the methods disclosed in U.S. Patent Application Publication No. 20050100195 and “Fast Omni-Directional Face Detection”, it is possible to efficiently classify whether the classification object image is an image representing a face.
Further, the sample images are enlarged/reduced or rotated in steps, and the sample images obtained at each step of the transformation are used for learning. Therefore, it is possible to classify whether the classification object image is an image representing a face even if the face represented by the classification object image has been scaled by any of various magnification factors or is slightly rotated.
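The stepwise enlargement/reduction and rotation of sample images can be sketched as follows. This sketch assumes grayscale images stored as nested lists, nearest-neighbour resampling about the image centre, and illustrative step values; none of these choices comes from the cited methods, and `transform` and `stepwise_variants` are hypothetical names.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def transform(img: Image, scale: float, angle_deg: float, fill: float = 0.0) -> Image:
    """Scale and rotate img about its centre, keeping the same size.

    Uses inverse mapping with nearest-neighbour sampling; output pixels
    whose source lies outside the image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the scaling, then undo the rotation, to find the source pixel.
            dx, dy = (x - cx) / scale, (y - cy) / scale
            sx = cos_a * dx + sin_a * dy + cx
            sy = -sin_a * dx + cos_a * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def stepwise_variants(img: Image, scales: Sequence[float], angles: Sequence[float]) -> List[Image]:
    """Return one transformed copy of img for every (scale, angle) step."""
    return [transform(img, s, a) for s in scales for a in angles]


# Illustrative steps only: three scales times three rotation angles.
sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
variants = stepwise_variants(sample, scales=(0.9, 1.0, 1.1), angles=(-15.0, 0.0, 15.0))
print(len(variants))  # 9
```

Every variant is added to the learning set alongside the original, which is how the learned classifiers come to tolerate faces that are somewhat larger, smaller, or tilted relative to the sample images.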
Here, when a plurality of classifiers or a plurality of weak classifiers is obtained in advance by using the machine-learning method, facial images including faces which have the same angle of inclination and the same direction (with respect to turning the head from side to side) are used (please refer to FIG. 7 in “Fast Omni-Directional Face Detection”). Since such facial images are used as sample images, each facial part, such as the eyes, the nose, the mouth, or the facial contour, appears at substantially the same position in all of the sample images representing faces. Therefore, a characteristic feature which is common to the facial patterns can be easily detected, and the accuracy of classification can be improved.
Further, when facial images including faces which have the same angle of inclination and the same direction are used as the sample images for learning, as described above, the direction of the faces in the sample images for learning determines the direction of faces which the face classification apparatus trained on those sample images can classify. Therefore, when a user wishes to detect faces in various directions, so that a face in an arbitrary direction can be detected, a plurality of face classification means (apparatuses), each of which classifies whether a classification object image is an image representing a face, is prepared, one for each face direction, and the plurality of face classification means is used simultaneously.
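Applying one classifier per face direction to the same image window can be sketched as a bank of classifiers. The direction labels, the function name `detect_any_direction`, and the toy classifiers below are hypothetical placeholders; in practice each entry would be a classifier trained on sample images of a single face direction.

```python
from typing import Callable, Dict, List, Sequence

# A face classifier accepts (True) or rejects (False) one image window.
FaceClassifier = Callable[[Sequence[float]], bool]


def detect_any_direction(window: Sequence[float], bank: Dict[str, FaceClassifier]) -> List[str]:
    """Apply every direction-specific classifier to the same window and
    return the directions whose classifier accepts it."""
    return [direction for direction, clf in bank.items() if clf(window)]


# Hypothetical bank: each toy classifier inspects one feature of the window.
bank = {
    "frontal": lambda w: w[0] > 0.5,
    "left_profile": lambda w: w[1] > 0.5,
    "right_profile": lambda w: w[2] > 0.5,
}
print(detect_any_direction([0.9, 0.1, 0.7], bank))  # ['frontal', 'right_profile']
```

The loop evaluates the classifiers one after another, but because they are independent of each other they could equally be run in parallel; "simultaneously" here means only that every direction-specific classifier is applied to each candidate window.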
Generally, the characteristic region of an image which should be learned by the face classification apparatus is not always the same. In particular, the characteristic region differs depending on the direction of the faces to be classified by the face classification apparatus and on the type of processing, within the face detection processing, in which the face classification apparatus is used.
For example, if the face classification apparatus is an apparatus for classifying profile faces (side-view faces), it is important for it to learn the characteristic feature of profile faces that the background occupies a relatively large region of the image. Alternatively, if the face classification apparatus is an apparatus for detecting frontal faces and is used for roughly detecting face candidates in face detection processing, it is important to emphasize robustness. Therefore, it is important to cause the face classification apparatus to learn the simplest characteristic feature common to frontal faces, namely that the shape of a face is round, rather than subtle characteristic features of individual facial parts.
However, when facial images which have the same angle of inclination and the same direction are used as sample images for learning, as described above, only the angle of inclination and the direction of the faces are made uniform. Consequently, the characteristic regions which should be learned are not always included in the sample images in an appropriate manner. Further, each of the plurality of sample images tends to include a different characteristic feature. Therefore, it is difficult to cause the face classification apparatus to accurately learn the characteristic feature which should primarily be learned.