1. Field of the Invention
The present invention relates to a subject discriminating apparatus and a subject discriminating method for discriminating whether a predetermined subject, such as a face, is included in an image. The present invention also relates to a program that causes a computer to execute the subject discriminating method.
2. Description of the Related Art
Image data obtained by digital cameras and image data obtained by reading out images recorded on film are reproduced as hard copies, such as prints, or as soft copies, on displays. The images represented by such image data often include people's faces. Therefore, image processes that correct brightness gradation, color, sharpness, and the like are administered to the image data so that the faces are of appropriate brightness and color. When image processes are administered to image data in this manner, it is necessary to extract the face regions that correspond to people's faces from the images represented by the image data. Accordingly, various methods have been proposed for judging whether a predetermined subject, such as a face, is included in an image. In addition, various methods have been proposed for detecting the positions of structural components of a face, such as the eyes. Detecting these structural components enables faces that have been judged to be included in images to be trimmed accurately.
Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, “Neural Network-Based Face Detection”, Computer Vision and Pattern Recognition, 1996 (hereinafter referred to as Document 1) discloses a method for discriminating whether an image includes a face. In this method, brightness values, which are the characteristic amounts employed in detecting faces, are normalized. Then, whether a face is included in the image is judged by referring the normalized brightness values to the learning results of a neural network that has been trained on faces. Note that in the method disclosed in Document 1, the samples used to train the neural network are given a tolerance, to facilitate detection of faces within images. Specifically, a plurality of samples are prepared in which the faces are varied in size, rotated, and the like. Rainer Lienhart and Jochen Maydt, “An Extended Set of Haar-like Features for Rapid Object Detection”, International Conference on Image Processing (hereinafter referred to as Document 2) discloses another method of discriminating whether an image includes a face. In this method, high frequency components of an image, which represent edges and the like, are obtained and normalized as the characteristic amounts employed to detect a subject. Then, whether a face is included in the image is judged by referring the normalized high frequency components to the results of learning performed on characteristic amounts with a machine learning method called “boosting”. The methods disclosed in Documents 1 and 2 are capable of accurately discriminating whether an image includes a subject such as a face, because the characteristic amounts employed in detecting the subject are normalized.
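To make the role of normalization concrete, the following is a minimal sketch of one normalized high frequency characteristic amount of the general kind described for Document 2: a two-rectangle, left-minus-right Haar-like response computed from an integral image and divided by the standard deviation of brightness in the window. The function names, the single-feature window, and the choice of standard-deviation normalization are illustrative assumptions, not the procedure of Document 1 or Document 2; a practical detector evaluates many such features at many positions and scales and passes them to a boosted classifier.

```python
import math
from typing import List

Image = List[List[float]]  # row-major grid of brightness values


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a zero row and column:
    ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> float:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def normalized_edge_feature(img: Image) -> float:
    """Hypothetical example feature: left-half sum minus right-half sum of the
    window (a vertical-edge Haar-like response), divided by the standard
    deviation of brightness in the window. Returns 0.0 for a flat window."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    half = w // 2  # for odd widths the middle column is ignored
    response = rect_sum(ii, 0, 0, half, h) - rect_sum(ii, w - half, 0, half, h)
    # Normalization step: an extra pass over the window to obtain its spread.
    n = w * h
    mean = rect_sum(ii, 0, 0, w, h) / n
    std = math.sqrt(sum((p - mean) ** 2 for row in img for p in row) / n)
    return response / std if std > 0 else 0.0
```

Because the response is divided by the window's standard deviation, adding a constant to every pixel, or multiplying every pixel by a positive factor, leaves the result unchanged. The separate pass that computes the standard deviation is an instance of the additional per-window calculation that normalization entails.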
H. Ohara et al., “Detection of Malignant Tumors in DR Images-Iris Filter”, Journal of the Electronic Information Communications Association, D-II Vol. J75-D-II No. 3, pp. 663-670, March 1992 (hereinafter referred to as Document 3) discloses a method of extracting candidate tumor patterns. In this method, tumor patterns, which are a characteristic of breast cancer, are detected based on the facts that tumor patterns have slightly lower density values than their surroundings on an X-ray negative film, and that the gradient vectors of arbitrary pixels within tumor patterns point toward the centers of the tumor patterns. Specifically, the distributions of the directions of gradient vectors within an image are evaluated, and regions in which the gradient vectors are concentrated at a specific point are judged to be tumor patterns and extracted. Further, U.S. Pat. No. 5,604,820 discloses a method of judging whether a candidate of a subject is the subject. This method employs Kohonen's self-organization, which is a neural network technique, to learn the characteristic patterns of subjects such as faces. Characteristic portions of the candidate are then checked against the learning results, to judge whether they are included in the learned characteristic patterns. Further, judgment is made regarding whether the positional relationships among the characteristic portions within the candidate match those of the characteristic portions within the subject, thereby judging whether the candidate is the subject.
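The gradient-vector concentration evaluated in Document 3 can be illustrated with a simplified measure: for a candidate center, average the cosine of the angle between each nearby pixel's gradient vector and the direction from that pixel to the center. This is a minimal sketch under stated assumptions, not the published iris filter: the function names, the circular neighborhood of fixed `radius`, the central-difference gradients, and the convention that gradients point toward larger pixel values (negate them for the opposite polarity) are all hypothetical choices.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # row-major grid of pixel values


def gradient(img: Image, x: int, y: int) -> Tuple[float, float]:
    """Central-difference gradient at interior pixel (x, y); by this
    convention it points toward increasing pixel values."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def concentration(img: Image, cx: int, cy: int, radius: int) -> float:
    """Mean cosine between each pixel's gradient and the unit vector from that
    pixel toward (cx, cy), over interior pixels within `radius` of the center.
    +1 means every gradient points at the center, -1 means directly away."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    # Restrict to interior pixels so central differences stay in bounds.
    for y in range(max(1, cy - radius), min(h - 1, cy + radius + 1)):
        for x in range(max(1, cx - radius), min(w - 1, cx + radius + 1)):
            dx, dy = cx - x, cy - y
            dist = math.hypot(dx, dy)
            if dist == 0 or dist > radius:
                continue  # skip the center itself and pixels outside the circle
            gx, gy = gradient(img, x, y)
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat pixels carry no direction information
            total += (gx * dx + gy * dy) / (mag * dist)
            count += 1
    return total / count if count else 0.0
```

A value near +1 indicates that gradients converge on the candidate center, which is the cue used to extract candidate tumor patterns; note that the measure uses only gradient directions, not the shape of the region.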
Ashish Kapoor and Rosalind W. Picard, “Real-Time, Fully Automatic Upper Facial Feature Tracking”, The 5th International Conference on Automatic Face and Gesture Recognition, May 2002 (hereinafter referred to as Document 4) discloses a method of detecting eyes from an image. In this method, a face is illuminated with infrared light and photographed with an infrared camera, to obtain an image in which the eyes are easily detectable. Alper Yilmaz and Mubarak A. Shah, “Automatic Feature Detection and Pose Recovery for Faces”, The 5th Asian Conference on Computer Vision, January 2002 (hereinafter referred to as Document 5) discloses a method for detecting eyes and eyebrows. In this method, color data of the eyes and eyebrows, which constitute faces, are employed to detect the eyes and eyebrows within images. Ying-li Tian, T. Kanade and J. F. Cohn, “Dual-State Parametric Eye Tracking”, The 4th IEEE International Conference on Automatic Face and Gesture Recognition, 2000 (hereinafter referred to as Document 6) discloses a method that judges whether eyes within an image are open or closed. In this method, templates of eyes are employed to detect the positions of the eyes, and whether the eyes are open or closed is judged by detecting the pupils.
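Document 6 is described above only as employing eye templates to detect eye positions. As an illustration of template-based position detection in general, the sketch below slides a template over an image and returns the offset with the highest normalized cross-correlation. The matching criterion, the exhaustive search, and the names `ncc` and `best_match` are assumptions made for illustration; they are not taken from Document 6, which additionally judges whether the eyes are open or closed by detecting pupils.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # row-major grid of brightness values


def ncc(patch: Image, tmpl: Image) -> float:
    """Normalized cross-correlation of two equally sized grids, in [-1, 1].
    Returns 0.0 when either grid is flat (zero variance)."""
    p = [v for row in patch for v in row]
    t = [v for row in tmpl for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0


def best_match(img: Image, tmpl: Image) -> Tuple[int, int, float]:
    """Exhaustively test every placement of tmpl inside img and return the
    top-left (x, y) of the best-scoring placement together with its score."""
    th, tw = len(tmpl), len(tmpl[0])
    best = (0, 0, -2.0)  # below the minimum possible NCC, so any score replaces it
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            patch = [row[x:x + tw] for row in img[y:y + th]]
            score = ncc(patch, tmpl)
            if score > best[2]:
                best = (x, y, score)
    return best
```

Normalized cross-correlation is used here because it is unaffected by uniform changes in brightness and contrast within the patch; a plain sum of squared differences would be cheaper but more sensitive to lighting.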
In the methods disclosed in Documents 1 and 2, the characteristic amounts utilized to detect subjects are normalized, which increases the amount of calculation. Consequently, a problem arises that the processing time required for discrimination becomes long. The method disclosed in Document 3 evaluates only the distributions of the directions of gradient vectors. Therefore, although subjects having simple shapes, such as tumor patterns, are detectable, subjects having complex shapes, such as human faces, cannot be detected by this method. The method disclosed in U.S. Pat. No. 5,604,820 performs judgments on a plurality of variables, and consequently, a long time is required for processing.
The method disclosed in Document 4 is only capable of detecting eyes from images obtained by photography using infrared illumination and an infrared camera. Therefore, this method lacks versatility. The method disclosed in Document 5 employs color data. Therefore, this method is not applicable to cases in which skin color differs among people of different races. The methods disclosed in Documents 4 through 6 are also incapable of detecting eyes unless the eyes are clearly pictured in an image. Therefore, eyes cannot be accurately detected in images in which bangs cover the eyes. In addition, the method disclosed in U.S. Pat. No. 5,604,820 is incapable of accurately detecting the positions of structural components that constitute a face, such as the eyes.