In recent years, rather than viewing photographic prints obtained by silver halide photography, users have increasingly been viewing digital images captured with general-purpose digital cameras, cameras installed in digital phones, and high-function professional digital cameras, as well as images scanned from silver halide photographs and the like. Viewers pay close attention to digital images, and their standards of evaluation are particularly exacting for images that include people. Accordingly, the performance of correction processing that improves image quality in the face region of people in such images has been increasing.
As a result, various technologies have been disclosed for determining a candidate region for the face of a person in a photographed image. All of these technologies utilize the fact that feature portions such as the eyes and mouth are included in the face region of a person. For example, technology has been disclosed which uses the Kohonen self-organizing learning method: multiple feature patterns are learned for multiple feature regions of the face (for example, the eyes and the mouth), and a determination is made as to whether a feature region of the face candidate is included in the learned feature patterns. A determination as to whether the face candidate is a face is also made by determining whether the positional relationship of the feature regions of the face candidate is the same as the positional relationship of the feature regions of a face (for example, see Japanese Patent Application Laid-Open No. 5-282457 publication and Japanese Patent Application Laid-Open No. 6-214970 publication).
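The positional-relationship check described above can be illustrated with a minimal sketch. It is not the method of the cited publications: the Kohonen learning step is omitted, the feature regions are assumed to have been located already, and the function names, the reference layout `(0.0, 1.1)`, and the tolerance `0.3` are all hypothetical values chosen for illustration.

```python
import math


def normalized_layout(left_eye, right_eye, mouth):
    """Express the mouth position in a frame defined by the two eyes.

    Points are (x, y) in image coordinates (y grows downward), and the eyes
    are given in left-to-right image order. Returns (u, v): u is the mouth
    offset along the eye-to-eye axis and v the offset perpendicular to it,
    both divided by the inter-eye distance, so the result is independent of
    scale, translation, and in-plane rotation.
    """
    ex = right_eye[0] - left_eye[0]
    ey = right_eye[1] - left_eye[1]
    d = math.hypot(ex, ey)
    if d == 0:
        raise ValueError("eye centers coincide")
    # Mouth position relative to the midpoint between the eyes.
    mx = mouth[0] - (left_eye[0] + right_eye[0]) / 2
    my = mouth[1] - (left_eye[1] + right_eye[1]) / 2
    u = (mx * ex + my * ey) / (d * d)    # component along the eye axis
    v = (-mx * ey + my * ex) / (d * d)   # component perpendicular to it
    return u, v


def layout_matches(left_eye, right_eye, mouth,
                   reference=(0.0, 1.1), tolerance=0.3):
    """Compare the candidate's layout with a reference face layout.

    `reference` and `tolerance` are hypothetical; a real system would
    derive them from training data.
    """
    u, v = normalized_layout(left_eye, right_eye, mouth)
    return math.hypot(u - reference[0], v - reference[1]) <= tolerance
```

For example, `layout_matches((40, 50), (80, 50), (60, 95))` accepts a mouth centered below the eyes, while `layout_matches((40, 50), (80, 50), (60, 20))` rejects a "mouth" located above them. Because the check relies on the feature regions belonging to a single face, it also shows why an inaccurately extracted face candidate region can lead to an erroneous determination.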
In addition, technology has been disclosed in which a face region is determined by determining a face candidate region which corresponds to the shape of the face of a person, and then applying a prescribed threshold to a feature amount in the face candidate region (for example, see Japanese Patent Application Laid-Open No. 8-63597 publication). The brightness histogram for the face candidate region has two peaks, one corresponding to the skin section, which has a high brightness level, and one corresponding to the eyes, which have a low brightness level. The number of pixels forming the peak at the low brightness level relative to the number of pixels in the face candidate region, or in other words the frequency of the eye pixels, is used as the feature amount. The template matching method is used for extracting the eyes.
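The histogram-based feature amount described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the cited publication: the region is assumed to be a list of rows of 8-bit brightness values, and the function names, the valley position `80` separating the two peaks, and the threshold `0.02` are hypothetical.

```python
def eye_pixel_frequency(region, valley=80):
    """Return the fraction of pixels in the low-brightness peak.

    `region` is a list of rows of integer brightness values in 0..255.
    `valley` is an assumed brightness level separating the dark (eye)
    peak from the bright (skin) peak of the histogram.
    """
    histogram = [0] * 256
    total = 0
    for row in region:
        for value in row:
            histogram[value] += 1
            total += 1
    if total == 0:
        raise ValueError("empty region")
    dark_pixels = sum(histogram[:valley])  # pixels below the valley
    return dark_pixels / total


def is_face_by_histogram(region, valley=80, threshold=0.02):
    """Apply a prescribed (here hypothetical) threshold to the feature."""
    return eye_pixel_frequency(region, valley) >= threshold
```

Note that the feature counts every dark pixel in the region, regardless of which structure it belongs to. Eyebrows, hair, or deep shadows therefore contribute to the same low-brightness peak as the eyes.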
In addition, a method has been disclosed in which parts such as the facial outline, hair, eyes and the like are extracted and then superimposed to obtain the main subject candidate of the photograph (for example, see Japanese Patent Application Laid-Open No. 11-316845 publication). The image to be used is binarized, and an elliptical black region whose long axis/short axis ratio is in a prescribed range, which is a feature of the eyes of an average person, is extracted as a region which can correspond to the eye region, one of the internal structures of the face. Furthermore, the angle of the long axis of the extracted eye candidate region is determined, and an elliptical black region whose long-axis angle differs from it by an amount within a prescribed range is found; this region forms a pair with the eye candidate region that was first extracted, and a pair of eye candidate regions is thereby extracted. A parallel linear symmetry axis joining the centers of both eye candidate regions is then set for each extracted pair, the degree of likeness to linear symmetry is determined, and the black regions estimated to be the eyes are extracted accordingly.
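The axis-ratio filtering and angle-based pairing described above can be sketched as follows. This is a simplified illustration, not the procedure of the cited publication: binarization and ellipse fitting are assumed to have been done already, the line-symmetry evaluation is omitted, and the `Ellipse` type, the function names, and the numeric ranges (`1.5` to `4.0` for the axis ratio, `15` degrees for the angle difference) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Ellipse:
    """An elliptical black region fitted to the binarized image."""
    cx: float
    cy: float
    long_axis: float
    short_axis: float
    angle_deg: float  # orientation of the long axis


def is_eye_like(e, min_ratio=1.5, max_ratio=4.0):
    """Keep regions whose long/short axis ratio is in the assumed range."""
    if e.short_axis <= 0:
        return False
    return min_ratio <= e.long_axis / e.short_axis <= max_ratio


def angle_difference(a_deg, b_deg):
    """Smallest difference between two axis orientations (period 180)."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def eye_pairs(regions, max_angle_diff=15.0):
    """Pair eye-like regions whose long axes are nearly parallel."""
    candidates = [e for e in regions if is_eye_like(e)]
    return [
        (a, b)
        for a, b in combinations(candidates, 2)
        if angle_difference(a.angle_deg, b.angle_deg) <= max_angle_diff
    ]
```

Both the ratio test and the angle test depend on a reliable ellipse fit. When a region consists of only a few pixels, the fitted axis lengths and orientation become coarse, which limits how far this kind of test can be trusted for small faces.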
In the technology described in Japanese Patent Application Laid-Open No. 5-282457 publication and No. 6-214970 publication, the positional relationship between the feature regions is also used to make the determination, and thus erroneous determination is possible unless the face candidate region is extracted with high accuracy. For example, this is the case when multiple faces are detected as one face candidate in a typical user portrait of a scene in which the faces of many people are close together. Also, because natural images taken by typical users involve many different types of light sources, exposure states, expressions of the persons being photographed, and photography states, the processing load is large if highly accurate face extraction is to be achieved.
In the method which uses a histogram, such as that in Japanese Patent Application Laid-Open No. 8-63597 publication, structures other than the eyes may be detected as the eyes in error, since the face section contains portions other than the eyes that have the same brightness or color value as the eyes, such as the eyebrows or hair, the area of the face outline, and, in high-contrast scenes, the shadow of the nose and the like. Also, in the case of group pictures taken in front of buildings, if each person being photographed is small in the image, the number of pixels that make up the mouth and eye regions is small, and extraction using the template matching method becomes difficult because a block is formed in place of structures such as the actual eyes and mouth.
In the case where the person being photographed is small in the image and the number of pixels making up the eye or mouth regions is small, accuracy is also low for the extraction method described in Japanese Patent Application Laid-Open No. 11-316845 publication, which uses the long axis/short axis ratio and the long-axis angle of the eye candidate region.