1. Field of the Invention
The present invention relates to a technique for detecting an area of a human face and information on the face from an image, and pertains to a moving picture retrieval apparatus for retrieving a moving picture using information on a human face and on the person as a key, to a monitoring system for monitoring a driver and passengers in a car, and to a face identifying system for identifying a face captured by a camera as a face in a database.
2. Description of the Related Art
A technique for detecting a human face has been developed conventionally, and for example, there is a face detecting apparatus disclosed in Japanese Laid Open Patent Publication HEI7-311833. A conventional face detecting apparatus will be explained below using FIG. 22.
Conventional face detecting apparatus 2220, which focuses on the eyes and mouth, is composed of three processing apparatuses: area detecting apparatus 2221, which detects a luminance minimum point where the luminance becomes locally lowest and a luminance changing point where the luminance increases, and fetches the area between the two points as an area of a structural element of the face; face candidate detecting apparatus 2222, which detects a face candidate from the sizes and positional relationship of the face structural element areas; and face determining apparatus 2223, which examines the face candidate in detail to determine whether the face candidate is actually a face.
First differentiation section 2201 calculates the first differentiation of an input image signal 2231 in the downward direction, starting from the upper portion of the image, and outputs a first differentiation signal 2232. Binary section 2202 binarizes the first differentiation signal 2232 using 0 as the threshold and outputs a first differentiation binary signal 2233. Second differentiation section 2203 calculates the second differentiation of the input image signal 2231 and outputs a second differentiation signal 2234. Binary section 2204 binarizes the second differentiation signal 2234 and outputs a second differentiation binary signal 2235.
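As a rough illustration of the differentiation and binarization described above, the following Python sketch computes first and second vertical differences of a luminance image and binarizes them. The function names, the use of simple finite differences, the border handling, the polarity of the binarization (a value greater than the threshold becomes 1), and the threshold of 0 for the second differentiation are assumptions made for illustration; the description above states only that the first differentiation signal is binarized with 0.

```python
from typing import List

Image = List[List[float]]  # rows of luminance values; row 0 is the top of the image


def first_diff_down(img: Image) -> Image:
    """Vertical first difference, scanning downward: img[y+1][x] - img[y][x].

    The last row has no row below it and is set to 0 (assumed border handling).
    """
    h, w = len(img), len(img[0])
    return [
        [(img[y + 1][x] - img[y][x]) if y + 1 < h else 0.0 for x in range(w)]
        for y in range(h)
    ]


def second_diff_down(img: Image) -> Image:
    """Vertical second difference: img[y+1][x] - 2*img[y][x] + img[y-1][x].

    The first and last rows are set to 0 (assumed border handling).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(w):
            out[y][x] = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    return out


def binarize(sig: Image, threshold: float = 0.0) -> List[List[int]]:
    """Assumed polarity: 1 where the signal exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in sig]
```

For a one-pixel-wide image whose luminance from top to bottom is 9, 9, 2, 9, 9 (a dark band on a bright background), both binary signals are 1 only at the dark row under these assumptions: the first difference is positive where the luminance starts to increase again, and the second difference is positive at the local luminance minimum.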
OR section 2205 calculates the OR of the first differentiation binary signal 2233 and the second differentiation binary signal 2235 and outputs an eye-mouth first candidate signal 2236. Connected area feature vector calculating section 2206 receives as its inputs the eye-mouth first candidate signal 2236 and the input image signal 2231, and, for each of the connected areas in the eye-mouth first candidate signal 2236, detects the area value, centroid position, vertical and horizontal lengths, and area feature vectors such as the luminance average and variance, and outputs them as an area feature vector signal 2237.
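The OR operation and the per-area feature calculation described above can be sketched as follows. This is a minimal illustration, not the implementation of the cited apparatus: the function names, the dictionary keys, the use of 4-connectivity, and the use of the bounding-box extent as the vertical and horizontal lengths are all assumptions.

```python
from collections import deque
from typing import Dict, List


def or_masks(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Pixel-wise OR of two binary signals of the same size."""
    return [[1 if (p or q) else 0 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def connected_area_features(
    mask: List[List[int]], img: List[List[float]]
) -> List[Dict[str, float]]:
    """Label 4-connected areas of 1-pixels (assumed connectivity) and compute
    the features named in the text for each area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    features = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search collects every pixel of this connected area.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            lum = [img[y][x] for y, x in pixels]
            mean = sum(lum) / n
            features.append({
                "area": n,                          # area value (pixel count)
                "cx": sum(xs) / n,                  # centroid, horizontal
                "cy": sum(ys) / n,                  # centroid, vertical
                "width": max(xs) - min(xs) + 1,     # horizontal length
                "height": max(ys) - min(ys) + 1,    # vertical length
                "lum_mean": mean,                   # luminance average
                "lum_var": sum((v - mean) ** 2 for v in lum) / n,  # luminance variance
            })
    return features
```

Each returned dictionary plays the role of one entry of the area feature vector signal in this sketch; the later stages operate only on these features, not on the pixels themselves.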
Eye second candidate determining section 2207 receives the area feature vector signal 2237, examines the area value, vertical and horizontal lengths, and the luminance average and variance of each area, and thereby determines which of the areas is likely to be an eye, and outputs the result as an eye second candidate signal 2238 including the feature vectors of the area. Similarly, mouth second candidate determining section 2208 receives the area feature vector signal 2237, examines the area value, vertical and horizontal lengths, and the luminance average and variance of each area, and thereby determines which of the areas is likely to be a mouth, and outputs the result as a mouth second candidate signal 2239 including the feature vectors of the area.
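The second candidate determination described above can be pictured as a set of range checks on the feature vector of each area. The sketch below is only an assumed form of such a check: the description gives no numerical criteria, so the rule tables `EYE_RULES` and `MOUTH_RULES`, their values, the derived `aspect` feature, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

# Hypothetical acceptance ranges (inclusive); the source gives no numbers.
EYE_RULES: Dict[str, Range] = {
    "area": (20, 400),
    "aspect": (1.2, 4.0),     # horizontal length / vertical length
    "lum_mean": (0, 100),
    "lum_var": (0, 2500),
}
MOUTH_RULES: Dict[str, Range] = {
    "area": (40, 900),
    "aspect": (2.0, 6.0),
    "lum_mean": (0, 120),
    "lum_var": (0, 2500),
}


def is_candidate(f: Dict[str, float], rules: Dict[str, Range]) -> bool:
    """True when every examined feature lies inside its accepted range.

    Assumes f["height"] >= 1, as is the case for a non-empty area.
    """
    values = dict(f, aspect=f["width"] / f["height"])
    return all(lo <= values[key] <= hi for key, (lo, hi) in rules.items())


def second_candidates(
    features: List[Dict[str, float]], rules: Dict[str, Range]
) -> List[Dict[str, float]]:
    """Keep the areas that pass the rules, together with their feature vectors."""
    return [f for f in features if is_candidate(f, rules)]
```

Because such checks look only at the size, shape, and luminance of a single area, a background area with similar features passes them just as an eye or mouth does, and the same area may pass both rule sets.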
Face candidate determining section 2209 selects two eye candidate areas from the eye second candidate signal and one mouth candidate area from the mouth second candidate signal so that none of the areas overlap each other, examines the centroid position of each area, further examines all combinations of candidate groups whose arrangement is likely to be that of a face, and thereby outputs a face candidate signal 2240.
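The combination search described above can be sketched as follows. The overlap test (bounding boxes approximated as centred on each centroid), the arrangement test for an upright frontal face, and all numerical tolerances are assumptions introduced for illustration; the description states only that non-overlapping areas are selected and that their centroid positions are examined.

```python
from itertools import combinations
from math import hypot
from typing import Dict, List, Tuple

Area = Dict[str, float]  # expects keys "cx", "cy", "width", "height"


def boxes_overlap(a: Area, b: Area) -> bool:
    """Overlap of bounding boxes, each approximated as centred on its centroid."""
    return (abs(a["cx"] - b["cx"]) * 2 < a["width"] + b["width"]
            and abs(a["cy"] - b["cy"]) * 2 < a["height"] + b["height"])


def plausible_face(e1: Area, e2: Area, m: Area) -> bool:
    """Hypothetical arrangement test (image y grows downward)."""
    eye_dist = hypot(e2["cx"] - e1["cx"], e2["cy"] - e1["cy"])
    if eye_dist == 0:
        return False
    mid_x = (e1["cx"] + e2["cx"]) / 2
    mid_y = (e1["cy"] + e2["cy"]) / 2
    roughly_level = abs(e2["cy"] - e1["cy"]) <= 0.3 * eye_dist
    mouth_below = 0.7 * eye_dist <= m["cy"] - mid_y <= 1.5 * eye_dist
    mouth_centred = abs(m["cx"] - mid_x) <= 0.3 * eye_dist
    return roughly_level and mouth_below and mouth_centred


def face_candidates(eyes: List[Area], mouths: List[Area]) -> List[Tuple[Area, Area, Area]]:
    """Examine every (eye, eye, mouth) combination and keep the plausible ones."""
    out = []
    for e1, e2 in combinations(eyes, 2):
        for m in mouths:
            if boxes_overlap(e1, e2) or boxes_overlap(e1, m) or boxes_overlap(e2, m):
                continue
            if plausible_face(e1, e2, m):
                out.append((e1, e2, m))
    return out
```

With E eye candidates and M mouth candidates, the loop examines E(E-1)/2 × M combinations, so every additional false eye or mouth candidate from the background adds many combinations to be checked.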
Face candidate area image fetching section 2210 uses an affine transformation, based on the centroid positions of the candidate areas for the right and left eyes in the corresponding face candidate signal, to fetch a candidate area in which a face exists, and outputs it as a face candidate image signal 2241. Face determining section 2211 calculates a distance between the face candidate image signal and a face standard pattern, and when the distance is less than a predetermined threshold, determines that a human face appears at the place corresponding to the input signal, and outputs the position, size and angle at which the face exists as a face signal 2242.
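The fetching and determination steps described above can be sketched as follows. The sketch uses a rotation, uniform scale, and translation derived from the two eye centroids, which is a special case of an affine transformation, and a Euclidean distance to the standard pattern. The description does not state which affine parameters or which distance measure are used, so these choices, the patch layout parameters `size`, `eye_span`, and `eye_row`, the nearest-neighbour sampling, and the function names are all assumptions.

```python
from math import atan2, cos, hypot, sin, sqrt
from typing import List, Tuple

Image = List[List[float]]
Point = Tuple[float, float]  # (x, y)


def eye_pose(left: Point, right: Point) -> Tuple[Point, float, float]:
    """Midpoint, inter-eye distance, and in-plane angle (radians) of the eye pair."""
    dx, dy = right[0] - left[0], right[1] - left[1]
    centre = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    return centre, hypot(dx, dy), atan2(dy, dx)


def fetch_face_patch(img: Image, left: Point, right: Point, size: int = 8,
                     eye_span: float = 0.5, eye_row: float = 0.35) -> Image:
    """Resample a size x size patch in which the eyes land at fixed positions.

    eye_span: eye distance as a fraction of the patch width (hypothetical).
    eye_row:  vertical eye position as a fraction of the patch height (hypothetical).
    """
    (cx, cy), dist, ang = eye_pose(left, right)
    scale = dist / (eye_span * size)  # source pixels per patch pixel
    c, s = cos(ang), sin(ang)
    h, w = len(img), len(img[0])
    patch = []
    for v in range(size):
        row = []
        for u in range(size):
            # Patch coordinates relative to the midpoint between the eyes.
            px = (u - (size - 1) / 2) * scale
            py = (v - eye_row * size) * scale
            # Rotate and translate into the source image.
            x = cx + c * px - s * py
            y = cy + s * px + c * py
            xi, yi = int(round(x)), int(round(y))  # nearest-neighbour sampling
            row.append(img[yi][xi] if 0 <= yi < h and 0 <= xi < w else 0.0)
        patch.append(row)
    return patch


def pattern_distance(patch: Image, standard: Image) -> float:
    """Euclidean distance between the patch and the standard pattern (assumed measure)."""
    return sqrt(sum((p - q) ** 2
                    for rp, rs in zip(patch, standard)
                    for p, q in zip(rp, rs)))


def is_face(patch: Image, standard: Image, threshold: float) -> bool:
    """A face is declared when the distance is less than the threshold."""
    return pattern_distance(patch, standard) < threshold
```

In this sketch the values returned by `eye_pose` stand in for the position, size, and angle that are output with the face signal. Every face candidate that survives the earlier stages costs one resampling and one full pattern comparison here, so the cost of this stage scales with the number of false face candidates.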
As described above, the conventional technique detects a luminance minimum point where the luminance becomes locally lowest and a luminance changing point where the luminance increases, fetches the area between the two points as a candidate area for an eye or mouth, detects eye candidates and mouth candidates from the shape characteristics and luminance characteristics of those candidate areas, detects face candidates from the positional relationship between the eye candidates and mouth candidates, and examines each face candidate in detail to determine whether or not the face candidate is actually a face.
However, in an image that contains a large amount of background in addition to a face, many luminance minimum points and many luminance changing points exist, and therefore many eye and mouth candidate areas are detected, which causes the problem that many incorrect detections occur.
In the conventional technique, the eye candidates and mouth candidates are detected from the shape characteristics and luminance characteristics of the eye and mouth candidate areas. However, the shapes of the eyes and mouth change greatly with personal differences and changes in expression, and therefore many background portions incorrectly detected as eye and/or mouth candidate areas remain as eye and/or mouth candidates. Further, when face candidates are detected using the positional relationship between the eye candidates and mouth candidates, many incorrectly detected background portions remain as face candidates, which is particularly noticeable when the apparatus is designed to also detect faces in profile and tilted faces. The rate at which incorrect detections are suppressed in the detailed examination of face candidates depends on the algorithm and threshold used for that examination. An algorithm that leaves fewer background portions incorrectly detected as face candidates and that requires a smaller calculation amount is preferable. However, the calculation amount rapidly increases in many algorithms.
Further, in the conventional technique, when a face in which a mustache covers part of the mouth is to be detected, there is a problem that the mustache and the mouth cannot be separated from each other and therefore are not detected.