1. Field of the Invention
The present invention relates to an image processing apparatus, image processing method, program, and storage medium, which discriminate similar images.
2. Description of the Related Art
In recent years, personal authentication technologies based on physical features such as fingerprints, palm prints, veins, and irises, so-called biometrics authentication technologies, have been developed. Many of these biometrics authentication technologies use, as objects to be processed, images acquired by photoelectric conversion devices such as digital cameras, or data obtained by converting such images into corresponding two-dimensional spatial data.
Of these technologies, a face recognition technology using face images has received particular attention because it provokes less reluctance in users than other biometrics authentication technologies using, for example, fingerprints, since it is equivalent to the behavior performed when one human identifies another.
One of the problems posed when executing personal authentication using images such as faces lies in the fact that the patterns to be discriminated are relatively similar to each other. In the case of “face detection”, which detects human faces in an arbitrary input natural image, the differences between face image patterns are very small compared with the differences between a face pattern and a background pattern, even if the images are sensed under various image sensing conditions or include various persons.
That is, face detection is considered relatively easy pattern recognition, since it need only separate “similar” patterns having small mutual differences, as a face class, from other patterns. A pattern recognition technology used in such face detection is disclosed in, for example, Japanese Patent Laid-Open No. 2002-358500, and P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features” (Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518, December 2001).
On the other hand, “face recognition” requires processing for discriminating individual classes by finding differences among the similar patterns detected as the human face class. However, for a single person, differences caused by image sensing conditions, facial expressions, face views, accessories such as eyeglasses, makeup, and so forth often become larger than the differences between detected faces of different individuals.
That is, it is inherently a very difficult problem to extract only individual differences and to classify them into individual classes while ignoring differences caused by image sensing conditions and the like.
As a related art for solving this problem, a method of focusing on local regions of face images has been proposed. Among a plurality of face images obtained by sensing a certain individual, even if there are differences caused by image sensing conditions and the like, these influences do not appear uniformly over the entire face.
For example, even when a facial expression has changed, the difference near the nose from the image before the change is small. Even when a face is strongly illuminated from an oblique direction, the difference in the illuminated part is smaller than that in the shadowed part. Even in the case of a left-view face as seen by an observer, the difference of the right side part from a front-view image is smaller than that of the left side part, due to the three-dimensional shape of the face.
Therefore, even when the difference in a certain local region is large, it can be expected that some other local regions exhibit only the differences which allow an individual to be identified. That is, by selectively using discrimination results based on those local regions, satisfactory personal authentication can be implemented.
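The selective use described above can be sketched as a simple score-fusion step. The keep ratio and the plain averaging below are illustrative assumptions, not details taken from any of the cited methods:

```python
import numpy as np

def fuse_local_scores(similarities, keep_ratio=0.5):
    """Fuse per-local-region similarity scores by keeping only the
    most reliable (highest-similarity) regions and averaging them.

    similarities : 1-D sequence of similarity scores, one per local region.
    keep_ratio   : fraction of regions to keep (an illustrative choice).
    """
    scores = np.asarray(similarities, dtype=float)
    n_keep = max(1, int(len(scores) * keep_ratio))
    # Regions heavily affected by illumination, expression, and the like
    # tend to score low for the true identity; discard them before averaging.
    best = np.sort(scores)[-n_keep:]
    return best.mean()

# Two regions are disturbed, yet the fused score remains high.
print(fuse_local_scores([0.9, 0.85, 0.2, 0.88, 0.1]))
```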
In order to determine the positions of such local regions, for example, the image recognition apparatus disclosed in Japanese Patent Laid-Open No. 2005-346654 adopts a positioning method based on a “standard face”. The “standard face” is a face image generated by averaging a large number of normalized sample face images pixel by pixel.
Then, a large number of feature points are set on this standard face image, as shown in, for example, FIG. 19. In the case of Japanese Patent Laid-Open No. 2005-346654, neighboring regions including these feature points are used as local regions. When a normalized face image to be registered or discriminated is input, pattern matching against the feature points on the standard face is executed to determine the positions of the corresponding feature points on the input face image. Note that these feature points are selected in advance by machine learning.
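As a rough sketch of this positioning step, the following matches a small template taken around a standard-face feature point against a search window on the input image. The normalized-correlation measure and the exhaustive window search are illustrative assumptions; the patent does not fix a particular matching procedure:

```python
import numpy as np

def locate_feature_point(image, template, search_center, search_radius):
    """Locate one feature point on a normalized input face image by
    matching a template clipped around the corresponding feature point
    of the standard face (illustrative sketch)."""
    th, tw = template.shape
    cy, cx = search_center
    best, best_pos = -np.inf, search_center
    # Exhaustive search in a window around the standard-face position.
    for y in range(cy - search_radius, cy + search_radius + 1):
        for x in range(cx - search_radius, cx + search_radius + 1):
            patch = image[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue
            # Normalized correlation as the matching score.
            p = patch - patch.mean()
            t = template - template.mean()
            denom = np.linalg.norm(p) * np.linalg.norm(t)
            score = (p * t).sum() / denom if denom else -np.inf
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

Repeating this search for each of a large number of feature points is precisely what makes the calculation cost of this approach high.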
Also, Yoshihisa Ijiri et al., “Face recognition based on local region extraction according to face views” (Proceedings of the 13th Symposium on Sensing via Imaging Information, Yokohama, June 2007) (to be referred to as reference 1 hereinafter) discloses a face recognition method that sets local regions with reference to detected feature points. Note that a point which can be relatively easily detected, such as the left end (outer corner) of the left eye, is adopted as each reference feature point.
Then, the position of a local region is defined by predetermined shift amounts (a, b) along the abscissa (x-direction) and ordinate (y-direction) from the detected reference point. At this time, in order to always set the local region at nearly equal positions on the actual face, it is effective to change the shift amounts depending on the face image. Also, in order to clip a local region covering a nearly equal range on the actual face, the range c to be clipped is preferably changed depending on the face view.
Hence, in reference 1, face direction estimation is executed using the position information of a plurality of detected feature points, and the position and range of each local region are changed according to the estimated face direction. For example, in the case of a frontal face, as shown in 20a of FIG. 20, a local region is clipped using parameters a1, b1, and c1. On the other hand, in the case of a left-view face, as shown in 20b of FIG. 20, a local region is clipped using parameters a2, b2, and c2.
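This view-dependent clipping can be sketched as follows. The concrete parameter values and the dictionary keyed by estimated view are assumptions for illustration only, since reference 1 does not publish its parameters:

```python
import numpy as np

# Illustrative per-view parameter sets: shift amounts (a, b) from the
# reference feature point and the side length c of the clipped square.
VIEW_PARAMS = {
    "frontal": {"a": 10, "b": 5, "c": 16},  # a1, b1, c1 in FIG. 20
    "left":    {"a": 6,  "b": 5, "c": 12},  # a2, b2, c2 in FIG. 20
}

def clip_local_region(image, reference_point, view):
    """Clip a square local region shifted by (a, b) from a detected
    reference feature point (e.g. the outer corner of the left eye),
    with parameters selected by the estimated face view."""
    p = VIEW_PARAMS[view]
    x0 = reference_point[0] + p["a"]   # shift along the abscissa
    y0 = reference_point[1] + p["b"]   # shift along the ordinate
    c = p["c"]
    return image[y0:y0 + c, x0:x0 + c]

face = np.zeros((64, 64))
region = clip_local_region(face, (20, 30), "left")
print(region.shape)  # a 12x12 region for the left view
```

Because the shift and range are re-evaluated per region, the per-region processing load grows with the number of local regions, which is the drawback noted below.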
Japanese Patent Laid-Open No. 2004-265267 discloses a method of setting other feature points using some directly detected feature points. Note that in the case of Japanese Patent Laid-Open No. 2004-265267, one local region is set for each feature point, regardless of whether that feature point was directly detected.
FIG. 21 is a view for explaining an example of the feature point setting method disclosed in Japanese Patent Laid-Open No. 2004-265267. As shown in 21a of FIG. 21, in the case of Japanese Patent Laid-Open No. 2004-265267, only three points (A, B, C), namely the two inner corners of the eyes and the nose, are detected as feature points. Then, a mesh is formed by translating straight lines that coincide with the respective sides of the triangle having these three points as vertices, as shown in 21a of FIG. 21, and the intersections of this mesh are defined as new feature points. That is, the new feature points are calculated using integer multiples of the vectors defined by the three points.
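The mesh intersections can be sketched as integer combinations of two edge vectors of triangle ABC, that is, P = A + m(B - A) + n(C - A) for integers m and n. The integer ranges below are arbitrary illustrative choices:

```python
import numpy as np

def mesh_feature_points(A, B, C, m_range=(-1, 2), n_range=(-1, 2)):
    """Generate new feature points as integer combinations of the edge
    vectors of triangle ABC: P = A + m*(B - A) + n*(C - A).
    Translating straight lines parallel to the triangle sides produces
    exactly these intersection points."""
    A, B, C = map(np.asarray, (A, B, C))
    ab, ac = B - A, C - A
    points = []
    for m in range(*m_range):
        for n in range(*n_range):
            points.append(A + m * ab + n * ac)
    return np.array(points)

# The three detected points themselves appear at (m, n) = (0, 0),
# (1, 0), and (0, 1); the remaining points fill out the mesh.
pts = mesh_feature_points((0, 0), (2, 0), (1, 2))
```

Since m and n are restricted to integers, no point between two mesh intersections can ever be produced, which is the limitation criticized below.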
Furthermore, Japanese Patent Laid-Open No. 2004-265267 also discloses a method of defining, as a new feature point, the position obtained by rotating, for example, a vector CA through a predetermined angle about C. According to Japanese Patent Laid-Open No. 2004-265267, these methods can define feature points at identical positions on all face images of a certain specific person. In practice, however, this property holds only when the variations of the face images are limited to rotation or enlargement/reduction within the image plane.
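The rotation-based construction is a standard in-plane rotation of the vector CA about the point C; the 90-degree angle in the usage below is only an example, as the patent merely specifies a predetermined angle:

```python
import math

def rotate_about(C, A, angle_deg):
    """Define a new feature point by rotating the vector CA through a
    predetermined angle about point C, within the image plane."""
    theta = math.radians(angle_deg)
    vx, vy = A[0] - C[0], A[1] - C[1]
    # Apply the 2-D rotation matrix to the vector CA.
    rx = vx * math.cos(theta) - vy * math.sin(theta)
    ry = vx * math.sin(theta) + vy * math.cos(theta)
    return (C[0] + rx, C[1] + ry)

# Rotating CA by 90 degrees about C = (0, 0) with A = (1, 0):
print(rotate_about((0, 0), (1, 0), 90))
```

Because this construction commutes with in-plane rotation and uniform scaling of the image, it yields consistent positions under those variations, but not under rotations in the depth direction.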
However, the positioning methods of local regions in the above related arts suffer from the following problems.
That is, in the case of the method disclosed in Japanese Patent Laid-Open No. 2005-346654, since pattern matching has to be executed for each of a large number of feature points, the calculation cost required to determine the positions of the local regions increases.
In addition, since the detection precision of feature points at positions which hardly form a distinctive pattern, such as a cheek region, becomes considerably low, the number of local regions that can be used in face recognition may often be extremely small depending on the image sensing conditions of input images.
In the case of the method disclosed in reference 1, the setting precision of a local region changes depending on the precision of the face-direction estimation executed as pre-processing. Also, face-direction estimation based on limited feature points is not always easy, and correspondingly high calculation cost is required.
Furthermore, by changing the range to be clipped for each local region, ranges that are constant on the actual face to some extent can be expected to be set irrespective of the face view; however, the processing load per local region becomes non-negligible if a large number of local regions are set.
Moreover, if there are face-view variations in the depth direction, then due to the three-dimensional structure of the face, the shape of an identical region is not merely scaled into a similar shape but is deformed, so fitting using a single parameter has limitations.
Furthermore, in the case of the method disclosed in Japanese Patent Laid-Open No. 2004-265267, the new feature points calculated from the detected feature points are limited to positions obtained by integer combinations of the vectors that connect the respective points. Hence, intermediate positions cannot be set.
As described above, if image variations are limited to rotations within the image plane (in-plane rotations), feature points at identical positions are likely to be set for an identical person. However, since an image input prior to personal authentication is, in general, normalized in advance, in-plane rotation variations are nearly corrected and rarely pose a problem. Variations due to rotations in the depth direction, however, do pose a problem.
For example, in the case of a face turned to the left in the depth direction, as shown in 21b of FIG. 21, the triangle defined by the three points deforms, and the newly calculated feature points are not set at the same positions as those before rotation, even for an identical person. The same problem applies to the other method (that of setting a position by rotating a line segment through a predetermined angle) disclosed in Japanese Patent Laid-Open No. 2004-265267.