1. Field of the Invention
The present invention relates to an image processing apparatus and image processing method, which execute recognition processing of an object in image data, and to a program stored in a computer-readable storage medium for making a computer execute the image processing method.
2. Description of the Related Art
A technique which detects the presence/absence of a face of a person from an image, and recognizes a facial expression of that person by detecting the features of the face of the person is known.
For example, a method which extracts a part corresponding to a predetermined region where a facial expression of a person is readily shown from an image, computes the wavelet transforms of the extracted part to calculate an average power for respective frequency bands, and detects the facial expression based on a difference from an average power obtained from an expressionless face is known (for example, see patent reference 1). Also, a method which detects variations of predetermined features required to recognize a facial expression based on differences between feature amounts of an expressionless face prepared in advance and a face to be recognized, and calculates scores for respective facial expressions from the variations of the predetermined features so as to recognize a facial expression is known (for example, see patent reference 2).
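The band-power comparison in the first method above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of patent reference 1: it uses a one-dimensional Haar wavelet on a row of pixel intensities whose length is a power of two, and it sums the absolute band-power differences. The function names `haar_band_powers` and `expression_distance` are hypothetical.

```python
def haar_band_powers(signal):
    """Average power of the Haar detail coefficients per level, finest band first.

    Assumption: len(signal) is a power of two; with other lengths the
    trailing unpaired sample at a level is dropped.
    """
    values = list(signal)
    powers = []
    while len(values) >= 2:
        pairs = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        approx = [(a + b) / 2.0 for a, b in pairs]   # low-pass half
        detail = [(a - b) / 2.0 for a, b in pairs]   # high-pass half (this band)
        powers.append(sum(d * d for d in detail) / len(detail))
        values = approx
    return powers


def expression_distance(region, neutral_region):
    """Sum of absolute band-power differences from the expressionless reference.

    The choice of an absolute-difference sum is an assumption for illustration.
    """
    region_powers = haar_band_powers(region)
    neutral_powers = haar_band_powers(neutral_region)
    return sum(abs(p - q) for p, q in zip(region_powers, neutral_powers))


# A flat (expressionless) row versus a row with strong fine-scale variation.
neutral = [1, 1, 1, 1]
observed = [2, 0, 2, 0]
distance = expression_distance(observed, neutral)
```

A larger distance indicates that the extracted part differs more from the expressionless face in at least one frequency band; how that distance maps to a specific facial expression is not covered by this sketch.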
However, with the aforementioned techniques, the features used in recognition processing often cannot be detected accurately because of the influences of shadows, accessories, and the like.
Hence, as techniques that can implement recognition processing even when these influences occur, the following techniques have been proposed.
Patent reference 3 below discloses the following method. That is, the entire input facial image is divided into blocks, distances between feature vectors obtained from these blocks and average feature vectors of the corresponding blocks obtained from a registered image group prepared in advance are calculated, and the feature vectors are weighted for the respective blocks. After that, verification scores are calculated based on the weighted feature vectors, thus executing verification processing. Furthermore, patent reference 4 below discloses a face part detection method which detects a face part with high precision by processing an image from which the reflected images of objects having shiny reflective surfaces, such as spectacles, have been removed.
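The block-wise weighting described for patent reference 3 can be sketched as follows. The source does not state the weighting function or the score formula, so both are assumptions here: a block's weight falls as its feature vector moves away from the registered average, and the score is the negated weighted mean distance to an enrolled person's block features. All names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def block_weights(block_features, average_features):
    """Assumed weighting: blocks far from the registered average get less weight."""
    return [1.0 / (1.0 + euclidean(f, m))
            for f, m in zip(block_features, average_features)]


def verification_score(block_features, enrolled_features, average_features):
    """Negated weighted mean distance to the enrolled features (higher is better).

    This score formula is an assumption chosen for illustration.
    """
    weights = block_weights(block_features, average_features)
    weighted = sum(w * euclidean(f, e)
                   for w, f, e in zip(weights, block_features, enrolled_features))
    return -weighted / sum(weights)


# Two blocks; the second deviates strongly from the average (e.g. it is shadowed).
inputs = [[0.0, 0.0], [3.0, 4.0]]
average = [[0.0, 0.0], [0.0, 0.0]]
enrolled = [[0.0, 0.0], [0.0, 0.0]]
weights = block_weights(inputs, average)
score = verification_score(inputs, enrolled, average)
```

With this assumed weighting, a block disturbed by a shadow or an accessory contributes less to the score than an undisturbed block.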
[Patent Reference 1] Japanese Patent No. 2840816
[Patent Reference 2] Japanese Patent Laid-Open No. 2005-56388
[Patent Reference 3] Japanese Patent Laid-Open No. 2003-323622
[Patent Reference 4] Japanese Patent Laid-Open No. 2002-352229
[Patent Reference 5] Japanese Patent Laid-Open No. 2000-30065
[Non-patent Reference 1] Edgar Osuna, Robert Freund, Federico Girosi “Training Support Vector Machines: an Application to Face Detection” Proceedings of CVPR '97, pp. 130-136, 1997
[Non-patent Reference 2] Yann LeCun and Yoshua Bengio “Convolutional Networks for Images, Speech, and Time Series” The Handbook of Brain Theory and Neural Networks, pp. 255-258, 1995
[Non-patent Reference 3] Watanabe, S. and Pakvasa, N. (1973). Subspace method of pattern recognition, Proceedings of 1st International Joint Conference of Pattern Recognition, pp. 25-32
A facial expression of a person can be expressed by a combination of motions of parts such as the eyebrows, eyes, mouth, cheeks, and the like. When recognizing a facial expression, setting only the regions where expressive motions readily appear and analyzing those set regions, instead of merely dividing the facial region into several regions and analyzing all of them, reduces the processing cost.
The respective regions have different importance levels depending on the facial expression (for example, in a smile, the region around the mouth tends to change relatively greatly, whereas the regions around the eyes change less). Hence, it is desirable to weight, in accordance with the facial expression to be recognized, either the respective parts which form a face and where expressive motions readily appear, such as the eyes and mouth, or the respective regions obtained by dividing the facial region into a plurality of regions.
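Expression-dependent weighting of regions can be sketched as follows. The weight values, the region names, and the `expression_score` function are hypothetical and chosen only to mirror the smile example above, in which the region around the mouth matters more than the regions around the eyes.

```python
# Hypothetical per-expression region weights (illustrative values only).
REGION_WEIGHTS = {
    "smile": {"mouth": 0.7, "eyes": 0.2, "eyebrows": 0.1},
    "surprise": {"mouth": 0.3, "eyes": 0.4, "eyebrows": 0.3},
}


def expression_score(expression, region_changes):
    """Weighted sum of per-region change amounts for one candidate expression.

    region_changes maps a region name to a non-negative change amount;
    a region missing from the mapping is treated as unchanged (0.0).
    """
    weights = REGION_WEIGHTS[expression]
    return sum(w * region_changes.get(region, 0.0)
               for region, w in weights.items())


# A face in which only the mouth region has changed.
changes = {"mouth": 1.0, "eyes": 0.0, "eyebrows": 0.0}
scores = {name: expression_score(name, changes) for name in REGION_WEIGHTS}
best = max(scores, key=scores.get)
```

Because the mouth carries the largest weight for a smile in this assumed table, a change confined to the mouth region scores higher for "smile" than for "surprise".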
When verifying a person, the face is divided into a plurality of regions, only the regions important for personal verification are set, and only the set regions need to undergo analysis. In this case, the features obtained from these set regions have different importance levels. For example, features extracted from regions that include no facial parts, such as the cheek regions, often have lower importance levels for verifying a person than those extracted from regions near the eyes and mouth.
As a method of setting regions required to recognize a facial expression as described above, for example, a method of extracting the positions of eyes and a mouth by some method, and setting regions using these positions is available.
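Setting regions from detected positions can be sketched as follows. The box proportions, which are scaled by the horizontal distance between the eyes, and the function name `set_regions` are assumptions made for illustration; the source does not specify how the regions are sized.

```python
def set_regions(left_eye, right_eye, mouth):
    """Return {name: (x0, y0, x1, y1)} boxes around the eyes and mouth.

    Each argument is an (x, y) position, or None if detection failed.
    Raises ValueError if any position is missing, because the boxes are
    derived from all three positions.
    """
    if left_eye is None or right_eye is None or mouth is None:
        raise ValueError("eye and mouth positions are required to set regions")

    eye_distance = abs(right_eye[0] - left_eye[0])
    half_width = eye_distance / 2.0      # assumed box half-width
    half_height = eye_distance / 4.0     # assumed box half-height

    def box(center):
        cx, cy = center
        return (cx - half_width, cy - half_height,
                cx + half_width, cy + half_height)

    return {
        "left_eye": box(left_eye),
        "right_eye": box(right_eye),
        "mouth": box(mouth),
    }


regions = set_regions((30, 40), (70, 40), (50, 80))
```

The sketch depends entirely on the detected positions: if any of them is unavailable, no region can be set.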
However, when the positions of the eyes and mouth cannot be detected due to occlusion by sunglasses, a mustache, shadows, and the like, the regions where expressive motions readily appear cannot be set, either. In this case, not all of the predetermined features can be detected, and a facial expression cannot be recognized. Even when some image correction is applied, the influences of sunglasses, a mustache, shadows, and the like cannot be completely removed.
Likewise, in the case of personal verification, the predetermined regions required for personal verification cannot be set due to the influences of sunglasses, a mustache, shadows, and the like, and the predetermined features cannot be extracted. Hence, a person often cannot be verified in such cases.