1. Field of the Invention
The present invention relates to an image processing apparatus and method, and a storage medium.
2. Description of the Related Art
A technique for detecting the eye open/closed state, which is one aspect of human facial expression, has been developed. For example, the technique disclosed in reference [1] binarizes an input image, extracts a black region corresponding to a pupil from the binarized image, and determines the eye open/closed state based on the number of vertically continuous pixels in the black region. In this technique, the maximum number of vertically continuous pixels in the black (iris) region is referred to across a plurality of images. A threshold value used to determine the presence or absence of a blink is then set based on the maximum and minimum values of the number of vertically continuous pixels.
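The processing described for reference [1] can be illustrated with a minimal sketch. The function names, the binarization cutoff of 128, and the choice of the midpoint between the maximum and minimum run lengths as the blink threshold are assumptions made for illustration only; reference [1] is described here only as setting the threshold "based on" those two values.

```python
def binarize(gray, cutoff=128):
    """Mark pixels darker than the (assumed) cutoff as black (1), others as white (0)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def max_vertical_run(binary):
    """Return the largest number of vertically continuous black pixels in any column."""
    best = 0
    if not binary:
        return best
    for col in range(len(binary[0])):
        run = 0
        for row in binary:
            if row[col]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def blink_threshold(run_lengths):
    """Hypothetical rule: midpoint between the largest and smallest observed run."""
    return (max(run_lengths) + min(run_lengths)) / 2


def is_eye_closed(run_length, threshold):
    """Treat a run shorter than the threshold as a closed eye."""
    return run_length < threshold


# Two tiny synthetic grayscale patches (hypothetical data, not from any reference).
open_eye = [
    [200, 200, 200],
    [200, 30, 200],
    [200, 30, 200],
    [200, 30, 200],
    [200, 30, 200],
]
closed_eye = [
    [200, 200, 200],
    [200, 200, 200],
    [200, 30, 30],
    [200, 200, 200],
    [200, 200, 200],
]

runs = [max_vertical_run(binarize(img)) for img in (open_eye, closed_eye)]
threshold = blink_threshold(runs)
states = [is_eye_closed(r, threshold) for r in runs]
```

Because the threshold is derived from the run lengths observed over several images, it depends on the range of values that those particular images happen to contain.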
The technique disclosed in reference [2] detects the edges of the upper and lower eyelids and determines the eye open/closed state based on the distance between those edges.
Techniques for detecting facial expressions of emotions such as joy and anger have also been developed. For example, the technique disclosed in reference [3] applies a two-dimensional Fourier transform to an input image and generates a predetermined feature vector. The probability of generating that feature vector is calculated from hidden Markov models of facial expressions prepared in advance. The facial expression corresponding to the hidden Markov model that yields the maximum probability is output as the recognition result.
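The recognition step of such an approach, selecting the model that gives the maximum probability, can be sketched as follows. This is not the implementation of reference [3]: it assumes discrete hidden Markov models, assumes that the Fourier-based feature vectors have already been quantized into integer symbols, and uses two hypothetical two-state models with made-up parameters.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of a symbol sequence under a discrete HMM (forward algorithm)."""
    n = len(start)
    # Initialization with the first observed symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Recursion over the remaining symbols.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Return the label of the model with the highest likelihood, plus all scores."""
    scores = {name: forward_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical models: (start probabilities, transition matrix, emission matrix).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "joy": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "anger": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

# Assumed quantized feature sequence: symbol 0 twice, then symbol 1 twice.
label, scores = recognize([0, 0, 1, 1], models)
```

For long sequences the forward probabilities become very small, so a practical version would typically work with scaled or logarithmic values.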
A technique has also been developed for adjusting the facial expression of an electronic secretary interacting with a user, and the degree of that expression, based on the interacting user, the amount of interaction, and the situation. For example, the technique disclosed in reference [4] either reads out a specific expression threshold value from an expression threshold value set stored in advance, based on the interacting user, the amount of interaction, and the situation, or sets an expression threshold value using a specific one of several transformations defined in advance. In this way, the facial expression style of the electronic secretary is set or changed.
However, the shapes and motions of facial parts such as the eyes and mouth vary widely between persons. For example, for a person whose upper and lower eyelids are relatively far apart, the amount of change in the distance between the upper and lower eyelids is large. For a person whose upper and lower eyelids are close together, the amount of change in that distance is small.
Reference [8], which objectively describes the actions of facial expressions, describes “joy”, one of the facial expressions, as (1) “raise cheeks”, (2) “pull up lip ends”, . . . . However, the amount of change of the cheeks or lip ends also varies greatly between persons.
For these reasons, if the same parameter (e.g., threshold value) is used for all persons in determining a facial expression, a specific person whose upper and lower eyelids are close together may always be erroneously determined to be in an eye closed state. Similarly, a person who moves parts such as the eyes and mouth only slightly may always be erroneously determined to be expressionless.
reference [1] Japanese Patent Laid-Open No. 06-032154
reference [2] Japanese Patent Laid-Open No. 2000-137792
reference [3] Japanese Patent No. 2962549
reference [4] Japanese Patent Laid-Open No. 07-104778
reference [5] Japanese Patent Laid-Open No. 2000-030065
reference [6] Japanese Patent Laid-Open No. 2003-323622
reference [7] Japanese Patent Laid-Open No. 2005-056388
reference [8] P. Ekman and W. V. Friesen, Facial Action Coding System (FACS): Manual, Palo Alto: Consulting Psychologists Press, 1978
reference [9] P. Viola and M. Jones, “Rapid object detection using a Boosted Cascade of Simple Features”, Proc. of IEEE Conf. CVPR, 1, pp. 511-518, 2001
reference [10] Yann LeCun and Yoshua Bengio, “Convolutional Networks for Images, Speech, and Time Series”, The Handbook of Brain Theory and Neural Networks, pp. 255-258, 1995
reference [11] Ishii, Ueda, Maeda, and Murase, “Easy-to-Understand Pattern Recognition”, Ohmsya, 1998