Techniques for detecting faces in images, including still images and moving images, are conventionally known (non-patent reference 1: Yusuke Mitarai, Katsuhiko Mori, and Masakazu Matsugu, “Robust Face Detection System Based on Convolutional Neural Networks Using Selective Activation of Modules”, FIT (Forum of Information Technology), L1-013, 2003). A technique of determining the facial expression of the detected face is also known (Japanese Patent Laid-Open No. 2005-056388).
In association with this technique, Japanese Patent Laid-Open No. 11-232456 refers to a technique of discriminating each facial expression from a moving image including a plurality of facial expressions. Japanese Patent Laid-Open No. 10-228295 also refers to a technique of determining a facial expression by weighting sounds in association with fear and sadness and by weighting images in association with joy and surprise. Japanese Patent Laid-Open No. 10-91808 refers to a technique of creating a facial expression by synthesizing facial expressions in accordance with the ratios between expressionlessness and other facial expressions. Japanese Patent Laid-Open No. 2005-199403 also refers to a technique of estimating the emotion of a person by weighting outputs from various types of sensors such as a camera and a microphone.
Japanese Patent Laid-Open No. 2004-46591 refers to a technique of calculating, for example, the degree of smile and the degree of decency, and displaying an evaluation on the degree of smile in preference to an evaluation on the degree of decency when an image is captured in a casual scene. In addition, Japanese Patent Laid-Open No. 2006-289508 refers to a technique of creating facial expressions upon providing facial expressions with high and low priorities.
Although various techniques of recognizing the facial expressions of persons have been proposed, unsolved problems remain. For example, even different facial expressions have parts with similar shapes, such as the eyes and the mouth, and hence such facial expressions cannot be properly discriminated, resulting in a recognition error. Such a recognition error occurs in discriminating between a state in which the cheek muscle of a person moves up when he/she smiles, as shown in FIG. 1A, and a state in which the eyes of the person are half closed when he/she blinks, as shown in FIG. 1C. In this case, both facial expressions have short distances between the upper and lower eyelids and similar eye shapes. This makes it difficult to discriminate between the facial expressions.
For this reason, if, for example, an image sensing apparatus such as a digital camera is equipped with a technique of detecting the facial expression of a face (e.g., the shapes of the eyes) to determine the images shown in FIG. 1B and FIG. 1C as eye closed images, even the image shown in FIG. 1A may be mistaken for a failed image, with the result that the image is not captured. Assume instead that a person is determined to be smiling when he/she fully opens his/her eyes. In this case, even the image shown in FIG. 2 may be captured.
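The eye closed determination described above can be illustrated with a minimal sketch, assuming a single threshold on the distance between the upper and lower eyelids. The function name `is_eye_closed`, the threshold, and the pixel values are hypothetical and chosen only for illustration; they are not taken from any of the cited references.

```python
# Minimal sketch: a single eyelid-distance threshold (hypothetical values).
def is_eye_closed(eyelid_gap_px, threshold_px=4.0):
    """Treat the eye as closed when the eyelid gap is below the threshold."""
    return eyelid_gap_px < threshold_px

# Assumed eyelid gaps (in pixels) for the three states of FIGS. 1A to 1C.
frames = {
    "FIG. 1A (smile)": 3.0,        # cheek muscle moves up, eyes narrow
    "FIG. 1B (closed)": 0.0,       # eyes fully closed
    "FIG. 1C (half closed)": 2.5,  # mid-blink
}

# Every frame whose gap falls below the threshold is rejected as a failed image.
rejected = [name for name, gap in frames.items() if is_eye_closed(gap)]
print(rejected)
```

Under these assumed values, the smiling face of FIG. 1A is rejected together with FIG. 1B and FIG. 1C, because the eye shape alone does not separate the three states.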
It is very difficult to determine whether the image shown in FIG. 2 shows a crying face or an embarrassed face, because there are individual differences. Under the circumstances, assume that a facial expression is determined based on the maximum value of facial expression evaluation values output from a plurality of discriminators. In this case, since evaluation values on crying facial expressions are often slightly higher than those on embarrassed facial expressions, even an actually embarrassed facial expression may be determined as a crying facial expression. In the case of infants, the probability of crying facial expressions is overwhelmingly high. Nevertheless, a crying facial expression may be erroneously determined as an embarrassed facial expression.
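The maximum-value determination described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `determine_expression` and the evaluation values are hypothetical, and the discriminators themselves are not modeled.

```python
# Minimal sketch: pick the facial expression with the largest evaluation value.
def determine_expression(evaluation_values):
    """Return the label whose evaluation value is the maximum."""
    return max(evaluation_values, key=evaluation_values.get)

# Hypothetical discriminator outputs for a face that is actually embarrassed.
# The crying value is only slightly higher than the embarrassed value.
evaluation_values = {"smile": 0.10, "crying": 0.46, "embarrassed": 0.44}

result = determine_expression(evaluation_values)
print(result)
```

With these assumed values, the slightly higher crying value decides the result, so the embarrassed face is labeled as crying. The rule takes no account of how small the margin is or of how likely each expression is for the person concerned.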
Furthermore, in an arrangement designed to uniformly calculate facial expression evaluation values for each facial expression in facial expression determination, evaluation values for a plurality of facial expressions are calculated even for a face whose eyes are obviously closed, as shown in FIG. 1B, resulting in wasteful processing.
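The wasteful processing can be made concrete with a minimal sketch, assuming three discriminators that are each invoked for every face. The names `make_discriminator` and `evaluate_uniformly`, the face representation, and the placeholder scoring formula are all hypothetical.

```python
# Minimal sketch: every discriminator runs for every face (hypothetical setup).
calls = []  # records which discriminators were actually executed

def make_discriminator(name, weight):
    """Build a placeholder discriminator; the scoring formula is an assumption."""
    def discriminator(face):
        calls.append(name)
        return weight * face["eyelid_gap"]
    return discriminator

discriminators = {
    name: make_discriminator(name, weight)
    for name, weight in [("smile", 0.2), ("crying", 0.1), ("embarrassed", 0.1)]
}

def evaluate_uniformly(face, discriminators):
    """Calculate an evaluation value for every facial expression, unconditionally."""
    return {name: fn(face) for name, fn in discriminators.items()}

# A face whose eyes are obviously closed, as in FIG. 1B.
closed_face = {"eyelid_gap": 0.0}
values = evaluate_uniformly(closed_face, discriminators)
print(len(calls))
```

All three discriminators are executed for the closed-eye face even though none of their outputs is needed to reject the image.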