1. Field of the Invention
The present invention relates to a face image processing apparatus, a face image processing method, and a computer program for recognizing a face image included in a photographic image such as a still picture or a frame of moving images. In particular, the invention relates to a face image processing apparatus, a face image processing method, and a computer program in which feature points or features of an image of interest are checked against a registered image to identify a person.
More particularly, the invention relates to a face image processing apparatus, a face image processing method, and a computer program in which feature points or features used for personal authentication are selected through statistical learning to allow a personal authentication process to be performed using the selected features in a synthetic manner. Specifically, the invention relates to a face image processing apparatus, a face image processing method, and a computer program in which feature points on a registered image and an image to be checked are accurately associated with each other to achieve high recognition performance even when the pose of the image to be checked changes.
2. Description of the Related Art
Face recognition techniques can be widely used in man-machine interface applications for purposes such as sex identification, a major application of this kind being personal authentication systems that place no burden on users. Recently, face recognition has been used for automating operations of a digital camera based on detection or recognition of an object, including automatic focusing (AF), automatic exposure (AE), automatic field angle setting, and automatic shooting.
For example, a face recognition system involves a face detection process for detecting the position and size of a face image included in an input image, a face parts detection process for detecting the positions of principal parts of the face from the detected face image, and a face identifying process for identifying the face image (or identifying the person) by checking an image to be checked obtained by correcting the position and rotation of the face image based on the positions of the face parts against a registered image.
Face recognition systems are known, in which feature points or features to be used for identifying a person are selected through statistical learning and in which a personal identification process is performed using the selected features in a synthetic manner (for example, see WO2003/019475 (Patent Document 1)). Features of a face image may be extracted using a plurality of Gabor filters having directional selectivity and different frequency components.
It has already been revealed that some visual cells of human beings exhibit selectivity to a certain direction, and a Gabor filter is a spatial filter formed by a plurality of filters which similarly have directional selectivity. A Gabor filter is spatially represented using a Gaussian function as a window and a Gabor function whose base is a sine function or a cosine function as frequency response. For example, the size of the filter window is fixed at 24×24 pixels. Forty types of Gabor filters are formed when there are five different frequencies f and eight angles θ.
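As an illustration only, the filter bank described above can be sketched as follows. The sketch assumes a Gaussian window multiplied by a cosine or sine carrier. The function names gabor_kernel and gabor_bank, the frequency spacing, and the rule tying the window width sigma to the frequency are hypothetical choices that the cited documents do not specify.

```python
import math

def gabor_kernel(size, freq, theta, sigma):
    """Return the (cosine, sine) parts of a size x size Gabor kernel."""
    half = (size - 1) / 2.0
    cos_part, sin_part = [], []
    for row in range(size):
        y = row - half
        cos_row, sin_row = [], []
        for col in range(size):
            x = col - half
            # Position measured along the carrier direction theta.
            along = x * math.cos(theta) + y * math.sin(theta)
            # Gaussian window multiplied by a sinusoidal carrier.
            window = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            phase = 2.0 * math.pi * freq * along
            cos_row.append(window * math.cos(phase))
            sin_row.append(window * math.sin(phase))
        cos_part.append(cos_row)
        sin_part.append(sin_row)
    return cos_part, sin_part

def gabor_bank(size=24, n_freqs=5, n_angles=8):
    """Build n_freqs * n_angles kernels (40 with the defaults)."""
    bank = []
    for i in range(n_freqs):
        freq = 0.25 / (math.sqrt(2.0) ** i)  # assumed: cycles per pixel
        sigma = 0.5 / freq                   # assumed: width tied to frequency
        for j in range(n_angles):
            theta = math.pi * j / n_angles   # angles spread over half a turn
            bank.append(gabor_kernel(size, freq, theta, sigma))
    return bank
```

With the default arguments, the bank holds five frequencies times eight angles, that is, forty kernels of 24×24 coefficients each.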
Gabor filter calculations are performed using up to forty types of Gabor filters provided by switching the frequencies f and the angles θ. The set of up to forty scalar values thus obtained is referred to as a “Gabor jet”. A Gabor jet is obtained as a local feature at each of the feature extracting positions detected at predetermined intervals in the horizontal and vertical directions of face image data. Gabor jets are characterized in that they are robust against a certain degree of displacement or deformation of feature extracting positions.
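A minimal sketch of how a Gabor jet might be computed at grid positions is given below. It assumes that each filter is supplied as a pair of cosine and sine kernels and that the scalar value kept for each filter is the magnitude of the combined response. The names filter_response, gabor_jet, and jets_on_grid, as well as the border clamping, are hypothetical and are not taken from the cited documents.

```python
import math

def filter_response(image, kernel_cos, kernel_sin, cx, cy):
    """Magnitude of the filter response for a window placed at (cx, cy)."""
    size = len(kernel_cos)
    half = size // 2
    height, width = len(image), len(image[0])
    acc_cos = acc_sin = 0.0
    for ky in range(size):
        for kx in range(size):
            # Clamp coordinates so windows near the border stay inside the image.
            y = min(max(cy + ky - half, 0), height - 1)
            x = min(max(cx + kx - half, 0), width - 1)
            acc_cos += image[y][x] * kernel_cos[ky][kx]
            acc_sin += image[y][x] * kernel_sin[ky][kx]
    return math.hypot(acc_cos, acc_sin)

def gabor_jet(image, bank, cx, cy):
    """One scalar per filter in the bank: the jet at position (cx, cy)."""
    return [filter_response(image, kc, ks, cx, cy) for kc, ks in bank]

def jets_on_grid(image, bank, step):
    """Jets at positions spaced `step` pixels apart in both directions."""
    height, width = len(image), len(image[0])
    return {(x, y): gabor_jet(image, bank, x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)}
```

Here image is a list of rows of gray levels and bank is a list of kernel pairs, so a bank of forty filters yields jets of forty values.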
For a registered face image, a Gabor jet is calculated in advance at each feature extracting position of the image. Degrees of similarity between Gabor jets of an input face and Gabor jets of a registered face at the same feature extracting positions are calculated to obtain similarity vectors which are sets of degrees of similarity at a plurality of feature extracting positions. Then, the vectors are classified by a support vector machine (SVM) to determine whether the image to be checked matches the registered image. In the related industry, support vector machines are considered to have the highest generalization capability in the field of pattern recognition.
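The construction of a similarity vector can be sketched as follows. The normalized inner product used as the degree of similarity is an assumption, since the text above does not name a specific measure, and the function names jet_similarity and similarity_vector are hypothetical. The classification step by the support vector machine is not shown.

```python
import math

def jet_similarity(jet_a, jet_b):
    """Normalized inner product of two jets (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(jet_a, jet_b))
    norm_a = math.sqrt(sum(a * a for a in jet_a))
    norm_b = math.sqrt(sum(b * b for b in jet_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero jet carries no information
    return dot / (norm_a * norm_b)

def similarity_vector(input_jets, registered_jets):
    """Similarities at the same positions, in a fixed position order.

    Both arguments map a feature extracting position (x, y) to a jet.
    """
    return [jet_similarity(input_jets[pos], registered_jets[pos])
            for pos in sorted(registered_jets)]
```

The returned list has one entry per feature extracting position and would serve as the input vector to the classifier.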
A face recognition system which solves the problem of properly selecting feature points or features used for personal identification using statistical learning as described above is advantageous in that a large number of feature points or features useful for identification are automatically selected. Further, a Gabor filter is robust against a certain degree of displacement or deformation of feature extracting points. Therefore, changes in the pose of an image to be checked included in an input image can be properly treated by preparing learning samples including some pose changes such that robust features will be selected.
However, when there is a significant change in the pose of an image to be checked, displacement of feature points may become too great to be absorbed by the robustness of a Gabor filter. When a face is identified (a person is checked) from an image, it is quite important to properly associate points on a registered image with points on the image to be checked in order to achieve high recognition performance.
For the purpose of properly associating feature points on an image to be checked having pose changes with a registered image, a method employing a graph structure referred to as “elastic graph” for expanding and contracting the shape of a face has been proposed (for example, see Laurenz Wiskott, Jean-Marc Fellous, Norbert Kruger, and Christoph von der Malsburg, “Face Recognition by Elastic Bunch Graph Matching” (In Intelligent Biometric Techniques in Fingerprint and Face Recognition, CRC Press, ISBN0-8493-2055-0, Chapter 11, pp. 355-396, 1999) (Non-Patent Document 1)). According to this method, feature points are provided at nodes of a graph, and features associated with the nodes are stored in advance. The entire graph can be moved to find a position where the highest degree of matching of features takes place, and the positions of the nodes can be locally shifted to adjust the shape of the graph. Constraints can be imposed on extending and contracting amounts of branches of the graph, and it is therefore possible to absorb a difference attributable to a change in the pose of a face of interest or a personal difference without significantly departing from the shape of the face. When a person is checked, it is determined whether an image to be checked represents the same person appearing on a registered image using degrees of similarity of features at nodes of the images and displacements of the nodes from the initial positions.
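The trade-off between feature similarity at the nodes and deformation of the branches can be illustrated with a simple score. This is a sketch under assumptions and not the formulation of Non-Patent Document 1: the penalty on the change of branch length, the stiffness weight, and the names edge_length and graph_score are all hypothetical.

```python
import math

def edge_length(positions, i, j):
    """Euclidean length of the branch between nodes i and j."""
    (xi, yi), (xj, yj) = positions[i], positions[j]
    return math.hypot(xj - xi, yj - yi)

def graph_score(node_similarities, initial_pos, moved_pos, edges, stiffness):
    """Mean node similarity minus a penalty on branch length changes."""
    feature_term = sum(node_similarities) / len(node_similarities)
    deformation = 0.0
    for i, j in edges:
        change = edge_length(moved_pos, i, j) - edge_length(initial_pos, i, j)
        deformation += change * change
    # A larger stiffness keeps the graph closer to its initial shape.
    return feature_term - stiffness * deformation / len(edges)
```

A matching procedure would shift node positions so as to raise this score, with the stiffness weight limiting how far the graph may depart from the shape of the face.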
One method of estimating the positions of parts of a face is the use of an AAM (Active Appearance Model). According to this method, a multiplicity of manually labeled part points (feature points) of various persons and poses are prepared in advance, and a principal component analysis is carried out on data that is a combination of the positions of the parts and images around them to learn variations of the positions of the parts and the patterns. When the position of a part is estimated from an input image, an initial position of the part is given and mapped along with a learned image around the same into a partial space. The smaller the distance to the partial space, the higher the degree of match with a learned variation. Thus, a part position having a higher degree of match is calculated by changing the parameters of the mapped space minutely, whereby the corresponding part position can be identified. This technique can be regarded as a statistical model in that statistical constraints are imposed, whereas an elastic graph as described above is a two-dimensional geometrical model. Identification of a person can be normally performed by directly comparing parameters on such a partial space. The parameters include position and pattern variations.
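The role of the distance to the partial space (subspace) can be sketched as follows. The sketch assumes that the principal component analysis has already produced a mean vector and an orthonormal basis, and that each candidate part position is represented by a vector combining the position and the surrounding image pattern. The names project, distance_to_subspace, and best_candidate are hypothetical, and the learning step and the minute parameter search are not shown.

```python
import math

def project(sample, mean, basis):
    """Coefficients of the mean-centered sample on each basis axis."""
    centered = [s - m for s, m in zip(sample, mean)]
    return [sum(c * b for c, b in zip(centered, axis)) for axis in basis]

def distance_to_subspace(sample, mean, basis):
    """Length of the part of the sample that the basis cannot reproduce."""
    centered = [s - m for s, m in zip(sample, mean)]
    coeffs = project(sample, mean, basis)
    rebuilt = [sum(k * axis[d] for k, axis in zip(coeffs, basis))
               for d in range(len(sample))]
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, rebuilt)))

def best_candidate(candidates, mean, basis):
    """Index of the candidate vector closest to the learned subspace."""
    return min(range(len(candidates)),
               key=lambda i: distance_to_subspace(candidates[i], mean, basis))
```

The coefficients returned by project correspond to the parameters on the partial space that would be compared when identifying a person.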
Elastic graphs and AAMs are essentially similar approaches, except that different constraints are employed. However, those methods include no explicit step of deciding a node position at which a feature is to be checked. The smaller the personal variation of feature points, the more easily correspondence between feature points in different images can be identified. However, this conflicts with the fact that, in actually checking differences between persons, a feature allows easier determination when it varies more significantly from person to person.
When correspondence between feature points is considered from the viewpoint of personal identification, in the case of identification of one person, it is desirable that a successful match of a relationship between particular points of the face occurs independently of differences in the shooting situation such as differences in the pose of the face. However, a difference between corresponding points does not matter in the case of identification between different persons. Since the position of the same feature point can vary from person to person, a difference between corresponding points is rather preferable, and such a difference results in a pattern difference which is assumed to make personal identification easier.
According to the above-described methods employing an elastic graph and an AAM, the pose of a face of interest and personal differences are estimated at the same time. When it is required only to check corresponding points in images of one and the same person, only the pose of the person is to be considered. When corresponding points are to be checked by estimating the pose of the face of interest only, a three-dimensional model of the shape of the face may be used.
An example of face recognition using a three-dimensional model is the method utilizing CG (computer graphics) techniques, proposed by Blanz et al. (see Volker Blanz and Thomas Vetter, “Face Recognition Based on Fitting a 3D Morphable Model”, (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, 2003) (Non-Patent Document 2), for example). According to the method, a principal component analysis (PCA) is carried out to provide a three-dimensional statistical model using three-dimensional shape data [x, y, z] of a great number of faces and textures (R, G, B) associated with the shape data obtained in advance under homogeneous illumination. An image that is close to a finally input face is synthesized by varying parameters of the three-dimensional model, pose parameters, and illumination parameters (CG techniques are used for the synthesis). The face identification itself is carried out using only the parameters of the three-dimensional model, and the identification is therefore carried out while eliminating the influence of the face pose and illumination.
There are also proposals on methods of synthesizing various faces by pasting registered frontal face images on a three-dimensional face model and applying to the model, in advance, various variations in illumination and pose which can be assumed to occur (see Akira Inoue, Shizuo Sakamoto, and Atsushi Sato, “Face Matching Using Partial Area Matching and Perturbative Space Method” (Proceedings of JEICE General Conference 2003) (Non-Patent Document 3), for example). A principal component analysis (PCA) is performed on all of the images to obtain partial spaces (perturbative partial spaces) that the images can occupy. The distance between an input image and the partial space of each person thus obtained is found to identify the person.
All of the above-described methods of face recognition utilizing a three-dimensional model involve synthesis of a face image, which results in a considerable processing load and computational cost.