Various methods are known for recognizing elements of an image, such as the kind of subject it contains, the state of the subject (a pose, a facial expression, or the like), and the environment under which the image was photographed (illumination conditions, the type of camera, or the like). One such method estimates the similarity between a recognition target image and a reference image to be compared against it, and thereby judges whether the two images are the same.
However, suppose that elements other than the element to be recognized differ between the recognition target image and the reference image, and that the image variations caused by those other elements exceed the variations caused by the element to be recognized. In that case it is difficult to recognize the target image as matching the reference image, even when the element to be recognized is actually the same in both. As a result, recognition performance is degraded.
For example, in a face collation system that identifies persons from input facial images, the element to be recognized is the individuality of a face, while other elements that vary the image include the pose of the face, the illumination conditions, and the like. If the variations in image luminance values caused by changes in pose or illumination exceed those caused by individual differences, miscollation occurs: image data of the same person are judged to be image data of different persons.
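The failure mode described above can be reproduced with a toy example. The sketch below assumes four-pixel "images", models an illumination change as a uniform brightness offset (a deliberate simplification), and collates by raw Euclidean pixel distance; all names and values are hypothetical. Because the offset of 5 per pixel exceeds the 1 to 2 per-pixel difference between the two faces, a probe of person A lands closer to a brightly lit gallery image of person B.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relight(image, offset):
    """Hypothetical illumination change: add a uniform brightness offset."""
    return [v + offset for v in image]

def collate(probe, gallery):
    """Return the gallery ID whose raw pixels are nearest to the probe."""
    return min(gallery, key=lambda gid: l2(probe, gallery[gid]))

face_a = [10, 20, 30, 40]  # person A, normal lighting
face_b = [12, 21, 31, 41]  # person B, differs from A by 1 to 2 per pixel

gallery = {"A": face_a, "B": relight(face_b, 5)}  # B registered under bright light
probe = relight(face_a, 5)                        # A photographed under bright light

print(collate(probe, gallery))  # prints B: illumination outweighs identity
```

Here the distance to the correct gallery image is 10, while the distance to the wrong one is only the square root of 7, so raw-pixel collation picks the wrong person.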
To address these problems, various methods have been proposed that use an image variation model. Such a model is produced by modeling, from a prepared set of learning data or the like, the relationship between the variable elements that may cause image variations and the image variations they produce, and it can generate an image under conditions specified by giving those elements as parameters. Specifically, a model fitting process estimates the parameter values for which the image variation model generates the image closest to a given input image. Recognition is then performed based on the value of the parameter to be recognized among the estimated parameters.
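A minimal sketch of the model fitting step, assuming a hypothetical linear image variation model (a mean image plus linear bases for target and external parameters) in place of a learned model such as the MM. For a linear model, finding the parameters whose generated image is closest to the input in the squared-error sense reduces to ordinary least squares; a nonlinear model such as the MM, which renders images with computer graphics, would instead require iterative optimization. Names such as `generate`, `fit`, `B_T`, and `B_E` are illustrative only.

```python
import numpy as np

# Hypothetical linear image variation model (an assumption for illustration):
#   I(p_t, p_e) = MEAN + B_T @ p_t + B_E @ p_e
# p_t: target parameters (e.g. individuality)
# p_e: external parameters (e.g. pose, illumination)
rng = np.random.default_rng(0)
N_PIXELS, N_TARGET, N_EXTERNAL = 64, 3, 2
MEAN = rng.normal(size=N_PIXELS)
B_T = rng.normal(size=(N_PIXELS, N_TARGET))
B_E = rng.normal(size=(N_PIXELS, N_EXTERNAL))

def generate(p_t, p_e):
    """Generate a model image from target and external parameters."""
    return MEAN + B_T @ p_t + B_E @ p_e

def fit(image):
    """Model fitting: return (p_t, p_e) whose generated image is closest
    to `image` in the least-squares sense."""
    basis = np.hstack([B_T, B_E])
    coeffs, *_ = np.linalg.lstsq(basis, image - MEAN, rcond=None)
    return coeffs[:N_TARGET], coeffs[N_TARGET:]

# Usage: fit a slightly noisy image rendered from known parameters.
p_t_true = np.array([1.0, -0.5, 2.0])
p_e_true = np.array([0.3, -1.2])
noisy = generate(p_t_true, p_e_true) + 0.01 * rng.normal(size=N_PIXELS)
p_t_hat, p_e_hat = fit(noisy)
```

Recognition would then use only `p_t_hat`, the estimated value of the parameter to be recognized.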
An example of this technique is disclosed in Prior Art Reference 2, listed below. From previously collected three-dimensional facial models of many persons, this technique produces a model that generates a three-dimensional facial model from elements for individual differences, i.e., parameters representing the individuality of the three-dimensional shape and the texture of a face.
Furthermore, given parameters representing the pose of the face and the illumination conditions, a facial image under those pose and illumination conditions is generated from the three-dimensional facial model data by computer graphics. This model is an example of an image variation model that can generate a facial image under conditions in which individuality, facial pose, and illumination are given as parameters; it is referred to as a facial morphable model (hereinafter abbreviated as MM).
Prior Art Reference 2 discloses a technique that fits the MM to each of a reference image (hereinafter referred to as a gallery image) and a recognition target image (hereinafter referred to as a probe image) to estimate the parameters representing individuality, pose, and illumination, and then collates the two images based on the estimated parameters. Specifically, the following two techniques are disclosed.
(1) Among the estimated parameters, only the target parameters are compared, and their similarity is used for collation.
(2) The target parameter is set to the value estimated from the recognition target image, and the other parameters are set to the values estimated from the registered image. The image generated from the MM with these parameter values is used as a comparison image, and the similarity between the comparison image and the registered image is used for collation.
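The two collation techniques can be contrasted in a short sketch, again assuming a hypothetical linear image variation model with least-squares fitting in place of the MM. The similarity measures (cosine similarity of target parameters for technique (1), negative sum of squared pixel differences for technique (2)) are assumptions for illustration and are not necessarily those used in Prior Art Reference 2.

```python
import numpy as np

# Hypothetical linear stand-in for the MM:
#   image = MEAN + B_T @ p_t + B_E @ p_e
# p_t: target parameters (individuality); p_e: external parameters (pose, illumination).
rng = np.random.default_rng(1)
N_PIXELS, N_T, N_E = 100, 4, 3
MEAN = rng.normal(size=N_PIXELS)
B_T = rng.normal(size=(N_PIXELS, N_T))
B_E = rng.normal(size=(N_PIXELS, N_E))

def generate(p_t, p_e):
    return MEAN + B_T @ p_t + B_E @ p_e

def fit(image):
    """Least-squares model fitting; returns (p_t, p_e)."""
    coeffs, *_ = np.linalg.lstsq(np.hstack([B_T, B_E]), image - MEAN, rcond=None)
    return coeffs[:N_T], coeffs[N_T:]

def similarity_1(gallery, probe):
    """Technique (1): compare only the target parameters
    (cosine similarity is an assumed choice)."""
    p_tg, _ = fit(gallery)
    p_tp, _ = fit(probe)
    return float(p_tg @ p_tp / (np.linalg.norm(p_tg) * np.linalg.norm(p_tp)))

def similarity_2(gallery, probe):
    """Technique (2): generate a comparison image from the probe's target
    parameters and the gallery's external parameters, then compare it with
    the gallery image (negative sum of squared differences, assumed)."""
    _, p_eg = fit(gallery)
    p_tp, _ = fit(probe)
    comparison = generate(p_tp, p_eg)
    return -float(np.sum((comparison - gallery) ** 2))
```

With a gallery and a probe of the same person under very different external parameters, both functions return their maximal value (1 for technique (1), 0 for technique (2)), because the external parameters are either ignored or taken from the gallery side.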
Technique (1) is configured as shown in FIG. 1. In FIG. 1, a model fitting means 1001 determines the parameters of an image variation model 1003 so that the generated image is most similar to a gallery image Ig. Among the parameters thus determined, the parameter corresponding to the object of collation is defined as a target parameter ptg, and the other parameters are defined as an external parameter peg. A model image I′g is the image generated with these parameters. For example, when individual faces are to be collated, the target parameter ptg includes a parameter for individuality, and the external parameter peg includes parameters representing elements such as pose and illumination. Similarly, a model fitting means 1002 determines a target parameter ptp and an external parameter pep of the image variation model 1003 so as to obtain a model image I′p that is most similar to a probe image Ip. A parameter collation means 1004 then computes the similarity between the target parameters ptg and ptp, and this result is defined as the similarity between the gallery image Ig and the probe image Ip. If there are a plurality of gallery images Ig, the same computation is performed for each of them, and the gallery image Ig with the highest similarity is taken as the collation result.
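The FIG. 1 pipeline, including the selection of the best-matching gallery image, might be sketched as follows. The linear model, the cosine similarity standing in for the parameter collation means 1004, and the gallery identifiers are all hypothetical; the functions only mirror the roles of the numbered means and are not the implementation of Prior Art Reference 2.

```python
import numpy as np

# Hypothetical linear stand-in for the image variation model 1003:
#   I = MEAN + B_T @ pt + B_E @ pe
rng = np.random.default_rng(2)
N_PIXELS, N_T, N_E = 100, 4, 3
MEAN = rng.normal(size=N_PIXELS)
B_T = rng.normal(size=(N_PIXELS, N_T))
B_E = rng.normal(size=(N_PIXELS, N_E))

def generate(pt, pe):
    return MEAN + B_T @ pt + B_E @ pe

def model_fitting(image):
    """Role of model fitting means 1001/1002: return (pt, pe, model image)."""
    coeffs, *_ = np.linalg.lstsq(np.hstack([B_T, B_E]), image - MEAN, rcond=None)
    pt, pe = coeffs[:N_T], coeffs[N_T:]
    return pt, pe, generate(pt, pe)

def parameter_collation(ptg, ptp):
    """Role of parameter collation means 1004 (cosine similarity assumed)."""
    return float(ptg @ ptp / (np.linalg.norm(ptg) * np.linalg.norm(ptp)))

def collate(probe, gallery):
    """Fit the probe once, fit each gallery image Ig, compare only the
    target parameters, and return the best gallery ID with all scores."""
    ptp, _pep, _model_probe = model_fitting(probe)
    scores = {}
    for gid, ig in gallery.items():
        ptg, _peg, _model_gallery = model_fitting(ig)
        scores[gid] = parameter_collation(ptg, ptp)
    return max(scores, key=scores.get), scores

# Usage with hypothetical identities registered under identical conditions.
gallery = {
    "alice": generate(np.array([1.0, 0.0, 0.5, -1.0]), np.zeros(N_E)),
    "bob": generate(np.array([-0.5, 1.0, 0.0, 0.8]), np.zeros(N_E)),
}
# Probe: alice under very different (hypothetical) pose/illumination parameters.
probe = generate(np.array([1.0, 0.0, 0.5, -1.0]), np.array([3.0, -2.0, 4.0]))
best, scores = collate(probe, gallery)
print(best)  # prints alice
```

Because the external parameters pep and peg are discarded before comparison, the large pose and illumination change in the probe does not affect the collation result in this idealized, noise-free setting.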
In technique (2), a comparison image is generated from the image variation model 1003 using the value of the target parameter ptp computed from the probe image as the target parameter and the value of the external parameter peg estimated from the gallery image as the other parameters. The similarity between this comparison image and the original gallery image Ig is computed and defined as the similarity between the gallery image Ig and the probe image Ip. If there are a plurality of gallery images Ig, the same computation is performed for each of them, and the gallery image Ig with the highest similarity is taken as the collation result.
Prior Art Reference 1: Japanese Laid-Open Patent Publication No. 2002-157595
Prior Art Reference 2: Volker Blanz, Thomas Vetter, “Face Recognition Based on Fitting a 3D Morphable Model,” IEEE Trans. PAMI, vol. 25, no. 9, pp. 1063-1074, 2003
Prior Art Reference 3: Sami Romdhani, Volker Blanz, Thomas Vetter, “Face Identification by Fitting a 3D Morphable Model using Linear Shape and Texture Error Functions,” Proc. ECCV 2002, pp. 3-19, 2002
Prior Art Reference 4: Andreas Lanitis, Chris J. Taylor, Timothy F. Cootes, “Automatic Interpretation and Coding of Face Images Using Flexible Models,” IEEE Trans. PAMI, vol. 19, no. 7, July 1997