Various techniques have been proposed for estimating an orientation (to be referred to hereafter as an attitude) of an individual (a human face, for example) and recognizing the individual from an image of the individual.
The following procedure is typically used to estimate the attitude of an individual such as a face, or to recognize the individual, using feature points.
(1) A database of three-dimensional shape models (to be referred to hereafter as 3D shape models) of the objects serving as recognition subjects is constructed. In many cases, the three-dimensional shape is measured using a three-dimensional measurement apparatus (a 3D scanner), and the database is constructed from the 3D shape models obtained as a result.
(2) A specific site of the object to be employed as a feature point during recognition is determined as a feature point on the 3D shape model.
(3) A feature point extractor for extracting the respective feature points from an image is constructed. In other words, the feature point extractor is caused to perform learning. Using feature points and non-feature points that are known in advance, the feature point extractor generates internal data for determining, when a certain pattern is input, whether or not the pattern includes a feature point. Generation of this type of internal data is known as learning.
(4) The feature point extractor is used to extract feature points from an image on which a recognition task such as attitude estimation or individual recognition is to be performed. The recognition task is then performed using the correspondence relationship between the positions of the feature points extracted from the image and the positions of the feature points on the 3D shape model.
Non-Patent Document 2, for example, describes a method of generating a feature point on the basis of entropy as a technique that can be used in procedure (2). A process of determining a feature point from a state in which a feature point is not determined will be referred to as feature point generation.
Non-Patent Document 1, for example, describes a technique of using the SIFT (Scale-Invariant Feature Transform) algorithm to extract the feature points required to determine corresponding points between images, as a technique for constructing a feature point extractor that can be used in procedure (3). The SIFT algorithm makes it possible to detect blobs through multi-resolution analysis and to associate images with each other using histograms of grayscale gradients.
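The kind of histogram used by SIFT to describe the neighborhood of a point can be illustrated as follows. This is a minimal sketch in plain Python, assuming a small grayscale patch given as a list of rows; the function name `orientation_histogram` and the choice of 8 bins are illustrative assumptions. An actual SIFT implementation additionally applies Gaussian weighting, scale selection and a grid of sub-histograms, none of which is reproduced here.

```python
import math

def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations of a 2D patch.

    Simplified illustration only: no Gaussian weighting, no scale space.
    """
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the grayscale gradient.
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            idx = int(angle / (2.0 * math.pi) * bins) % bins
            hist[idx] += mag
    total = sum(hist)
    # Normalize so that patches of different contrast can be compared.
    return [h / total for h in hist] if total > 0 else hist

# A patch whose brightness increases from left to right:
# every gradient points along +x, so all weight falls in bin 0.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
hist = orientation_histogram(ramp)
```

Two points whose histograms are close can then be treated as candidates for corresponding points between images.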
As another technique for constructing a feature point extractor, machine learning may be performed in advance on the image patterns to be extracted as feature points, whereupon input patterns are classified using the learning result. Non-Patent Document 3 describes the use of GLVQ (Generalized Learning Vector Quantization) for the learning and the classification. In Non-Patent Document 3, pattern detection is performed on a face, but a feature point can be detected by switching the pattern from the face to the part on the periphery of the feature point. An SVM (Support Vector Machine) is also known as a machine learning means.
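The learning step can be sketched with the commonly published GLVQ update rule, in which the nearest prototype of the correct class is pulled toward a training pattern and the nearest prototype of a wrong class is pushed away. The sketch below is an illustration under stated assumptions, not the configuration of Non-Patent Document 3: the patterns are toy two-dimensional vectors instead of image patches, there is one prototype per class, and the names `train_glvq` and `classify`, the learning rate and the epoch count are all hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(prototypes, x, label, same):
    """Index and squared distance of the nearest prototype whose label
    matches `label` (same=True) or differs from it (same=False)."""
    best, best_d = None, float("inf")
    for i, (w, w_label) in enumerate(prototypes):
        if (w_label == label) == same:
            d = sq_dist(w, x)
            if d < best_d:
                best, best_d = i, d
    return best, best_d

def train_glvq(samples, prototypes, lr=0.1, epochs=50):
    """Update the prototypes in place; they are the learned internal data."""
    for _ in range(epochs):
        for x, label in samples:
            i1, d1 = nearest(prototypes, x, label, True)   # correct class
            i2, d2 = nearest(prototypes, x, label, False)  # wrong class
            denom = (d1 + d2) ** 2
            if denom == 0.0:
                continue
            mu = (d1 - d2) / (d1 + d2)       # negative when classified correctly
            s = 1.0 / (1.0 + math.exp(-mu))  # sigmoid loss
            g = s * (1.0 - s)                # its derivative
            w1, w2 = prototypes[i1][0], prototypes[i2][0]
            for k in range(len(x)):
                w1[k] += lr * g * (d2 / denom) * (x[k] - w1[k])  # attract
                w2[k] -= lr * g * (d1 / denom) * (x[k] - w2[k])  # repel
    return prototypes

def classify(prototypes, x):
    return min(prototypes, key=lambda p: sq_dist(p[0], x))[1]

# Toy patterns: "feature" patterns cluster near (1, 1), others near (-1, -1).
samples = [([1.0, 1.0], "feature"), ([0.8, 1.2], "feature"),
           ([1.2, 0.9], "feature"), ([-1.0, -1.0], "non-feature"),
           ([-0.9, -1.1], "non-feature"), ([-1.2, -0.8], "non-feature")]
prototypes = [[[0.2, 0.0], "feature"], [[-0.2, 0.0], "non-feature"]]
trained = train_glvq(samples, prototypes)
```

After training, `classify(trained, pattern)` returns the label of the nearest prototype, which plays the role of the determination as to whether a pattern includes a feature point.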
The feature points generated by the techniques described in Non-Patent Documents 1 and 2 may be used, for example, in the facial attitude estimation and face recognition operations described above.
Non-Patent Document 1: Hironobu FUJIYOSHI, “Gradient-Based Feature Extraction ‘SIFT and HOG’”, Research Paper of Information Processing Society of Japan CVIM 160, pp. 211-224, 2007
Non-Patent Document 2: Joshua Cates, Miriah Meyer, P. Thomas Fletcher, Ross Whitaker, “Entropy-Based Particle Systems for Shape Correspondence”, Proceedings of the MICCAI, 2006
Non-Patent Document 3: Toshinori HOSOI, Tetsuaki SUZUKI, Atsushi SATO, “Face Detection using Generalized Learning Vector Quantization”, Technical Report of IEICE. PRMU, Vol. 102, No. 651 (20030213), pp. 47-52
The inventor of the present invention, having investigated requirements to be satisfied by a feature point for use in various types of processing such as attitude estimation and individual recognition, found that the feature point should satisfy the following three requirements.
A first requirement (to be referred to hereafter as a requirement A) is that it must be possible to construct a feature point extractor capable of reliably extracting the position of the feature point from a recognition subject image even when the lighting and attitude in the image change. More specifically, the requirement A is that, when the feature point and points other than the feature point are supplied to the feature point extractor, the feature point extractor can be caused to learn internal data for determining whether or not a subsequently input pattern includes the feature point. For example, similar image patterns are obtained at all positions in a cheek region. Therefore, when a single point on the cheek is set as the feature point, points at other positions on the cheek have similar image patterns, making it difficult for the feature point extractor to extract the feature point. Hence, it may be said that a cheek point does not satisfy the requirement A.
A second requirement (to be referred to hereafter as a requirement B) is that the feature points of different individuals correspond. For example, when an eye corner point is set as the feature point, the eye corner point of a person X corresponds to the eye corner point of a person Y. Hence, the eye corner point satisfies the requirement B. The requirement B may be further divided into two requirements. One is that when all 3D shape models are disposed in overlapping fashion on an identical coordinate system, the feature points are positioned close to each other on the 3D shape model. This requirement will be referred to as a requirement B1. Another is that local region patterns (local patterns) on the periphery of the feature point, which are cut out from images and include the feature point, are similar between the images. This requirement will be referred to as a requirement B2. To describe the eye corner as an example, when the 3D shape models of the persons X and Y are overlapped, the eye corner points are close to each other, and therefore the requirement B1 is satisfied. Further, when the eye corner point and the periphery thereof are cut out from facial images of different people, the cut out parts resemble each other, and therefore the eye corner point also satisfies the requirement B2.
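The two sub-requirements can be made concrete with simple numerical measures. The sketch below is one possible formalization and is an assumption of this illustration: the requirement B1 is scored by the mean distance of the corresponding 3D points from their centroid, and the requirement B2 by the mean pairwise normalized cross-correlation of the local patterns. The text above does not prescribe these measures, and the names `b1_spread` and `b2_similarity` and the sample values are hypothetical.

```python
import math

def b1_spread(points_3d):
    """Mean distance of corresponding 3D feature points from their centroid,
    with all 3D shape models placed on an identical coordinate system.
    Smaller values mean the requirement B1 is better satisfied."""
    n = len(points_3d)
    centroid = [sum(p[k] for p in points_3d) / n for k in range(3)]
    return sum(math.dist(p, centroid) for p in points_3d) / n

def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two flattened, equally sized patches."""
    n = len(patch_a)
    mean_a, mean_b = sum(patch_a) / n, sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0

def b2_similarity(patches):
    """Mean pairwise NCC of the local patterns cut out around the feature
    point. Values near 1 mean the requirement B2 is better satisfied."""
    pairs = [(i, j) for i in range(len(patches))
             for j in range(i + 1, len(patches))]
    return sum(ncc(patches[i], patches[j]) for i, j in pairs) / len(pairs)

# Eye corner points of persons X and Y on overlapped 3D shape models.
eye_corner_points = [(30.0, 40.0, 10.0), (31.0, 41.0, 10.0)]
# Local patterns around the eye corner (flattened 2x2 patches, toy values).
eye_corner_patches = [[10.0, 50.0, 200.0, 50.0], [20.0, 60.0, 210.0, 60.0]]

spread = b1_spread(eye_corner_points)
similarity = b2_similarity(eye_corner_patches)
```

In this toy data the second patch is the first patch with a uniform brightness offset, so the correlation is essentially 1, while the two 3D points lie within about one unit of each other.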
A third requirement (to be referred to hereafter as a requirement C) is that the feature point covers points that are important to the recognition task such that by using the feature point, a sufficiently high recognition performance can be realized. As an example of this condition, the requirement C may be said to be satisfied when the feature point is extracted uniformly from the entire 3D shape model rather than being concentrated in a part of the 3D shape model.
First means and second means described below may also be employed as means for generating a feature point on a 3D shape model. The first means is a method of extracting feature points using a typical feature point extractor and selecting a feature point that satisfies the requirements A to C from the extracted feature points. A feature point extractor that determines whether or not the feature point corresponds to a typical pattern such as a corner (an angle), for example, may be used as the typical feature point extractor.
Further, a feature point extractor employing the SIFT algorithm may be applied to the first means. For example, images obtained by variously changing the lighting condition of the 3D shape model of each individual (an individual face, for example) are prepared, and feature point extraction and feature amount calculation are performed on each image using the feature point extractor employing the SIFT algorithm. Feature points that satisfy the requirements A to C may then be selected from the extracted feature points. To select a feature point that satisfies the requirement A, the following processing is performed for each individual: feature points having close feature amounts are associated with each other across the images, an average of the positions of the associated feature points within the images is calculated, and a deviation of each associated feature point from the average position is determined. A feature point that satisfies the requirement A is obtained by selecting a feature point whose positional deviation from the average is small for a single individual. To select a feature point that satisfies the requirement B, the average position of the associated feature points is first determined for each individual, and an average of these per-individual average positions (i.e. an inter-individual average) is then determined. A feature point that satisfies the requirement B is obtained by selecting a feature point for which the deviation between the inter-individual average and the average position of each individual is small.
A feature point that satisfies the requirements A to C is then obtained by selecting a combination of feature points that are distributed uniformly over the entire 3D shape model from the feature points satisfying the requirements A and B.
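The selection just described can be sketched as follows. This is an illustration under several assumptions: the association of feature points by feature amount is taken as already done, so the input maps each associated feature point to its positions per individual; the thresholds `tol_a` and `tol_b` are hypothetical; the deviation for the requirement A is taken as the worst case over the individuals; and uniform distribution is approximated by a greedy farthest-point selection over the inter-individual average positions, which is a technique substituted here for illustration and is not prescribed by the text.

```python
import math

def mean_point(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def mean_deviation(points, center):
    return sum(math.dist(p, center) for p in points) / len(points)

def select_stable(tracks, tol_a, tol_b):
    """tracks: {feature_id: {individual_id: [position under each lighting]}}.
    Returns {feature_id: inter-individual average position} for features
    whose deviations are small enough for the requirements A and B."""
    selected = {}
    for fid, per_individual in tracks.items():
        means = {ind: mean_point(pts) for ind, pts in per_individual.items()}
        # Requirement A: small deviation within each individual's images.
        dev_a = max(mean_deviation(pts, means[ind])
                    for ind, pts in per_individual.items())
        # Requirement B: per-individual averages close to their own average.
        grand = mean_point(list(means.values()))
        dev_b = mean_deviation(list(means.values()), grand)
        if dev_a <= tol_a and dev_b <= tol_b:
            selected[fid] = grand
    return selected

def spread_out(candidates, k):
    """Greedy farthest-point selection of up to k well-separated features."""
    ids = sorted(candidates)
    chosen = [ids[0]]
    while len(chosen) < min(k, len(ids)):
        best = max((i for i in ids if i not in chosen),
                   key=lambda i: min(math.dist(candidates[i], candidates[c])
                                     for c in chosen))
        chosen.append(best)
    return chosen

tracks = {
    "eye_corner": {"X": [(30.0, 40.0), (30.5, 40.0), (29.5, 40.0)],
                   "Y": [(31.0, 41.0), (31.0, 41.5), (31.0, 40.5)]},
    "eye_corner_2": {"X": [(31.0, 40.0), (31.0, 40.2)],
                     "Y": [(32.0, 41.0), (32.0, 41.2)]},
    "cheek": {"X": [(50.0, 70.0), (58.0, 75.0), (44.0, 66.0)],
              "Y": [(52.0, 71.0), (60.0, 64.0), (47.0, 79.0)]},
    "mouth_corner": {"X": [(40.0, 90.0), (40.0, 90.4)],
                     "Y": [(41.0, 91.0), (41.2, 91.0)]},
    "nose_tip": {"X": [(35.0, 65.0), (35.2, 65.0)],
                 "Y": [(35.0, 66.0), (35.0, 66.2)]},
}
selected = select_stable(tracks, tol_a=1.0, tol_b=1.5)
chosen = spread_out(selected, k=3)
```

With this toy data the cheek point is rejected because its position varies strongly with the lighting, and the greedy step prefers the nose tip over the second eye corner point because the latter lies almost on top of an already chosen point.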
The second means for generating a feature point on a 3D shape model is a method of selecting, in advance, a feature point that satisfies a desired condition, causing the feature point extractor to perform learning specialized for that feature point, and then performing feature point extraction using the feature point extractor. When selecting the feature point that satisfies the desired condition, the method of extracting a feature point on the basis of entropy described in Non-Patent Document 2 may be used. In the method described in Non-Patent Document 2, feature points are selected from over the entire 3D shape model, and feature points existing in similar positions on different individuals are selected. In other words, feature points that satisfy the requirements B1 and C are extracted. A plurality of local patterns of these feature points and a plurality of local patterns of points that do not correspond to the feature points are then input into the feature point extractor, whereupon the feature point extractor is caused to perform learning such that it can thereafter extract similar feature points. GLVQ, which is described in Non-Patent Document 3, or an SVM may be used during construction (learning) of the feature point extractor.
When a feature point is generated using the first means, feature points are first extracted by a certain feature point extractor, whereupon a feature point that satisfies the requirements A to C is selected therefrom. Hence, with the first means, the generated feature points are limited to feature points that can be extracted favorably by the feature point extractor prepared in advance. It cannot therefore be said that the obtained feature point is the most suitable feature point for performing recognition. For example, even when a feature point that satisfies the requirements A to C is selected from the feature points extracted by the feature point extractor employing the SIFT algorithm, a feature point that is more important for recognition but could not be extracted by the SIFT feature point extractor may exist. It is therefore impossible to determine whether or not the obtained feature point is the optimum feature point for obtaining a high recognition performance.
When a method of extracting a feature point on the basis of entropy (the method described in Non-Patent Document 2) is applied to the second means, a feature point that exists in similar positions on different individuals and is obtained uniformly from the entire 3D shape model is extracted. The feature point extractor is then caused to perform learning for generating this type of feature point. Hence, a feature point that satisfies the requirements B1 and C can be generated. However, the ease of the learning performed by the feature point extractor (requirement A) and the similarity of the local patterns among the different individuals (requirement B2) are not taken into account, and therefore a feature point that leads to deterioration of the feature point extraction precision is generated. This type of feature point is ineffective during recognition, and therefore the effectiveness of the generated feature point during recognition cannot be said to be high.