1. Field of the Invention
The present invention relates to a data correction method and apparatus that correct, for example, the position of a feature point or image data for use in, for example, image recognition.
2. Description of the Related Art
In face recognition that uses image data, for example, deciding the position of a face organ or a characteristic part (to be referred to as a feature point hereinafter) is an important task that often governs the recognition performance. The decided position of a feature point is used as, for example, a reference point in normalizing the size and rotation of an image to be recognized, and in extracting a partial region necessary for recognition from this image. To calculate a feature amount suitable for recognition, it is desirable to decide the position of a feature point precisely.
Japanese PCT National Publication No. 2002-511617 (to be referred to as patent reference 1 hereinafter) describes a technique associated with face detection by graph matching. According to patent reference 1, face detection is executed upon preparing a plurality of constraints, called elastic bunch graphs, corresponding to the face orientations, and the face orientation and the position of a feature point are decided from the detection result obtained by an optimum elastic bunch graph. R. Senaratne; S. Halgamuge. “Optimized Landmark Model Matching for Face Recognition” Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference, pp. 120-125 (to be referred to as non-patent reference 1 hereinafter) describes a method of projecting a plurality of feature point position coordinate data onto a dimensionally reduced subspace, and searching the subspace for the position of a feature point. Japanese Patent Laid-Open No. 2008-186247 (to be referred to as patent reference 2 hereinafter) describes a method of determining the face orientation from the position of a face organ based on an empirically obtained arithmetic expression.
Beumer, G. M.; Tao, Q.; Bazen, A. M.; Veldhuis, R. N. J. “A landmark paper in face recognition” Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference, pp. 73-78 (to be referred to as non-patent reference 2 hereinafter) describes a method of setting the coordinate values of each feature point as an input vector, and correcting the position of this feature point using a subspace. An overview of this method will be explained. First, the positions of feature points as shown in FIG. 16A are identified by feature point position candidate decision processing. FIG. 16A shows an example of the positions of feature points and exemplifies a case in which 14 feature points indicated by “×” marks (these feature points are defined as a set of feature points) are decided. For example, a feature point 1601 shown in FIG. 16A corresponds to the tail of the left eye. Next, the coordinate values of each feature point decided in the above-mentioned decision processing are set as an input vector and projected onto a subspace by subspace projection processing. When there are 14 feature points, as shown in FIG. 16A, a 28-dimensional vector (a vector containing 14 horizontal coordinate values and 14 vertical coordinate values as elements) is set as input data. The projection onto the subspace uses a projection matrix generated by, for example, principal component analysis using a plurality of learning feature point position data in advance.
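The formation of the input vector described above can be illustrated by the following minimal sketch. The function names to_input_vector and to_points are hypothetical, and the ordering of the elements (all horizontal coordinate values followed by all vertical coordinate values) is an assumption, since the ordering is not specified here.

```python
def to_input_vector(points):
    """Flatten (x, y) feature points into one vector.

    Assumed layout: all horizontal values first, then all vertical values.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return xs + ys


def to_points(vector):
    """Inverse of to_input_vector: rebuild the list of (x, y) pairs."""
    n = len(vector) // 2
    return list(zip(vector[:n], vector[n:]))


# 14 made-up feature points give a 28-dimensional input vector.
example_points = [(float(i), float(2 * i)) for i in range(14)]
example_vector = to_input_vector(example_points)
```

With 14 feature points, the resulting vector has 28 elements, and to_points restores the original coordinate pairs.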
Next, in dimension compression processing, the dimensionality is reduced by eliminating the projection values corresponding to those eigenvectors, obtained by the principal component analysis, that have small eigenvalues. For example, a 28-dimensional vector is reduced to a several-dimensional vector. In subspace inverse projection processing, the dimensionally reduced projection vector and the projection matrix are used to inversely project the input vector, which has been projected onto the subspace, back onto the real space, yielding an inverse projection vector in that space. With the foregoing processing, even an input vector containing an outlier, which cannot be represented in the subspace generated using the learning data set, is corrected to a vector that can be represented in the subspace. That is, an input vector is corrected based on a statistical geometric constraint that uses a subspace.
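The sequence of subspace projection, dimension compression, and subspace inverse projection can be illustrated by the following minimal sketch. It assumes that a mean vector and an orthonormal set of basis vectors, sorted in descending order of eigenvalue, have already been obtained by principal component analysis; the function names and the three-dimensional toy data are hypothetical and are used only to keep the example short.

```python
def project(x, mean, basis):
    """Subspace projection: inner product of (x - mean) with each basis vector."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(b * c for b, c in zip(vec, centered)) for vec in basis]


def inverse_project(coeffs, mean, basis):
    """Subspace inverse projection: mean plus the weighted sum of basis vectors."""
    out = list(mean)
    for c, vec in zip(coeffs, basis):
        for i, b in enumerate(vec):
            out[i] += c * b
    return out


def correct(x, mean, basis, k):
    """Dimension compression: keep only the k leading basis vectors,
    then map the reduced projection vector back to the original space."""
    kept = basis[:k]
    return inverse_project(project(x, mean, kept), mean, kept)


# Toy data (assumed): basis vectors are the coordinate axes.
mean = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corrected = correct([4.0, 5.0, 6.0], mean, basis, k=2)
```

In this toy case the third component is discarded, so the third coordinate of the input is pulled back to the mean value while the first two coordinates are preserved.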
FIG. 16B shows an example in which an erroneous feature point is extracted in the feature point position candidate decision processing. A feature point 1602 is an example of a feature point that is erroneously decided because the eyebrow edge is mistaken for the tail of the eye. When the above-mentioned subspace projection processing, dimension compression processing, and subspace inverse projection processing are executed for an input vector containing the positions of the feature points shown in FIG. 16B, the position of the feature point 1602 is corrected to one that can be represented in the subspace. FIG. 16C is a view showing an example of the positions of the feature points after the subspace inverse projection processing, in which the feature point 1602 is corrected to a feature point 1603.
In distance calculation processing, the distance between the feature point candidate coordinates output by the feature point position candidate decision processing and the feature point coordinates corrected by the series of processing from the subspace projection processing to the subspace inverse projection processing is calculated for each feature point. In the case exemplified in FIGS. 16B and 16C, the Euclidean distance between the feature points 1602 and 1603 in the image coordinate system is calculated. In selection processing, the distance calculated for each feature point in the distance calculation processing is compared with a threshold, and the coordinates of the feature point before or after the correction are selected for each feature point. In this case, the coordinate values after the correction are selected if the distance between the feature points before and after the correction exceeds a predetermined threshold, and those before the correction are selected if that distance is equal to or smaller than the threshold. The foregoing processing is repeated for each feature point. Also, the series of processing from the subspace projection processing to the selection processing is re-executed a plurality of times, with the set of feature points obtained as a result of the selection processing set as the input vector each time, thereby deciding appropriate positions of the feature points.
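The distance calculation processing, the selection processing, and the repeated re-execution can be illustrated by the following minimal sketch. The names select_points and refine are hypothetical, the subspace-based correction is passed in as an assumed function correct_fn, and the coordinate values, threshold, and iteration count are made-up values chosen only for illustration.

```python
import math


def select_points(candidates, corrected, threshold):
    """For each feature point, select the corrected coordinates if the
    Euclidean distance to the candidate exceeds the threshold; otherwise
    keep the coordinates before the correction."""
    selected = []
    for (cx, cy), (rx, ry) in zip(candidates, corrected):
        dist = math.hypot(rx - cx, ry - cy)
        selected.append((rx, ry) if dist > threshold else (cx, cy))
    return selected


def refine(candidates, correct_fn, threshold, iterations):
    """Re-run correction and selection, feeding each result back as input."""
    points = list(candidates)
    for _ in range(iterations):
        points = select_points(points, correct_fn(points), threshold)
    return points


# Made-up example: the second point is an outlier and moves by 7 pixels.
candidates = [(10.0, 10.0), (20.0, 5.0)]
corrected = [(10.5, 10.0), (20.0, 12.0)]
selected = select_points(candidates, corrected, threshold=2.0)
```

Here the first point moves by only 0.5 and is kept as detected, while the second point exceeds the threshold and is replaced by its corrected position.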
Moreover, Toshiyuki Amano, Yukio Sato, “Image Interpolation Using BPLP Method on the Eigenspace” IEICE Trans. Vol. J85-D2, No. 3, pp. 457-465 (to be referred to as non-patent reference 3 hereinafter) describes a technique of processing and appropriately correcting image data based on the same concept as in non-patent reference 2. Non-patent reference 3 describes a method of statistically interpolating defective data by projecting image data itself onto a low-dimensional subspace.
The method described in patent reference 1 requires a large amount of computation because the degree of matching between deformation of the elastic bunch graph and the feature amount is iteratively computed for each face orientation until the face is ultimately detected. Non-patent reference 1 searches the subspace for the position of a feature point using the particle swarm optimization method, thereby determining the position of a feature point with high accuracy while requiring a smaller amount of computation than the method described in patent reference 1. Nevertheless, the method described in non-patent reference 1 still requires a large amount of computation because it is necessary to repeat decision of an organ position candidate and extraction of a feature amount at a position corresponding to this candidate. Also, the method described in non-patent reference 1 does not take into consideration a mechanism which copes with a large fluctuation of the face. Patent reference 2 describes a method of determining the face orientation based on a rule empirically obtained from the arrangement of organ positions. Using the technique in patent reference 2, it is possible to determine and correct an error of the organ position detected as a candidate based on the arrangement of organ positions. However, in rule-based determination processing, it is difficult to set a rule that is optimum for various types of fluctuations.
The geometric correction processing using a subspace, which is described in non-patent reference 2, is effective in allowing appropriate geometric constraint processing with a small amount of computation, but does not take into consideration the situation in which the face orientation/facial expression has large fluctuations. When a data group with a large fluctuation is added to the learning data of the subspace in order to cope with the fluctuations, the correction capability degrades. Similarly, non-patent reference 3 does not take into consideration the situation in which the target image has a large fluctuation.