1. Field of the Invention
The present invention relates to an image processing apparatus and an image transformation processing method.
2. Description of the Related Art
Conventionally, identification technology for identifying people based on physical features such as fingerprints, palm prints, veins, and irises, so-called biometric authentication technology, has been developed. Biometric authentication technology includes many processes that are performed using images acquired by a photoelectric conversion imaging apparatus, such as a digital camera, or data that has been converted to two-dimensional space data corresponding to an image. Face authentication technology using face images, among others, is attracting particular attention because it involves the same actions as those usually performed by people when identifying other people, and it meets with less of a sense of resistance than other authentication technologies such as fingerprint authentication technology.
One of the problems that arises when identifying individuals using face images or the like is that variations due to other factors are larger than inter-individual variations. In other words, even if images are captured of the same person, the images may often be determined to be of different persons due to variations caused by lighting conditions, facial expressions, facial orientations, accessories such as glasses, cosmetics and so on. That is, an image may be determined to be more similar to an image of another person captured under the same conditions than to another image of the same person. For this reason, it can be said that it is very difficult to extract and classify only inter-individual differences while ignoring photographic conditions and other variations.
In order to cope with this problem, as a conventional technique, a method has been proposed that focuses on local regions in a face image. Even when variations as described above occur in the data of a plurality of face images obtained by capturing an individual, the influence does not always appear uniformly in the entire face region. For example, even when there are changes in facial expressions, there are few variations around the nose. If strong light is incident obliquely, the irradiated portion that is not in the shade exhibits few variations. Also, when the face turns to the left with respect to the viewer, because the face has a three-dimensional shape, the right side portion exhibits fewer variations than the left side portion. Accordingly, even when some local regions have large variations, it can be expected that the other local regions have only variations with which individual identification is possible. In other words, it is considered that good individual identification is possible by selectively integrating similarities of local regions that have relatively few variations.
However, the method that selectively uses local regions uses only a part of information appearing in an image, which is disadvantageous in terms of identification accuracy. Furthermore, variations do take place even in local regions that have relatively few variations, and thus the similarity is lowered when the conditions are different from those when registered images were taken.
As a method for resolving this problem and improving identification accuracy, “Learning Patch Correspondences for Improved Viewpoint Invariant Face Recognition”, A. B. Ashraf, S. Lucey, T. Chen, Carnegie Mellon University, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2008 (hereinafter referred to as Non-Patent Document 1) discloses a face recognition method in which each local region is individually affine-transformed according to facial orientation attributes, and similarity is calculated thereafter. For example, frontal images that have been normalized in advance are used as registered images, and images of a plurality of local regions that are rectangular regions provided at predetermined positions are held as registration data. If an input image to be identified is, for example, an image in which the face is turned to the left by approximately 30 degrees, each extracted rhombic local region is deformed through affine transformation into a rectangular shape equivalent to the frontal face orientation, and thereafter similarity between the local region and a registered local region having the same rectangular shape is calculated. The similarity calculated here is the sum of differences between pixels at the same position.
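The processing flow described above can be sketched in Python as follows. This is only a minimal illustration and not the implementation of Non-Patent Document 1: the function names warp_affine_patch and sum_abs_diff, the matrix layout, the nearest-neighbour sampling, and the use of absolute differences as the "sum of differences" are all assumptions made for this sketch.

```python
def warp_affine_patch(image, matrix, out_h, out_w):
    """Resample a skewed local region into an out_h x out_w rectangle.

    matrix = ((a, b, tx), (c, d, ty)) is an assumed layout that maps each
    output pixel (x, y) to a source position (a*x + b*y + tx, c*x + d*y + ty).
    Nearest-neighbour sampling is used; samples outside the image become 0.
    """
    (a, b, tx), (c, d, ty) = matrix
    h, w = len(image), len(image[0])
    patch = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            sx = int(round(a * x + b * y + tx))
            sy = int(round(c * x + d * y + ty))
            inside = 0 <= sx < w and 0 <= sy < h
            row.append(image[sy][sx] if inside else 0)
        patch.append(row)
    return patch


def sum_abs_diff(patch_a, patch_b):
    """Sum of absolute differences between pixels at the same position.

    A smaller value means the two patches are more alike.
    """
    return sum(
        abs(u - v)
        for row_a, row_b in zip(patch_a, patch_b)
        for u, v in zip(row_a, row_b)
    )
```

In this sketch, one matrix per local region would be prepared for each facial orientation, and the warped patch would then be compared with the registered rectangular patch of the same size.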
In order to determine the rhombic shape of a local region extracted from the image in which the face is turned to the left by approximately 30 degrees, and deform it to a rectangular shape equivalent to the frontal face orientation, different affine parameters are learnt in advance for each local region by using a large number of pair images. In Non-Patent Document 1, the so-called Lucas-Kanade method (B. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision”, Proceedings of Imaging Understanding Workshop, pages 121-130 (hereinafter referred to as Non-Patent Document 2)) is used to estimate affine parameters. This estimation can be performed online in units of pairs of an individual input image and a registered image, but it has been reported that identification accuracy is actually improved by using parameters that have been learnt in advance as average parameters by using a large number of sample pair images. This is presumably because the use of average parameters eliminates errors due to individual noise and the like. The use of parameters prepared in advance is of course advantageous in terms of processing speed compared to the case where estimation processing is performed online in units of pairs of an individual input image and a registered image.
Also, “Face Recognition based on Local Region Extraction according to Facial Orientation”, Ijiri Yoshihisa, et al., Proceedings of 13th Symposium on Sensing via Image Information, Yokohama, June 2007 (hereinafter referred to as Non-Patent Document 3) discloses a face recognition method in which the position of a local region and the extraction size are set with reference to detected feature points. Extracted rectangular local regions are normalized to a standard size, and their similarity is calculated. The feature points serving as the reference points are points that can be detected relatively easily, such as the left end (outer corner) of the left eye. Then, the positions of local regions are determined by relative coordinates (a, b) in the predetermined horizontal axis X direction and vertical axis Y direction from the detected reference points. At this time, in order for the local regions to be always at substantially the same positions, it is effective to change the relative coordinate values according to the facial orientation. Furthermore, in order for extracted local regions to be within substantially the same face region, scale c may also be changed according to the facial orientation.
In Non-Patent Document 3, facial orientation estimation is performed using position information regarding a plurality of detected feature points, and parameters learnt in advance are selected according to the estimated facial orientation. In the case of the face facing the front, for example, regions are extracted using parameters a1, b1, and c1, and in the case of the face turning to the left, the same regions are extracted using different parameters a2, b2, and c2.
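The selection of extraction parameters according to facial orientation can be illustrated with the following Python sketch. It is not taken from Non-Patent Document 3: the table REGION_PARAMS, its numeric values, the orientation labels, and the helper names are hypothetical, and nearest-neighbour resizing is assumed for the size normalization.

```python
# Hypothetical table: orientation label -> (a, b, c), where (a, b) is the
# offset from the reference feature point and c is the extraction scale.
REGION_PARAMS = {
    "front": (12, 8, 1.0),   # stands in for a1, b1, c1
    "left": (9, 8, 0.8),     # stands in for a2, b2, c2
}


def resize_nearest(patch, out_size):
    """Normalize a rectangular patch to out_size x out_size pixels."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[y * in_h // out_size][x * in_w // out_size]
         for x in range(out_size)]
        for y in range(out_size)
    ]


def extract_region(image, ref_point, orientation, base_size,
                   params=REGION_PARAMS):
    """Cut out a rectangular local region and normalize its size.

    Assumes the selected region lies entirely inside the image.
    """
    a, b, c = params[orientation]
    ref_x, ref_y = ref_point
    size = max(1, int(round(base_size * c)))
    x0, y0 = ref_x + a, ref_y + b
    crop = [row[x0:x0 + size] for row in image[y0:y0 + size]]
    return resize_nearest(crop, base_size)
```

Note that the region remains rectangular for every orientation in this sketch; only its position and size change.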
Japanese Patent Laid-Open No. 2007-34724 (hereinafter referred to as Patent Document 1) discloses a method in which an entire face image to be processed is morphed into a reference face image (average face) with the use of deformation vectors. The deformation vectors are calculated based on correspondence points between the image to be processed and the reference image, and are considered to be the same as the relative coordinates that represent a point in the reference image corresponding to a point in the image to be processed. The deformation of a face image in Patent Document 1 is performed for the purpose of generating a facial caricature.
As disclosed in Non-Patent Document 1, it has been shown that deformation processing in which each local region is deformed into a normal state as much as possible, performed prior to identification processing, has a certain effect in improving the identification ratio. In addition, situations often arise in which one wants to perform deformation processing and spatial filter processing as pre-processing not only for recognition processing but also for image compression processing or the like. In such cases, the methods described above have the following problems.
According to the method disclosed in Non-Patent Document 1, the technique for deforming local regions is limited to affine transformation. With affine transformation, it is possible to correct orientation variations in a plane. However, because human faces have a complex shape, the errors created by treating local regions as planes are not small. Furthermore, in addition to the problem of orientation variations, this method has the problem in that it is not possible to correct physical shape deformations due to changes in facial expressions.
In addition, it is necessary to perform deformation processing prior to similarity calculation using registered images, which increases processing loads. The Lucas-Kanade method is originally a technique for finding a correspondence point between two images, and therefore it is possible to represent arbitrary deformations as a correspondence point list, for example, by extending the technique of this document, but in this case as well, problems of increased processing loads and increased parameters arise.
According to the method of Non-Patent Document 3, although scaling is performed in units of local regions, extraction is always performed in the shape of a rectangle, and the shape is not deformed, and therefore there is a limit to the improvement of similarity for each region. There are also concerns over the increase of processing loads due to the additional scaling processing.
The method for deforming a face image according to Patent Document 1 is deformation processing performed on an entire face image, but this method can be applied as pre-processing for face identification processing. It is also conceivable to perform deformation in units of local regions in the same manner. In this case, a deformation vector (in other words, a correspondence point) is attached to each representative feature point, and the pixels between representative feature points are deformed while being interpolated. In this case as well, the processing loads are not negligible. In addition, in order to flexibly cope with various kinds of deformations, it is necessary to increase the number of feature points to which deformation vectors are attached, and the resulting increase in the memory required for storing parameters becomes a problem.
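To make the per-pixel cost of such interpolation concrete, the following Python sketch computes a deformation vector for every pixel of a rectangular region from vectors attached to its four corners. Bilinear interpolation, the function name interpolate_deformation, and the corner keys are assumptions made for illustration; Patent Document 1 is not stated here to use this particular scheme.

```python
def interpolate_deformation(corner_vectors, width, height):
    """Bilinearly interpolate corner deformation vectors over a region.

    corner_vectors maps 'tl', 'tr', 'bl', 'br' (top-left, top-right,
    bottom-left, bottom-right) to a (dx, dy) deformation vector.
    Returns a height x width grid of (dx, dy) tuples, one per pixel.
    """
    tl, tr, bl, br = (corner_vectors[k] for k in ("tl", "tr", "bl", "br"))
    field = []
    for y in range(height):
        v = y / (height - 1) if height > 1 else 0.0
        row = []
        for x in range(width):
            u = x / (width - 1) if width > 1 else 0.0
            # Four weighted terms per component are evaluated for every pixel.
            w_tl, w_tr = (1 - u) * (1 - v), u * (1 - v)
            w_bl, w_br = (1 - u) * v, u * v
            dx = w_tl * tl[0] + w_tr * tr[0] + w_bl * bl[0] + w_br * br[0]
            dy = w_tl * tl[1] + w_tr * tr[1] + w_bl * bl[1] + w_br * br[1]
            row.append((dx, dy))
        field.append(row)
    return field
```

Even in this simple form, several multiplications are needed for each pixel, and a finer deformation requires more feature points and therefore more stored vectors.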
Also, a situation can arise in which one wants to perform, as pre-processing for similarity calculation for each region, not only deformation processing but also filter processing such as blurring processing using a Gaussian filter or the like, or edge extraction processing. However, all of the above-described techniques require filter processing to be added as a separate step, and thus are problematic in that the size and processing loads of the processing apparatus increase.
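As an illustration of why such filter processing constitutes an additional step, the following Python sketch applies a blurring filter to a local region as a separate pass over its pixels. The 3x3 binomial kernel is a common small approximation of a Gaussian filter and is an assumption of this sketch, as are the names KERNEL and blur3x3 and the choice to copy border pixels unchanged.

```python
# 3x3 binomial kernel (weights sum to 16), assumed here as a small
# approximation of a Gaussian filter.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def blur3x3(patch):
    """Blur the interior pixels of a patch; border pixels are copied as-is."""
    h, w = len(patch), len(patch[0])
    out = [row[:] for row in patch]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(
                KERNEL[j][i] * patch[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
            out[y][x] = acc / 16
    return out
```

In a pipeline that first deforms a region and then blurs it, this pass visits every pixel of the region a second time, in addition to the deformation pass.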