1. Field of the Invention
The present invention relates to an image processing apparatus, image processing method and computer program. More particularly, the present invention relates to an image processing apparatus, image processing method and computer program which execute normalization processing of a face image included in captured image data, for example.
2. Description of the Related Art
Techniques for recognizing a face can be widely applied to man-machine interfaces, for example, to user-friendly individual authentication systems, gender identification, and the like. Early studies examined recognition using side (profile) views of the face, but recognition using frontal images has since become the mainstream approach.
In a face recognition system, two kinds of processing are performed. One is face extraction processing, which extracts a face pattern from an image captured by a CCD camera or the like on the basis of feature information of a face. The other is face identification processing, which compares the extracted face pattern with registered face images in order to identify the face of a specific person. For both the face extraction processing and the face identification processing, it is possible to apply, for example, Gabor filtering, which extracts a feature quantity of a face image using a plurality of filters having orientation selectivity and different frequency components (for example, refer to Japanese Unexamined Patent Application Publication No. 2006-4041).
It has already been established that the human visual system contains cells that are selective for specific orientations, such as cells that respond to vertical lines and cells that respond to horizontal lines. A Gabor filter is a spatial filter made up of a plurality of filters that have orientation selectivity in the same manner.
A Gabor filter is spatially expressed by a Gabor function that uses a Gaussian function as its window and a sine or cosine function as the basis of its frequency response. The size of the filter window is fixed, for example, to 24×24. Also, assuming five frequencies f and eight orientations θ, 40 types of Gabor filters are constructed.
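The filter bank described above can be sketched as follows. This is a minimal sketch, assuming the common complex Gabor form (a Gaussian envelope multiplied by a cosine carrier for the real part and a sine carrier for the imaginary part, oriented along θ). The five frequency values, the fixed envelope width `sigma`, and the helper names `gabor_kernel` and `gabor_bank` are hypothetical choices for illustration; only the 24×24 window and the 5 × 8 = 40 combination count come from the description.

```python
import math

WINDOW = 24  # window size given in the text (24x24)

# Hypothetical parameter choices: five spatial frequencies (cycles/pixel)
# and eight orientations evenly spaced over [0, pi).
FREQUENCIES = [0.25 / (math.sqrt(2) ** k) for k in range(5)]
ORIENTATIONS = [k * math.pi / 8 for k in range(8)]


def gabor_kernel(f, theta, sigma=4.0, size=WINDOW):
    """Return (real, imag) coefficient grids of one Gabor filter.

    The Gaussian envelope acts as the window; the real part uses a cosine
    carrier and the imaginary part a sine carrier along direction theta.
    sigma is an assumed, fixed envelope width.
    """
    c = (size - 1) / 2.0  # center of the window in pixel coordinates
    real = [[0.0] * size for _ in range(size)]
    imag = [[0.0] * size for _ in range(size)]
    for yi in range(size):
        for xi in range(size):
            x, y = xi - c, yi - c
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            phase = 2.0 * math.pi * f * (x * math.cos(theta) + y * math.sin(theta))
            real[yi][xi] = envelope * math.cos(phase)
            imag[yi][xi] = envelope * math.sin(phase)
    return real, imag


def gabor_bank():
    """Build every (frequency, orientation) combination: 5 x 8 = 40 filters."""
    return [gabor_kernel(f, t) for f in FREQUENCIES for t in ORIENTATIONS]


bank = gabor_bank()
print(len(bank))  # 40
```

Each entry of `bank` holds the cosine (real) and sine (imaginary) coefficient grids of one filter, which is the separation the next paragraph relies on.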
The Gabor filter operation is the convolution of the pixels to which the Gabor filter is applied with the coefficients of the Gabor filter. The coefficients of a Gabor filter can be separated into a real part having a cosine function as its frequency response and an imaginary part having a sine function as its frequency response. The convolution is applied to each part, and the two components are combined into a Gabor filter result consisting of a single scalar value. This operation is applied to up to 40 types of Gabor filters while changing the frequency f and the angle θ, and the resulting tuple of up to 40 scalar values is called a “Gabor jet”. A Gabor jet is obtained as a local feature quantity at each feature-quantity extraction position, these positions being located at regular intervals in the horizontal and vertical directions on the face image data. A Gabor jet is invariant to a certain degree of shift and deformation of the feature-quantity extraction position.
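A Gabor jet at one feature-quantity extraction position could then be computed roughly as below. This is a minimal sketch, assuming images and coefficient grids are plain nested lists. Combining the real and imaginary responses by their magnitude, treating pixels outside the image as 0, and the grid spacing `step` are assumptions, and `filter_response`, `gabor_jet`, and `jets_on_grid` are hypothetical names. Any list of (real, imaginary) coefficient pairs, such as a 40-filter bank, can be passed as `bank`.

```python
import math


def filter_response(image, x0, y0, real, imag):
    """Apply one Gabor filter centered at (x0, y0) and return one scalar.

    The response is the sum of pixel x coefficient over the window (a
    correlation, i.e. convolution with the flipped kernel), computed for the
    real (cosine) and imaginary (sine) parts separately and combined as the
    magnitude sqrt(re^2 + im^2). Using the magnitude is an assumption.
    Pixels outside the image are treated as 0.
    """
    size = len(real)
    half = size // 2
    re = im = 0.0
    for ky in range(size):
        for kx in range(size):
            y, x = y0 + ky - half, x0 + kx - half
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                p = image[y][x]
                re += p * real[ky][kx]
                im += p * imag[ky][kx]
    return math.hypot(re, im)


def gabor_jet(image, x0, y0, bank):
    """Gabor jet: one scalar response per filter in the bank (up to 40)."""
    return [filter_response(image, x0, y0, re, im) for re, im in bank]


def jets_on_grid(image, step, bank):
    """Extract jets at positions spaced `step` pixels apart horizontally and
    vertically (step is a hypothetical parameter)."""
    return {
        (x, y): gabor_jet(image, x, y, bank)
        for y in range(0, len(image), step)
        for x in range(0, len(image[0]), step)
    }


# Toy usage: an 8x8 ramp image and a one-filter "bank" whose real part is a
# 3x3 identity kernel, so the response is simply the center pixel value.
image = [[float(x + y) for x in range(8)] for y in range(8)]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]
bank = [(identity, zeros)]
print(gabor_jet(image, 2, 3, bank))        # [5.0]
print(len(jets_on_grid(image, 4, bank)))   # 4
```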
In the face identification processing for identifying the face of a specific person, the extracted face pattern is compared with the registered face images. For the registered face images, Gabor jets have been calculated in advance at each feature-quantity extraction position. The similarity between the Gabor jet of the input face and the Gabor jet of the registered face at the same feature-quantity extraction position is then calculated, and a similarity vector, which is the set of the similarities at a plurality of feature-quantity extraction positions, is obtained. Next, class determination is carried out by a support vector machine (SVM) to recognize whether the input face image matches the registered face image. The support vector machine evaluates the similarity vector with respect to a boundary surface, for example by calculating its distance from the boundary surface (the surface on which the decision value is 0), determines the result to be +1 or −1, and thereby determines whether the similarity vector belongs to the intra-personal class or the extra-personal class. If no similarity vector is determined to belong to the intra-personal class, it is determined that the face of an unregistered person has been input (for example, refer to B. Schölkopf, et al., “Advances in Kernel Methods: Support Vector Learning” (The MIT Press, 1999)). Also, by having one support vector machine learn (that is to say, register) a large number of face images, it becomes possible to determine whether a newly input face image matches a registered (learned) face image (belongs to the intra-personal class) or does not match (belongs to the extra-personal class) (for example, refer to Domestic Re-publication of PCT International Publication No. WO03/019475 and Japanese Unexamined Patent Application Publication No. 2006-4003). Those skilled in the art regard the support vector machine as having the highest learning generalization ability in the field of pattern recognition.
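The comparison and class determination steps can be sketched as follows. This is a minimal sketch, assuming cosine similarity (normalized correlation) as the jet similarity and a linear SVM decision function f(x) = w·x + b with hand-picked weights; the text specifies neither the similarity measure nor the kernel. The names `jet_similarity`, `similarity_vector`, `svm_decide`, and `identify`, and the toy registry, are hypothetical. A trained SVM would learn `weights` and `bias` (and might use a nonlinear kernel).

```python
import math


def jet_similarity(a, b):
    """Normalized correlation (cosine similarity) of two Gabor jets.
    This particular similarity measure is an assumption for illustration."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_vector(input_jets, registered_jets):
    """One similarity per feature-quantity extraction position, comparing
    jets at the same position (both dicts keyed by position)."""
    return [jet_similarity(input_jets[p], registered_jets[p])
            for p in sorted(registered_jets)]


def svm_decide(sim_vec, weights, bias):
    """Linear SVM decision. The boundary surface is w . x + b = 0; returns
    (+1 intra-personal or -1 extra-personal, signed distance to boundary)."""
    f = sum(w * s for w, s in zip(weights, sim_vec)) + bias
    norm_w = math.sqrt(sum(w * w for w in weights))
    return (1 if f >= 0 else -1), f / norm_w


def identify(input_jets, registry, weights, bias):
    """Ids of registered faces classified as intra-personal; an empty list
    means the input is treated as an unregistered person."""
    return [pid for pid, jets in registry.items()
            if svm_decide(similarity_vector(input_jets, jets), weights, bias)[0] == 1]


# Toy registry: two positions, two-element jets (real jets have up to 40).
registry = {
    "alice": {(0, 0): [1.0, 0.0], (4, 0): [0.0, 1.0]},
    "bob": {(0, 0): [0.0, 1.0], (4, 0): [1.0, 0.0]},
}
input_jets = {(0, 0): [0.9, 0.1], (4, 0): [0.1, 0.9]}
weights, bias = [1.0, 1.0], -1.0  # made-up values, not learned
print(identify(input_jets, registry, weights, bias))  # ['alice']
```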
As described above, two kinds of processing are performed in a face recognition system: face extraction processing, which extracts a face pattern from an image captured by a CCD camera or the like on the basis of feature information of a face, and face identification processing, which compares the extracted face pattern with registered face images in order to identify the face of a specific person. In the face identification processing, fitting processing is performed on the face image extracted from the input image as pre-processing for the comparison, so that the comparison with a registered image can be performed correctly.
In the fitting processing, for example, a face image recorded in a first memory is retrieved, subjected to normalization processing, and recorded into a second memory. Specifically, for example, the positions of both eyes in the face image in the first memory are detected, the size, position, and angle of the face are obtained from the detected eye positions, and the face image in the first memory is contracted, shifted, and rotated so that the positions of the right eye and the left eye match fixed coordinates in the second memory. This fitting processing creates, in the second memory, the face image needed for face recognition.
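The eye-based fitting step can be expressed as a similarity transform (uniform scale, rotation, and shift) computed from two point correspondences. The sketch below is a minimal illustration under that assumption; `eye_alignment`, `transform_point`, and the eye coordinates in the example are hypothetical, and resampling the pixels into the second memory is omitted.

```python
import math


def eye_alignment(src_left, src_right, dst_left, dst_right):
    """Scale, rotation angle, and shift that map the detected eye positions
    in the first memory onto fixed eye coordinates in the second memory.
    Points are (x, y) tuples."""
    sdx, sdy = src_right[0] - src_left[0], src_right[1] - src_left[1]
    ddx, ddy = dst_right[0] - dst_left[0], dst_right[1] - dst_left[1]
    scale = math.hypot(ddx, ddy) / math.hypot(sdx, sdy)   # from face size
    angle = math.atan2(ddy, ddx) - math.atan2(sdy, sdx)   # from face angle
    c, s = math.cos(angle), math.sin(angle)
    # Shift chosen so the left eye lands exactly on its fixed coordinate.
    tx = dst_left[0] - scale * (c * src_left[0] - s * src_left[1])
    ty = dst_left[1] - scale * (s * src_left[0] + c * src_left[1])
    return scale, angle, tx, ty


def transform_point(params, p):
    """Apply the similarity transform returned by eye_alignment to p."""
    scale, angle, tx, ty = params
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * p[0] - s * p[1]) + tx,
            scale * (s * p[0] + c * p[1]) + ty)


# Hypothetical detected eyes (first memory) and fixed targets (second memory).
params = eye_alignment((120, 140), (200, 100), (30, 40), (70, 40))
print(transform_point(params, (200, 100)))  # approximately (70.0, 40.0)
```

In this example the scale is about 0.447, so the face is contracted, and the rotation angle is about 0.46 rad, which levels the tilted eye line.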
The face identification processing using the above-described Gabor filter is applied to the image obtained by this fitting processing so that the identification is performed correctly. That is to say, the similarities between the Gabor jets of the input face and those of a registered face at the same feature-quantity extraction positions are calculated, a similarity vector, which is the set of the similarities at a plurality of feature-quantity extraction positions, is obtained, and class determination by a support vector machine (SVM) is made to identify the face on the basis of whether the input face image matches a registered face image.
In the normalization processing performed as the above-described fitting processing, image conversion including contraction, shifting, and rotation is performed on the face image in the first memory, and the result is written into the second memory. This image conversion is generally performed about an origin fixed in each memory. Specifically, for example, an affine transformation is performed with the origin coordinates of the first memory and the origin coordinates of the second memory as the center coordinates, thereby converting the memory image by contraction, shifting, and rotation.
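The conversion described above can be sketched with the center coordinates as explicit parameters; passing (0, 0) for both centers reproduces the conversion about the fixed memory origins. This is a sketch under stated assumptions: filling the second memory by inverse mapping with nearest-neighbor sampling, and the names `make_affine` and `convert`, are illustrative choices rather than a description of how the conversion is actually implemented.

```python
import math


def make_affine(scale, angle, src_center, dst_center):
    """Forward map p -> dst_center + scale * R(angle) * (p - src_center),
    and its inverse. With both centers at (0, 0) this converts about the
    memory origins."""
    c, s = math.cos(angle), math.sin(angle)

    def fwd(p):
        x, y = p[0] - src_center[0], p[1] - src_center[1]
        return (dst_center[0] + scale * (c * x - s * y),
                dst_center[1] + scale * (s * x + c * y))

    def inv(q):
        # Inverse rotation is the transpose of R(angle).
        x, y = q[0] - dst_center[0], q[1] - dst_center[1]
        return (src_center[0] + (c * x + s * y) / scale,
                src_center[1] + (-s * x + c * y) / scale)

    return fwd, inv


def convert(first_memory, width, height, inv):
    """Fill the second memory by mapping each destination pixel back into
    the first memory and taking the nearest source pixel (0 outside)."""
    src_h, src_w = len(first_memory), len(first_memory[0])
    second = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = inv((x, y))
            ix, iy = round(sx), round(sy)
            if 0 <= ix < src_w and 0 <= iy < src_h:
                second[y][x] = first_memory[iy][ix]
    return second


# Contract a 4x4 image by half about the memory origins (no rotation).
first = [[y * 4 + x for x in range(4)] for y in range(4)]
fwd, inv = make_affine(0.5, 0.0, (0, 0), (0, 0))
print(convert(first, 2, 2, inv))  # [[0, 2], [8, 10]]
```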
However, when image conversion is performed by affine transformation, for example, errors tend to grow as the distance from the origin increases. On the other hand, the above-described similarity determination using Gabor jets, which is performed in the face identification processing to compare an input face image with a registered face image, is based mainly on the similarity of the positions and shapes of face parts such as the eyes, nose, and mouth.
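One simple way to see why errors grow with distance from the conversion center is to perturb the rotation angle or the scale slightly: a point at distance r from the center moves by 2r·sin(δ/2) ≈ rδ for an angle error δ, and by εr for a relative scale error ε. The sketch below illustrates only this parameter-error mechanism (other error sources, such as resampling, are not modeled). `conversion_error` and the example coordinates are hypothetical, and the error-free conversion is taken as the identity, since the overall scale factor multiplies every displacement equally.

```python
import math


def conversion_error(point, center, angle_err=0.0, scale_err=0.0):
    """Displacement of `point` when it is converted about `center` with a
    rotation-angle error angle_err (radians) and a relative scale error
    scale_err, compared with the error-free conversion."""
    vx, vy = point[0] - center[0], point[1] - center[1]
    k = 1.0 + scale_err
    c, s = math.cos(angle_err), math.sin(angle_err)
    ex = k * (c * vx - s * vy) - vx
    ey = k * (s * vx + c * vy) - vy
    return math.hypot(ex, ey)


# Hypothetical 100x100 memory whose origin (0, 0) is a corner, with an eye at
# (60, 50) near the face center (50, 50), and a 0.01 rad angle error.
eye = (60, 50)
print(conversion_error(eye, (0, 0), angle_err=0.01))    # about 0.78 pixels
print(conversion_error(eye, (50, 50), angle_err=0.01))  # about 0.10 pixels
```

With the same angle error, the eye is displaced roughly eight times more when the conversion is centered at the memory corner than when it is centered near the face.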
Accordingly, if the image conversion by affine transformation is performed about an origin set outside the face image, for example, errors arise in the positions and shapes of face parts such as the eyes, nose, and mouth. Because these face parts carry the information from which the feature quantities important for face identification processing are extracted, the similarity with a registered face image may not be determined correctly.