1. Field of the Invention
The present invention relates to an effective technique applied to an apparatus, a method, and a program for detecting a feature point of a target object from an image.
2. Description of the Related Art
Conventional techniques for detecting the feature point of a target object from an image include techniques in which a face is set as the target object. The background art for the case where the face is set as the target object will be described below.
For example, whether the eyes of a person are opened or closed, the face attitude, the direction of the visual line, and the expression of the face can be estimated by obtaining the feature point positions of the face from an image in which the person is photographed. The state of the person can be recognized in detail from the estimation result. Detailed recognition of the state of the person can improve the performance of a man-machine interface or enable the provision of new services. Thus, in the development of a man-machine interface, one important problem is to correctly determine the feature point positions of the face.
A driver monitoring system is an example of the man-machine interface. Opening and closing of the eyes of the driver, the face attitude, the direction of the visual line, and the like are sequentially observed in the driver monitoring system. A fatigue degree of the driver is determined based on the observation result, which allows appropriate advice to be given according to a situation.
Another example is the application of the man-machine interface to a video camera or a still image camera. In these devices, various kinds of processing, such as a change in imaging conditions, can be performed by knowing the face attitude of the person to be photographed. Additionally, individuals can be identified with high accuracy by analyzing in detail the image of feature points such as the eyes and the mouth.
As a method of obtaining the feature point position of the face from the image (hereinafter referred to as “feature point detection method”), template matching and its applications are commonly used. In these methods, general information on each feature point (hereinafter referred to as “feature value”) is stored in advance. Whether or not a given position should be taken as the feature point is determined by comparing a feature value obtained from a partial area of the image to the stored feature value. A specific example of the feature value used at this point is a brightness value vector of the image. Normalized correlation or a Euclidean distance is usually used as the means for comparing the feature values.
Among such techniques, there is a proposed technique of detecting a feature point candidate with pattern matching after the number of search points is reduced in advance by a separation filter (see Japanese Patent Laid-Open Patent No. 9-251534). In the technique disclosed in Japanese Patent Laid-Open Patent No. 9-251534, after the feature point candidates are detected, a geometrical constraint condition is applied to output the combination of candidate points determined to be the most likely human face.
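The comparison of brightness value vectors described above can be illustrated with a minimal sketch. It is not taken from any cited document: the function names (`normalized_correlation`, `euclidean_distance`, `patch_vector`, `best_match`), the square patch shape, and the exhaustive scan over all positions are assumptions made only for illustration.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length vectors (range -1..1)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch carries no pattern to correlate with
    return sum(x * y for x, y in zip(da, db)) / denom


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def patch_vector(image, top, left, size):
    """Flatten a size x size patch of a 2-D list into a brightness value vector."""
    return [image[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def best_match(image, template, size):
    """Scan every patch position; return ((top, left), score) of the best correlation."""
    best_pos, best_score = None, -2.0
    for top in range(len(image) - size + 1):
        for left in range(len(image[0]) - size + 1):
            score = normalized_correlation(
                patch_vector(image, top, left, size), template)
            if score > best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score
```

Normalized correlation ignores a uniform brightness offset between the patch and the stored feature value, whereas the Euclidean distance does not; which one is preferable depends on how much the lighting is expected to vary.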
However, in the technique disclosed in Japanese Patent Laid-Open Patent No. 9-251534, the feature points of the image of a person can be detected only under a good photographing condition in which the person is orientated toward a substantially frontward direction. Accordingly, it is difficult to correctly detect the feature points from the image in which a part of the feature points is hidden by an obstacle or the image which differs largely from the image including the previously stored feature points in the photographing condition (for example, lighting condition).
There is a technique of obtaining the feature point position of the face to estimate the face attitude (see Japanese Patent Laid-Open Patent No. 2000-97676 and Japanese Laid-Open Patent No. 2003-141551). In such techniques, usually, after the feature point positions of the face are obtained, the estimation is performed using the overall arrangement of the feature points or their feature values. For example, a relationship between the face attitude and the coordinates of the feature points indicating the eyes, the mouth, the eyebrows, or the nose is stored in advance as a look-up table. The face attitude corresponding to the coordinates of the feature points obtained from the image is determined from the look-up table and outputted as the estimation result. Additionally, there is a technique in which templates of the whole face or of the feature values of the face are prepared for plural orientations of the face and the face attitude is determined by matching with the templates. However, in these techniques, whether or not the face attitude can correctly be estimated depends on the accuracy of the feature point positions of the face. Accordingly, the face attitude cannot correctly be estimated unless the feature point positions of the face are correctly obtained.
In order to solve the problem, there is a technique called ASM (Active Shape Model) (see A. Lanitis, C. J. Taylor, T. F. Cootes, “Automatic Interpretation and Coding of Face Images Using Flexible Models.” IEEE PAMI Vol. 19 No. 7 pp. 743-756, July 1997). In ASM, the feature point positions are obtained in advance for learning face images, and a face shape model is produced and retained. The face shape model is formed by nodes corresponding to the face feature points. The face shape model will be described in detail later; the processing for detecting the feature point position from the image with ASM is described here.
The face shape model is arranged at a proper initial position in the image to be processed. Then, feature values are obtained around each node of the face shape model. For each node, the obtained feature values are compared to the feature value retained in advance in correspondence with the node. As a result of the comparison, each node is moved to the position at which the obtained feature value is closest to the feature value corresponding to the node (i.e., the position determined to have the highest possibility of being the feature point corresponding to the node). At this point, the positions of the nodes of the face shape model are displaced from the initial positions. Therefore, the set of displaced nodes is reshaped by projecting it onto the face shape model. The processing from obtaining the feature values around each node onward is repeated a predetermined number of times or until a predetermined condition (convergence condition) is satisfied. Then, the final position of each node is determined to be the position of each feature point.
Thus, in ASM, the projection is performed to the face shape model after the position of each node is moved. The feature point position can correctly be detected through the processing while the positional relationship among the nodes maintains the face-like shape. That is, even if a portion having the feature value similar to that of the feature point exists accidentally at the position located not far away from the shape of the general human face, the wrong detection in which the point is detected as the feature point can be prevented.
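The projection step that keeps the nodes in a face-like arrangement can be sketched as follows. This is an illustrative sketch only, under several assumptions that are not stated in the cited document: the shape model is built by principal component analysis of the learning shapes, each shape is a flat vector (x1, y1, x2, y2, ...), pose alignment between shapes is omitted, and each model parameter is clamped to a hypothetical limit of three standard deviations. The names `build_shape_model` and `project_to_model` are likewise hypothetical.

```python
import numpy as np


def build_shape_model(shapes, n_modes):
    """Build a linear shape model from learning shapes.

    shapes: array of shape (n_samples, 2 * n_nodes), one flattened shape per row.
    Returns the mean shape, the first n_modes principal directions (as rows),
    and the variance of the learning shapes along each direction.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are the principal directions of the centered learning shapes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = (s ** 2) / (shapes.shape[0] - 1)
    return mean, vt[:n_modes], variances[:n_modes]


def project_to_model(shape, mean, modes, variances, limit=3.0):
    """Replace a displaced node set by the nearest shape the model allows."""
    b = modes @ (np.asarray(shape, dtype=float) - mean)  # model parameters
    bound = limit * np.sqrt(variances)
    b = np.clip(b, -bound, bound)   # keep each parameter in a plausible range
    return mean + modes.T @ b       # reconstruct the regularized node positions
```

Any displacement of a node that cannot be expressed by the retained modes is discarded by the projection, which is what suppresses an isolated node that has been drawn to a similar-looking but wrong position.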
However, in the conventional ASM, it is necessary to perform searching processing (obtaining the feature values and comparing them) around each node. Therefore, there is a drawback that a large amount of time is required for computation. Additionally, in the conventional ASM, there is a problem that robustness is low. That is, in the case where the orientation of the face in the image to be processed differs largely from the orientation of the face expected in the face shape model arranged at the initial position, the feature point detection frequently fails.
In order to solve the problem of ASM, there is a technique called AAM (Active Appearance Model) (see T. F. Cootes, G. J. Edwards and C. J. Taylor. “Active Appearance Models”, IEEE PAMI, Vol. 23, No. 6, pp. 681-685, 2001). In AAM, the position of the face feature point is determined as follows. As with ASM, the feature point positions are obtained for the learning face images to produce the face shape model. Then, an average value of the node positions is obtained to produce an average shape model including the set of nodes at the average positions. Then, plural patches including the feature points are formed in each learning face image, and each patch is projected to the average shape model to produce a shape-free image (the processing is referred to as “shape correction processing”). A face brightness value model (shape-free face brightness value model) is produced by performing principal component analysis on the set of shape-free images. Then, a shape-free face brightness value vector is determined each time the face shape model is finely displaced at constant intervals from the correct position in each direction. Linear regression is computed over the resulting sets of displacements and brightness value vectors. As a result, the movement, the deformation direction, and the deformation amount needed to bring the face shape model to the correct point can be estimated from the finely-displaced face brightness value vector. In AAM, the above processing is performed in advance as learning processing.
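The regression learned from fine displacements can be sketched in a reduced form. The following is a simplified illustration under stated assumptions, not the procedure of the cited paper: the whole face shape model is reduced to a small parameter vector, `sample_fn` is a hypothetical stand-in for the sampling and shape correction processing that returns a brightness value vector for given parameters, the brightness difference from the correct position is used as the regression input, and an ordinary least-squares fit is used. The name `learn_regression` is also hypothetical.

```python
import numpy as np


def learn_regression(sample_fn, true_params, displacements):
    """Learn a matrix R such that R @ residual estimates the correction.

    sample_fn(params) -> 1-D brightness value vector (hypothetical stand-in).
    true_params: parameters at the correct position in a learning image.
    displacements: small parameter offsets applied around true_params.
    The residual is sample_fn(displaced) - sample_fn(true_params), and the
    correction to be estimated is the offset with its sign reversed.
    """
    true_params = np.asarray(true_params, dtype=float)
    d_params = np.asarray(displacements, dtype=float)
    g_true = sample_fn(true_params)
    d_bright = np.array([sample_fn(true_params + dp) - g_true for dp in d_params])
    # Least-squares solution of d_bright @ R.T ~= -d_params.
    r_t, *_ = np.linalg.lstsq(d_bright, -d_params, rcond=None)
    return r_t.T
```

Because the fit is linear and the learning displacements are small, the estimate can only be expected to hold near the correct position.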
Processing for detecting the feature point position from the image with AAM will now be described. The face shape model is arranged at proper initial positions of the image which becomes the processing target. Then, the patches are produced based on the node positions of the arranged face shape model, and a brightness distribution is sampled in each patch. The sampled brightness distribution is projected to produce the shape-free face brightness value model.
Then, the movement and deformation amount of the face shape model are estimated from the shape-free face brightness value model using the previously determined regression equation. The face shape model is moved and deformed according to the estimation result. The processing is repeated a predetermined number of times or until a predetermined condition (convergence condition) is satisfied. Then, it is determined that the final position of each node is the position of each feature point.
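The repeated estimate-and-update cycle can be sketched as a short loop. This is an illustration only, with the same simplifications as any reduced model: the face shape model is represented by a small parameter vector, `sample_fn` is a hypothetical stand-in for patch sampling and shape correction, `reference` is the brightness value vector expected at the correct position, `regression` is a previously learned matrix, and the convergence condition (update norm below `tol`) is an assumed choice.

```python
import numpy as np


def fit_model(sample_fn, reference, regression, init_params,
              max_iters=20, tol=1e-6):
    """Iteratively move and deform the model until the update becomes negligible.

    Returns the final parameters and the number of iterations performed.
    """
    params = np.asarray(init_params, dtype=float)
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        residual = sample_fn(params) - reference  # brightness error at current pose
        update = regression @ residual            # estimated movement / deformation
        params = params + update
        if np.linalg.norm(update) < tol:          # assumed convergence condition
            break
    return params, iterations
```

No search around individual nodes appears in the loop; each iteration costs one sampling pass and one matrix-vector product.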
Thus, according to AAM, each feature point position can be detected without performing the searching processing around each node. Therefore, unlike ASM, no time is required for the searching processing around each node, and the computation time can be decreased. Furthermore, because the searching is performed by deforming the shape model, as in ASM, the feature point position can correctly be detected while the positional relationship among the nodes maintains the face-like shape.
However, there is the following problem in AAM. In order to maintain the detection accuracy of each feature point position with AAM, it is necessary to perform homogeneous and high-density sampling in obtaining the brightness distribution of each patch. Therefore, the computation amount becomes excessively large in the sampling and in projecting the brightness distribution to the average shape model, and a large amount of computation time is required. Additionally, in AAM, the movement amount and deformation amount of the face shape model are estimated based on the linear regression learned from fine displacements about the correct point. Therefore, the estimation cannot correctly be performed for a large displacement, and the correct result cannot be obtained.
In the processing with the conventional ASM or AAM, the feature point cannot be detected at high speed, because a large amount of computation time is required in the searching processing or in the shape correction processing of the brightness distribution with the homogeneous and high-density sampling. However, as described above, because the feature point detection result is utilized as the input data in the man-machine interface and the like, a fast response is frequently demanded in detecting the feature point. Therefore, the need for high-speed feature point detection is increasing.
In view of the foregoing, an object of the invention is to provide an apparatus and a program which enable the feature point position to be detected from the face image at high speed.