1. Field of the Related Art
The present invention relates to a technique that is effective when applied to a device, a method or a program, etc., for detecting feature points of an object from an image.
2. Description of the Related Art
A technique for detecting the feature points of the object from the image includes a technique of detecting the feature points of a face. The related art for when the object is a face will be described below.
The closing and opening information of the eye, face orientation, viewing direction, facial expression, etc. of a person can be estimated by obtaining the positions of the feature points of the face from the image in which the person is imaged. In addition, the state of the person can be understood in more detail from the estimation result. Further, the performance of a man-machine interface may be enhanced and a new service may be provided by understanding the state of the person in detail. Therefore, it is important to accurately learn the positions of the feature points of the face in developing the man-machine interface.
A driver monitoring system is an example of the man-machine interface. In this system, the opening and closing of the eye, the face orientation, the viewing direction, etc. of the driver are observed. Then, appropriate advice is given based on the degree of fatigue, etc. of the driver determined from the observation result.
Furthermore, video cameras, still image cameras, etc. are other examples of applications of the man-machine interface. In these devices, various processes such as changing the photographing conditions become possible by learning the face orientation of the person to be photographed. Moreover, an individual can be identified with high precision by analyzing in detail the image of the feature points such as the eye, the mouth, etc.
Generally, a method of obtaining the positions of the feature points from the image (hereinafter referred to as “feature point detecting method”) is a method that employs template matching or an application thereof. In such a method, the general information (hereinafter referred to as “feature value”) of each feature point is stored in advance. Then, the feature value obtained from one region of the image is compared with the stored feature value, and a determination is made as to whether or not the region corresponds to the feature point to be acquired. A specific example of the feature value used in this case is a luminance value vector of the image. In addition, the feature values are generally compared using normalized correlation or Euclidean distance.
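The comparison described above can be sketched as follows. This is a minimal illustration only, assuming the feature value is the raw luminance patch and that the search is exhaustive; the function names normalized_correlation, euclidean_distance and find_feature_point are hypothetical and do not come from the cited art.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two luminance vectors (1.0 = identical pattern)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def euclidean_distance(a, b):
    """Euclidean distance between two luminance vectors (0.0 = identical)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.linalg.norm(a - b))

def find_feature_point(image, template):
    """Slide the stored template over the image; return the best top-left (row, col) and its score."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    best_score, best_pos = -np.inf, None
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            score = normalized_correlation(image[r:r + th, c:c + tw], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Normalized correlation is insensitive to uniform changes in brightness and contrast, whereas Euclidean distance is not, which is one reason the two measures may give different results under different lighting.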
As an example of a feature point detecting technique, a technique has been proposed that first reduces the number of search points by means of a separation degree filter and thereafter detects the feature point candidates through pattern matching (refer to Japanese Laid-Open Patent Publication No. 9-251534). In this technique, geometrical restraining conditions are applied after the feature point candidates are detected, and the combination of candidate points determined most likely to look like a human face is output.
In this technique, however, only the feature points of the image of a person, who is facing more or less to the front, under satisfactory photographing conditions can be detected. Therefore, the feature points are difficult to accurately detect from an image in which some of the feature points are hidden by an object or from an image in which the photographing conditions (for example, the lighting conditions) greatly differ from the time of acquiring the feature points that were stored in advance.
Techniques for estimating the face orientation by obtaining the positions of the feature points of the face have also been proposed (refer to Japanese Laid-Open Patent Publication No. 2000-97676 and Japanese Laid-Open Patent Publication No. 2003-141551). In these techniques, the method of acquiring the positions of the feature points of the face, and thereafter making an estimation using the entire arrangement and the feature value thereof is generally known. For example, the relationship between the coordinates of the feature points indicating the eye, the mouth, the eyebrow, and the nose, and the face orientation is stored in advance as a look-up table. Then, the face orientation corresponding to the coordinates of the feature points acquired from the image is determined from the look-up table, and output as an estimation result. Other methods of obtaining the face orientation include a method of preparing a template of the entire face or of the feature values of the face in correspondence to a plurality of directions of the face, and performing a match with the template. In such methods as well, however, whether or not the estimation of the face orientation can be accurately performed depends on the accuracy of the positions of the feature points of the face. Therefore, the estimation of the face orientation cannot be accurately performed unless the positions of the feature points of the face are accurately acquired.
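The look-up table approach can be illustrated with a nearest-entry search. The sketch below is an assumption-laden toy: the table entries, the choice of four feature points, the normalized coordinates and the (yaw, pitch) representation are all invented for illustration and are not taken from the cited publications.

```python
import numpy as np

# Hypothetical table. Each entry pairs normalized (x, y) coordinates of the
# left eye, right eye, nose and mouth with a face orientation (yaw, pitch) in degrees.
# The numbers are made up purely to show the mechanism.
LOOKUP_TABLE = [
    (np.array([[0.30, 0.40], [0.70, 0.40], [0.50, 0.60], [0.50, 0.80]]), (0, 0)),
    (np.array([[0.20, 0.40], [0.55, 0.40], [0.35, 0.60], [0.38, 0.80]]), (-30, 0)),
    (np.array([[0.45, 0.40], [0.80, 0.40], [0.65, 0.60], [0.62, 0.80]]), (30, 0)),
]

def estimate_orientation(points):
    """Return the orientation stored with the table entry closest to the detected points."""
    points = np.asarray(points, dtype=float)
    distances = [np.linalg.norm(points - reference) for reference, _ in LOOKUP_TABLE]
    return LOOKUP_TABLE[int(np.argmin(distances))][1]
```

Because the result is read directly from the detected coordinates, any error in the feature point positions propagates to the estimated orientation, which is the dependency noted above.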
A technique referred to as ASM (Active Shape Model) (refer to, for example, A. Lanitis, C. J. Taylor, T. F. Cootes, “Automatic Interpretation and Coding of Face Images Using Flexible Models”, IEEE PAMI, Vol. 19, No. 7, pp. 743-756, July 1997) is known as a technique for solving the above problems. In ASM, the positions of the feature points are acquired in advance for a great number of training face images, and the face shape model is created and stored. The face shape model is configured by nodes corresponding to each feature point. The details of the face shape model will be described later.
The process of detecting the position of the feature point from the image by ASM will now be described. First, the face shape model is arranged at an appropriate initial position of the image to be processed. Next, a plurality of feature values around the node is acquired for each node of the face shape model. The acquired plurality of feature values and the feature values associated with the relevant node in advance are then compared. Each node is moved to a position where the feature value closest to the feature value corresponding to each node is acquired (i.e., the position at which the possibility of being the feature point corresponding to each node is the highest) out of the positions at which the plurality of feature values are acquired. At this point, the position of each node of the face shape model is displaced from the initial position. Consequently, the deformed node set is shaped by projecting it onto the face shape model. The processes after acquiring the feature value around each node are repeatedly performed a predetermined number of times or until a constant condition (restraining condition) is met. The final position of each node is then determined as the position of each feature point.
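The ASM loop described above can be sketched as follows. This is a simplified sketch, not the procedure of the cited paper: it assumes the feature value is a small square luminance patch compared by Euclidean distance, that the face shape model is a mean shape plus an orthonormal basis with clipped coefficients, and that no separate pose alignment step is needed. All function and parameter names are hypothetical.

```python
import numpy as np

def project_to_shape_model(shape, mean_shape, basis, limits):
    """Reshape a deformed node set so that it equals mean + basis @ b with |b| <= limits."""
    b = basis.T @ (shape.ravel() - mean_shape.ravel())
    b = np.clip(b, -limits, limits)
    return (mean_shape.ravel() + basis @ b).reshape(shape.shape)

def patch(image, center, half):
    """Luminance vector of the square patch around center (row, col), or None if off-image."""
    r, c = int(round(center[0])), int(round(center[1]))
    if r - half < 0 or c - half < 0 or r + half >= image.shape[0] or c + half >= image.shape[1]:
        return None
    return image[r - half:r + half + 1, c - half:c + half + 1].ravel()

def asm_fit(image, init_shape, templates, mean_shape, basis, limits,
            search=2, half=1, n_iter=10):
    """Repeat: search around every node, move it to the best match, project onto the model."""
    shape = np.array(init_shape, dtype=float)
    for _ in range(n_iter):
        moved = shape.copy()
        for i, node in enumerate(shape):
            best = np.inf
            for dr in range(-search, search + 1):
                for dc in range(-search, search + 1):
                    candidate = node + np.array([dr, dc], dtype=float)
                    values = patch(image, candidate, half)
                    if values is None:
                        continue
                    distance = np.linalg.norm(values - templates[i])
                    if distance < best:
                        best = distance
                        moved[i] = candidate
        new_shape = project_to_shape_model(moved, mean_shape, basis, limits)
        if np.allclose(new_shape, shape):  # stop once the node set no longer changes
            break
        shape = new_shape
    return shape
```

The nested loops over dr and dc are the per-node peripheral search; their cost grows with the number of nodes and the size of the search window.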
In ASM, projection onto the face shape model is performed after the position of each node is moved. According to this process, the accurate position of the feature point can be detected with the positional relationship of each node maintaining a face-like shape. In other words, even if by chance a portion having a feature value similar to a feature point exists at a position not acceptable in forming the shape of the face of a normal person, such a point is prevented from being mistakenly detected as the feature point.
However, a search of the peripheral region (acquisition of feature values and comparison thereof) must be performed at each node in conventional ASM. Thus, a large amount of calculation time is required. Moreover, conventional ASM has a drawback in that the robustness is low. In other words, if the direction of the face in the image to be processed differs greatly from the direction of the face assumed in the face shape model arranged at the initial position, the detection of the feature points tends to fail.
AAM (Active Appearance Model) is a technique proposed for solving the problems of ASM (refer to T. F. Cootes, G. J. Edwards and C. J. Taylor, “Active Appearance Models”, IEEE PAMI, Vol. 23, No. 6, pp. 681-685, 2001). In AAM, the position of a feature point is obtained in the following manner. First, the positions of the feature points are acquired for a large number of training face images, and a face shape model is created, similar to ASM. Next, a mean value over the large number of training face images is determined for each feature point in the face shape model. Subsequently, a mean shape model is constructed from the set of feature points that are closest to the previously calculated mean values. Feature points are taken from the training face images to create a plurality of patches; each patch is projected onto the mean shape model, and a shape free image is created (this process is referred to as the “shape correcting process”). A patch is a plane having a plurality of nodes or feature points at its vertices. In order to remove changes in the node positions caused by facial expression, facial direction, the idiosyncrasies of an individual's face, etc. found in an image, the node positions are fitted onto the mean shape model, resulting in the shape free image in which only luminance value information remains. A face luminance value model (shape free face luminance value model) is created by performing principal component analysis on a set of shape free images. Subsequently, starting from the correct position, the face shape model is minutely displaced by a constant amount in each direction to obtain a shape free luminance value vector. Linear regression is performed on the relevant set of vectors. Thus, it is possible to estimate, from a minutely displaced face luminance value vector, the amount and direction by which the face shape model must be moved and/or deformed to reach the correct position.
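The two learning steps named above, principal component analysis of the shape free images and linear regression from displaced luminance vectors to displacements, can be sketched as follows. This is only an illustration under assumptions: shape free images are taken as already flattened into row vectors, the regression includes a constant term, and the function names pca, learn_regression and predict_displacement are hypothetical.

```python
import numpy as np

def pca(samples, n_components):
    """Return the mean vector and the leading principal components (as rows)."""
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def learn_regression(luminance_vectors, displacements):
    """Least-squares fit of: displacement ~ [luminance_vector, 1] @ R."""
    G = np.asarray(luminance_vectors, dtype=float)
    D = np.asarray(displacements, dtype=float)
    G1 = np.hstack([G, np.ones((len(G), 1))])  # extra column of ones for the constant term
    R, *_ = np.linalg.lstsq(G1, D, rcond=None)
    return R

def predict_displacement(R, luminance_vector):
    """Estimate the model displacement that produced a given luminance vector."""
    g1 = np.append(np.asarray(luminance_vector, dtype=float), 1.0)
    return g1 @ R
```

In a learning process of this kind, each training row would pair a luminance vector sampled from a deliberately displaced model with the displacement that was applied, so that the inverse displacement can later be used as the correction.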
The above-described processes are executed in advance as learning processes in AAM.
The process of detecting the position of the feature point from the image using AAM will now be described. First, the face shape model is arranged at an appropriate initial position of the image to be processed. Next, a patch is created based on the node position of the arranged face shape model, and the luminance distribution in each patch is sampled. Then, the shape free face luminance value model is created by projecting the sampled luminance distribution onto the image.
The amount of movement and deformation of the face shape model is estimated from the shape free face luminance value model by a regression expression obtained in advance. The face shape model is moved and deformed according to the estimation result. The above processes are repeatedly executed a predetermined number of times or until a constant condition (restraining condition) is met. The final position of each node is then determined to be the position of each feature point.
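The iterative update described above can be sketched as follows. The sketch assumes that the regression expression is a matrix R applied to the sampled luminance vector with an appended constant 1, and that sampling is provided by a caller-supplied function; aam_fit, sample_shape_free and the stopping tolerance are hypothetical names and choices, not part of the cited technique.

```python
import numpy as np

def aam_fit(initial_params, sample_shape_free, R, n_iter=20, tol=1e-6):
    """Repeat: sample luminance for the current model, estimate a correction, apply it.

    initial_params    : starting position/deformation parameters of the face shape model
    sample_shape_free : hypothetical callable mapping parameters to a shape free luminance vector
    R                 : regression matrix learned in advance; the last row is the constant term
    """
    params = np.array(initial_params, dtype=float)
    for _ in range(n_iter):
        g = np.asarray(sample_shape_free(params), dtype=float)
        correction = np.append(g, 1.0) @ R  # estimated movement and deformation
        params = params + correction
        if np.linalg.norm(correction) < tol:  # stop once the correction is negligible
            break
    return params
```

Each iteration costs one sampling pass and one matrix product, with no per-node search of the surrounding region.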
According to AAM, the position of each feature point is detected without performing a search of the peripheral region of each node. Thus, time is not required for a search of the peripheral region of each node, as opposed to ASM, and the calculation time can be reduced. Moreover, the accurate position of the feature point can be detected with the positional relationship of each node maintaining a face-like shape, similar to ASM, since the search is performed by deforming the shape model.