1. Field of the Invention
The present invention relates to a characteristic point detection method and apparatus for detecting characteristic points of a predetermined target object included in an image, and the program therefor.
2. Description of the Related Art
Methods for detecting the positions, postures, shapes, and the like of predetermined target objects included in images have been studied in various fields, and various methods have been proposed. In one such approach, a plurality of characteristic points representing a characteristic region of a predetermined target object is defined in advance. Then, the plurality of characteristic points is detected from a detection target image, and the position, posture, shape, and the like of the predetermined target object are detected based on the positional relationship of the detected characteristic points.
For example, a target object position detection method is proposed in Japanese Unexamined Patent Publication No. 6 (1994)-348851. In the method, a plurality of reference characteristic points forming a target object (e.g., eyes, nose, face contour, or the like when the target object is a human face) is defined. Then, the response obtained when a specific filter is applied to the characteristic points is learned. In addition, a standard positional relationship between the characteristic points, i.e., an existence probability distribution of the center point of the target object with respect to each of the characteristic points on an image, is also learned, and these learning results are stored. Thereafter, the same filter used for learning is applied to an input image to detect a plurality of candidates of the characteristic points from the response, and each of the detected candidates is compared with the learned standard positional relationship between the characteristic points. Then, the existence probability distributions of the center point of the target object on the image are added up, and the position having the highest existence probability is determined as the center point of the target object. Here, each probability distribution is approximated by a Gaussian function.
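The voting scheme just described can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the cited publication: the function name `vote_center`, the isotropic Gaussian form, the mean offsets, and the standard deviations are all hypothetical values chosen for the example.

```python
import numpy as np

def vote_center(image_shape, detections, mean_offsets, sigmas):
    """Sum Gaussian existence probability maps of the object center.

    detections   : list of (name, (x, y)) candidate characteristic points
    mean_offsets : name -> (dx, dy), learned mean offset from point to center (assumed)
    sigmas       : name -> standard deviation of the isotropic Gaussian (assumed)
    """
    height, width = image_shape
    ys, xs = np.mgrid[0:height, 0:width]
    total = np.zeros((height, width), dtype=float)
    for name, (px, py) in detections:
        dx, dy = mean_offsets[name]
        s = sigmas[name]
        cx, cy = px + dx, py + dy  # center position predicted by this candidate
        total += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * s * s)) / (2 * np.pi * s * s)
    # The position with the highest summed probability is taken as the center.
    iy, ix = np.unravel_index(np.argmax(total), total.shape)
    return (int(ix), int(iy))

# Hypothetical learned values and detector outputs for a 64x64 image.
mean_offsets = {"left_eye": (10, 12), "right_eye": (-10, 12), "mouth": (0, -13)}
sigmas = {"left_eye": 3.0, "right_eye": 3.0, "mouth": 3.0}
detections = [("left_eye", (20, 20)), ("right_eye", (40, 20)), ("mouth", (30, 45))]
center = vote_center((64, 64), detections, mean_offsets, sigmas)
```

In this toy input all three candidates predict the same center, so the summed map peaks there; a spurious candidate would add a weaker, isolated peak elsewhere.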
Further, an object position detection method similar to the method described in Japanese Unexamined Patent Publication No. 6 (1994)-348851 is proposed in the non-patent literature, “A Multi-Stage Approach to Facial Feature Detection”, D. Cristinacce et al., In Proc. of BMVC, 2004, pp. 231-240. The method detects a plurality of characteristic points as a pair, instead of detecting a certain single point, such as the center point of an object or the like. In addition, it statistically generates existence probability distributions from a large number of learning samples in order to determine the “standard positional relationship between characteristic points” according to real data. A more specific description of the method is provided below.
(Learning Step)
In the method, an existence probability distribution of the correct point of one characteristic point with respect to the position of another characteristic point detected by a certain characteristic point detector (having a discriminator generated by the AdaBoost learning algorithm, or the like) is provided for each pair of two different characteristic points. The positional relationship between the characteristic points is represented by the existence probability distributions of a plurality of these pairs. Here, the existence probability distribution of the correct point of a characteristic point Xi (coordinate xi) with respect to a detector output coordinate xj of a characteristic point Xj is defined as Pij (xi|xj). Note that Pij is represented by a two-dimensional histogram in an actual implementation.
In order to obtain the existence probability distribution Pij, first, target object detection is performed on a training image set (several thousand images for which the correct coordinates of the characteristic points of a target object have been inputted), and the images are normalized so that the target objects are located at a reference position. FIG. 3 illustrates an example case where the target object is a human face: faces are detected from images, and the images are then normalized such that the faces are located in the center thereof with a predetermined size.
Then, a characteristic point Xi is detected by a characteristic point detector Di from the normalized images, and the difference between the coordinate xi of the characteristic point Xi and the correct coordinate xj of another characteristic point Xj is calculated for each pair of two different characteristic points, i.e., the characteristic point Xi and the other characteristic point Xj, and the results are added up. FIGS. 4A to 4C illustrate examples of existence probability distributions Pij obtained through the learning described above. FIGS. 4A to 4C are examples where the target objects are human faces, in which the positions of the characteristic points detected by the characteristic point detectors are denoted by “x”, and the existence probability distributions of target characteristic points are represented by shading on the images. Here, a position having a higher existence probability is indicated by denser shading. FIG. 4A illustrates an existence probability distribution of the point of the outer corner of the left eye with respect to the position of the point of the inner corner of the left eye detected by the left eye inner corner detector. FIG. 4B illustrates an existence probability distribution of the point of the left nostril with respect to the position of the inner corner of the left eye. FIG. 4C illustrates an existence probability distribution of the point of the right mouth corner with respect to the position of the inner corner of the left eye.
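The learning step can be illustrated with a minimal sketch that accumulates a two-dimensional offset histogram for one pair of characteristic points. It follows the definition of Pij (xi|xj) given above, i.e., the offset of the correct coordinate of Xi from the detector output for Xj. The function name `learn_pairwise_histogram`, the `max_offset` window, and the sample coordinates are assumptions made for illustration only.

```python
import numpy as np

def learn_pairwise_histogram(correct_xi, detected_xj, max_offset=32):
    """Estimate Pij as a normalized 2-D histogram of offsets (hypothetical helper).

    correct_xi  : (N, 2) correct (x, y) coordinates of Xi, one per training image
    detected_xj : (N, 2) detector output (x, y) coordinates of Xj, same images
    """
    correct_xi = np.asarray(correct_xi, dtype=int)
    detected_xj = np.asarray(detected_xj, dtype=int)
    offsets = correct_xi - detected_xj  # rows of (dx, dy)
    size = 2 * max_offset + 1
    hist = np.zeros((size, size), dtype=float)
    for dx, dy in offsets:
        # Offsets outside the window are ignored in this sketch.
        if abs(dx) <= max_offset and abs(dy) <= max_offset:
            hist[dy + max_offset, dx + max_offset] += 1.0
    total = hist.sum()
    return hist / total if total > 0 else hist

# Four hypothetical normalized training images.
correct_xi = [(40, 20), (41, 20), (40, 21), (40, 20)]
detected_xj = [(30, 20), (30, 20), (30, 20), (30, 20)]
P_ij = learn_pairwise_histogram(correct_xi, detected_xj, max_offset=16)
```

The histogram is indexed as `[dy + max_offset, dx + max_offset]`, so the bin at the array center corresponds to zero offset.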
(Detection Step)
A target object is detected from a detection target image, and a normalized image that includes the target object is obtained. Then, processing is performed on the normalized image to detect candidates of the characteristic points, and, for each characteristic point, the sum of the existence probability distributions estimated from the candidates of the other characteristic points is obtained. The point having the highest summed existence probability is then selected as the true point of the characteristic point. The point estimated as the true point of a characteristic point may be expressed by the following formula.
$$\hat{x}_i = \arg\max \sum_{j=1}^{n} \sum_{t=1}^{k} P_{ij}\left(x_i \mid q_{jt}\right)$$
Here, x̂i (the left-hand side) is the position coordinate of the point estimated as the true point of the characteristic point; Pij(xi|qjt) is the existence probability distribution of the characteristic point Xi (position coordinate xi) with respect to the coordinate qjt of the t-th candidate of the characteristic point Xj; “k” is the number of candidates of the characteristic point Xj; and “n” is the number of defined characteristic points.
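As an illustration of the formula, the following sketch adds up offset histograms placed at each candidate position and takes the position of the maximum. It assumes that each Pij is stored as a square offset histogram indexed `[dy + max_offset, dx + max_offset]`; the function name `estimate_true_point`, the toy histograms, and the candidate coordinates are hypothetical.

```python
import numpy as np

def estimate_true_point(image_shape, candidates, histograms, max_offset):
    """Sum Pij(xi | qjt) over all points j and candidates t, then take the arg max.

    candidates : j -> list of candidate coordinates (x, y) for point Xj
    histograms : j -> offset histogram Pij of shape (2*max_offset+1, 2*max_offset+1)
    """
    height, width = image_shape
    score = np.zeros((height, width), dtype=float)
    for j, cand_list in candidates.items():
        hist = histograms[j]
        for qx, qy in cand_list:
            x0, y0 = qx - max_offset, qy - max_offset
            # Clip the histogram window to the image bounds.
            cx0, cx1 = max(x0, 0), min(qx + max_offset + 1, width)
            cy0, cy1 = max(y0, 0), min(qy + max_offset + 1, height)
            if cx0 >= cx1 or cy0 >= cy1:
                continue
            score[cy0:cy1, cx0:cx1] += hist[cy0 - y0:cy1 - y0, cx0 - x0:cx1 - x0]
    iy, ix = np.unravel_index(np.argmax(score), score.shape)
    return (int(ix), int(iy)), score

# Toy histograms: each places all probability on a single offset.
max_offset = 4
hist_a = np.zeros((9, 9)); hist_a[0 + max_offset, 3 + max_offset] = 1.0   # offset (+3, 0)
hist_b = np.zeros((9, 9)); hist_b[1 + max_offset, -2 + max_offset] = 1.0  # offset (-2, +1)
candidates = {0: [(10, 10), (30, 30)], 1: [(15, 9), (50, 50)]}
best, score = estimate_true_point((64, 64), candidates, {0: hist_a, 1: hist_b}, max_offset)
```

Only the position supported by candidates of both characteristic points accumulates two votes, which is why false candidates of a single detector tend to be outvoted.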
When detecting the position of a certain characteristic point, the method described above does not rely solely on the output of the single characteristic point detector assigned to that characteristic point. Instead, the plurality of characteristic point detectors mutually estimate the positions of the other characteristic points, and the method may thereby provide better detection capability than a single detector.
In the object position detection method proposed in the non-patent literature, however, if a relatively large change occurs in the posture of the target object, the departure from the average position becomes greater for a characteristic point located farther from the position of the reference characteristic point, and the reliability of the existence probability distribution thereof is degraded. FIGS. 14A to 14C illustrate this effect, taking a face as an example target object. FIGS. 14A to 14C illustrate the average position of each characteristic point with the position of each component of an actual face superimposed thereon, with the position of the characteristic point of the left eye fixed as the reference, when the orientations of the faces are front, left, and downward, respectively. These drawings illustrate that the farther the other characteristic points are located from the position of the characteristic point of the left eye, the greater their departure from the average positions.
Further, if the density of characteristic points is biased, for example, where a certain component of a target object has three characteristic points while other components have only one characteristic point each, the influence from the characteristic points located at a specific fixed location becomes great, and the balance of the positions of the characteristic points contributing to the integration of the existence probability distributions of a certain characteristic point is disturbed. As a result, a candidate that is actually not the characteristic point is more likely to be incorrectly selected as the true point of the characteristic point. FIG. 15, taking a face as an example target object, illustrates a case in which many characteristic points are concentrated on a specific face component. As illustrated in the drawing, the eyes have a larger number of characteristic points than the other face components, so that the influence from the characteristic points located at specific fixed positions, i.e., the “eye positions”, becomes great. Thereby, a strong binding force centered on the eye positions is exerted, and the binding force from the other positions is weakened.