This disclosure relates generally to the detection of landmark points in images. More particularly, the disclosure relates to techniques for generating an object model and fitting it to a sample image that improve the accuracy and speed with which landmark points may be detected in the sample image.
Accurately locating landmark points in images is an important step in a number of image processing tasks such as face detection, face recognition, person or other object recognition, medical image analysis, image tagging, photo effects, photo adjustments, photo auto improvements, slideshows, image cropping, and other similar tasks. Typical objects of interest for which landmark detection may be desirable include faces, pets (e.g., dogs and cats, and their faces), people, and vehicles. Landmark detection algorithms may employ a model generated offline from a set of training images depicting the particular object of interest. Each image in the set of training images may be annotated with the location of predetermined landmark points. For example, a model for a landmark detection algorithm to locate facial landmarks may be constructed from a set of images that are annotated to identify the location of eyes, eyebrows, nose, lips, and other recognizable points in each image.
The model generated offline may include a shape model and a data attachment term, each generated from annotated training data. The shape model may be generated from the position of landmark points in the training images. The purpose of the shape model is to model the position and displacement of the landmark points and to act as a regularization constraint to ensure that a “valid shape” is maintained when the model is fit to a sample image. In one approach, the shape model is generated using Principal Component Analysis of the concatenation of the coordinates of the landmark points in the training images to define a mean shape and a set of displacement modes. Referring to FIG. 1, the coordinates for multiple landmark points for individual training images 105 in training set 110 may be determined. The landmark points from the individual images 105 may then be merged to generate mean shape 115 (illustrated with lines connecting the mean landmark points for purposes of clarity), which may be expressed as multi-dimensional vector 120 that includes the two-dimensional coordinates of the vertices of mean shape 115.
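The Principal Component Analysis step described above can be sketched as follows. This is a minimal illustration and not a definitive implementation: it assumes the annotated shapes have already been aligned to a common frame (an alignment step the text does not describe), and the function names `build_shape_model` and `project_to_shape_model` and the parameter `num_modes` are hypothetical names introduced here.

```python
import numpy as np

def build_shape_model(shapes, num_modes):
    """Build a mean shape and displacement modes from annotated landmarks.

    shapes: array of shape (num_images, num_landmarks, 2), assumed pre-aligned.
    num_modes: number of principal displacement modes to keep (hypothetical knob).
    """
    shapes = np.asarray(shapes, dtype=float)
    n = shapes.shape[0]
    # Concatenate each image's landmark coordinates into one row vector.
    X = shapes.reshape(n, -1)
    mean_shape = X.mean(axis=0)
    # PCA via SVD of the centered data; rows of vt are the displacement modes.
    _, s, vt = np.linalg.svd(X - mean_shape, full_matrices=False)
    modes = vt[:num_modes]
    variances = (s[:num_modes] ** 2) / max(n - 1, 1)
    return mean_shape, modes, variances

def project_to_shape_model(shape, mean_shape, modes):
    """Keep only the part of a shape that the model can represent."""
    x = np.asarray(shape, dtype=float).reshape(-1)
    coeffs = modes @ (x - mean_shape)
    return (mean_shape + modes.T @ coeffs).reshape(-1, 2)
```

Projecting a candidate shape onto the retained modes is one simple way to express the regularization role of the shape model: any displacement outside the span of the modes is discarded, so the result remains a "valid shape" in the sense of the training data.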
The data attachment term is computed based on image pixel values around each landmark in the training images. The data attachment term varies according to the particular landmark detection algorithm that is employed, but, regardless of the exact form, its purpose is to drive an initial position of each landmark point (e.g., a position based on the mean shape) towards the correct position of the corresponding landmark point in a sample image. For example, the Active Appearance Model adapts a shape model to fit a sample image by iteratively minimizing, under the constraint of maintaining a valid shape, the distance between two textures: the sample image texture inside a mesh generated from the current location of the landmark points in the sample image, and the average texture of the objects in the training set. The Active Shape Model adapts each landmark point individually to fit a sample image, usually with an edge map, and then iteratively projects the adapted landmark points back onto the shape model. Recently, Constrained Local Models have proven to be quite accurate and robust. The data attachment term for a Constrained Local Model is used to generate a confidence map from image data in a region surrounding each landmark point in a sample image. The landmark points are jointly adapted in a manner that maximizes the response of the landmark points' respective confidence maps.
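As a rough illustration of a confidence map for a single landmark, the sketch below scores every candidate position in a small search window around the current landmark estimate. It assumes a linear patch template followed by a logistic function as the data attachment term, which is only one possible choice and is not stated in the text; the function name `confidence_map` and the parameters `weights`, `bias`, and `search_radius` are hypothetical. The joint adaptation of all landmark points under the shape constraint is not shown.

```python
import numpy as np

def confidence_map(image, center, weights, bias=0.0, search_radius=4):
    """Score each candidate position in a square window around `center`.

    image: 2-D grayscale array.
    center: (row, col) current estimate of the landmark position.
    weights: 2-D patch template with odd height and width (assumed learned offline).
    Returns a (2*search_radius+1, 2*search_radius+1) array of values in (0, 1).
    """
    ph, pw = weights.shape
    ry, rx = ph // 2, pw // 2
    cy, cx = center
    size = 2 * search_radius + 1
    response = np.zeros((size, size))
    # Pad so that patches near the image border are still defined.
    pad = search_radius + max(ry, rx)
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="edge")
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = cy + dy + pad, cx + dx + pad
            patch = padded[y - ry:y + ry + 1, x - rx:x + rx + 1]
            score = np.sum(weights * patch) + bias
            # Logistic squashing turns the raw score into a confidence value.
            response[dy + search_radius, dx + search_radius] = 1.0 / (1.0 + np.exp(-score))
    return response
```

The peak of the returned map indicates the offset from the current estimate at which the image data best matches the template; a fitting procedure would combine such maps for all landmark points with the shape model rather than moving each point to its own peak independently.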
Although Constrained Local Models have proven to be quite accurate, there is still room to improve the accuracy and speed with which landmark points are identified in a sample image. It would therefore be desirable to identify changes in the training and fitting operations associated with the location of landmark points in images using Constrained Local Models to improve accuracy and performance.