The ability to detect the locations of facial features is useful for a variety of applications. These applications include automatic facial morphing and warping, expression recognition, hair segmentation, face recognition and classification, red-eye detection, and facial image compression. Many of the techniques that are used to locate the positions of facial features are also useful for a variety of other general image feature detection tasks. These can include identifying organs in medical imagery and locating circuit board components in industrial vision applications.
Facial feature finding has been studied by a number of researchers. The algorithms fall mainly into four categories: template matching, edge detection, shape models, and holistic matching. Techniques that use shape models seem to be the most promising. These methods use a model of the feature shape to constrain the search to plausible results. This increases both the accuracy of the feature finder and the range over which the features can be uniquely identified. Deformable templates and active shape models are the two most popular approaches. Deformable templates need an explicitly parameterized model of the feature shape. This limits the applicability of the technique to shapes that are easily parameterized and reduces the accuracy of the results for shapes that do not strictly conform to the parameters of the shape model. Active shape models, on the other hand, learn a model of the feature shape from a series of ground truth examples. This makes the method applicable to a much broader class of feature shapes.
The active shape model technique was developed by Cootes et al. (see Cootes, T. F., Taylor, C. J., Cooper, D. H., “Active Shape Models—Their Training and Application,” Computer Vision and Image Understanding, Vol. 61, No. 1, pp. 38–59, 1995). It provides a model-based mechanism for locating objects in images. A flexible approach to modeling is used that is applicable to a broad class of target objects. The procedure consists of both a training and a searching stage. During training a set of example images are manually annotated with a series of control points that indicate the ground truth feature positions. These feature locations are analyzed to develop a model of the shape of the plausible relative positions of the control points. Models of the texture around each control point are also created. These models are generated once and stored for use in subsequent searches. During searching, a series of local searches are performed at each feature point to find the location that best matches the texture model for that feature. The global shape model is then used to constrain the results of the local searches. This process iterates until it converges upon a stable result.
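The searching stage described above can be illustrated with a minimal sketch. It is not Cootes' implementation. It assumes the training stage has already produced a mean shape, a set of orthonormal shape modes, and their eigenvalues; it omits pose alignment (translation, rotation, scale); and it stands in for the texture model with a hypothetical `cost` callable that scores a candidate position for a given control point. The function names, the one-pixel search radius, and the limit of three standard deviations on each shape coefficient are likewise illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

# Shapes are flattened control-point lists: [x0, y0, x1, y1, ...].
Shape = List[float]


def constrain_shape(shape: Sequence[float], mean: Sequence[float],
                    modes: Sequence[Sequence[float]],
                    eigenvalues: Sequence[float], limit: float = 3.0) -> Shape:
    """Project a shape onto the model modes and clamp each coefficient.

    Assumes `modes` are orthonormal vectors; `limit` is an assumed bound
    in standard deviations (sqrt of the eigenvalue) for each coefficient.
    """
    diff = [s - m for s, m in zip(shape, mean)]
    result = list(mean)
    for mode, lam in zip(modes, eigenvalues):
        b = sum(p * d for p, d in zip(mode, diff))   # shape coefficient
        bound = limit * math.sqrt(lam)
        b = max(-bound, min(bound, b))               # keep it plausible
        for i, p in enumerate(mode):
            result[i] += b * p
    return result


def local_search(shape: Sequence[float],
                 cost: Callable[[int, float, float], float],
                 radius: int = 1) -> Shape:
    """Move each control point to the lowest-cost position nearby.

    `cost(i, x, y)` is a hypothetical stand-in for the texture model of
    control point i; lower means a better match.
    """
    moved: Shape = []
    for k in range(0, len(shape), 2):
        x, y = shape[k], shape[k + 1]
        candidates = [(x + dx, y + dy)
                      for dx in range(-radius, radius + 1)
                      for dy in range(-radius, radius + 1)]
        best = min(candidates, key=lambda p: cost(k // 2, p[0], p[1]))
        moved += [best[0], best[1]]
    return moved


def asm_search(start: Sequence[float], mean: Sequence[float],
               modes: Sequence[Sequence[float]],
               eigenvalues: Sequence[float],
               cost: Callable[[int, float, float], float],
               max_iter: int = 50, tol: float = 1e-6) -> Shape:
    """Alternate local searches and shape constraint until stable."""
    shape = list(start)
    for _ in range(max_iter):
        proposal = local_search(shape, cost)
        new_shape = constrain_shape(proposal, mean, modes, eigenvalues)
        change = max(abs(a - b) for a, b in zip(new_shape, shape))
        shape = new_shape
        if change < tol:
            break
    return shape
```

In this sketch the clamp in `constrain_shape` is what keeps the result within the range of shapes seen in training, and `start` plays the role of the approximate starting location that the search requires.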
In Cootes' system, the searching operation requires an approximate starting location that has to be provided by a user. This user intervention could be replaced by an automatic process that finds certain features, preferably the two eyes, with a simple, fast method.
Methods are known in the art for detecting human eyes in a digital image. For example, U.S. Pat. No. 6,072,892 discloses the use of a thresholding method to detect the position of human eyes in a digital image. In this method, a scanning window scans across the entire image using a raster scanning method. A histogram extractor extracts an intensity histogram from the window as it scans across the image. Each intensity histogram is examined by a peak detector to find three peaks in the histogram representing the skin, the white of the eye, and the black of the pupil. A histogram having the three peaks identifies a location in an image that potentially defines an eye position. Eye position is determined from among the potential locations by calculating the area under the histogram associated with each potential location and by selecting the location that is associated with the histogram with the largest area.
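The scanning procedure described above can be sketched as follows. This is an illustration only, not the method of the cited patent. It assumes an 8-bit grayscale image stored as a list of rows, a coarse 16-bin histogram, and a simple local-maximum test for peaks. The "area" is taken here as the pixel count in the three peak bins, which is an assumption, since the description above does not say how the area is measured. All function names and parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def intensity_histogram(window: Sequence[Sequence[int]],
                        bins: int = 16) -> List[int]:
    """Count 8-bit gray levels of a window into equal-width bins."""
    hist = [0] * bins
    for row in window:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
    return hist


def find_peaks(hist: Sequence[int], min_count: int = 3) -> List[int]:
    """Return indices of local maxima holding at least `min_count` pixels.

    A flat plateau is reported once, at its leftmost bin.
    """
    peaks = []
    last = len(hist) - 1
    for i, count in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i < last else 0
        if count >= min_count and count > left and count >= right:
            peaks.append(i)
    return peaks


def scan_for_eye_candidates(image: Sequence[Sequence[int]],
                            win_h: int, win_w: int, bins: int = 16,
                            min_count: int = 3) -> List[Tuple[int, int, int]]:
    """Raster-scan the image; keep windows whose histogram has three peaks.

    Returns (area, top, left) tuples, where `area` is the pixel count in
    the three peak bins (an assumed definition of the histogram area).
    """
    candidates = []
    for top in range(len(image) - win_h + 1):
        for left in range(len(image[0]) - win_w + 1):
            window = [row[left:left + win_w]
                      for row in image[top:top + win_h]]
            hist = intensity_histogram(window, bins)
            peaks = find_peaks(hist, min_count)
            if len(peaks) == 3:  # skin, white of the eye, pupil
                area = sum(hist[i] for i in peaks)
                candidates.append((area, top, left))
    return candidates


def best_candidate(
        candidates: Sequence[Tuple[int, int, int]]
) -> Optional[Tuple[int, int, int]]:
    """Select the candidate with the largest area, or None if there is none."""
    return max(candidates, key=lambda c: c[0]) if candidates else None
```

The two nested loops in `scan_for_eye_candidates` build one histogram per window position, so the work grows with the number of pixels in the image times the window area.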
One of the problems with this approach is that the entire image must be scanned on a pixel-by-pixel basis. Thus, a search window must be positioned at each pixel in the image and a histogram must be assembled at each pixel location. Further, the area under each histogram must be calculated and stored. It will be appreciated that this method consumes enormous amounts of computing power and reduces the rate at which images can be processed. This method can also produce a high rate of false positives.
Methods are also known to detect human eyes that have abnormally high red content. Such abnormally high red content is commonly associated with a photographic phenomenon known as red eye. Red eye is typically caused by a flash of light that is reflected by a pupil. As is described in commonly assigned U.S. Pat. No. 6,292,574, it is known to search in images for pixels having the high red content that is indicative of red eye. Similarly, commonly assigned U.S. Pat. No. 5,432,863 describes a user interactive method for detecting pixels in an image that have color characteristic of red eye. It will be recognized that these methods detect eyes only where red eye is present.
Note that in Cootes' system, the search process uses a shape model coefficient constraining method that does not select the most similar shape within the ground truth shape space. Also, Cootes' system uses constant-scale texture model search windows, which restricts the accuracy of the final results that the system can reach. Cootes' system assumes that the scale of the objects is fixed. This requires that images that portray objects of different sizes be scaled in a pre-processing step. The scale factor could be based on an initial estimate of the object's size. The assumption of a fixed scale has the potential to improve performance, because the image is scaled once during pre-processing rather than the texture windows being scaled repeatedly during the search. However, a fixed scale limits the adaptability of the algorithm and adversely affects the accuracy when the initial estimate of the scale is incorrect.
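The fixed-scale pre-processing step can be illustrated with a short sketch. It assumes the model was trained at a known reference object size and that an initial size estimate is available for the input image. The nearest-neighbour resampling and all names below are illustrative assumptions, not part of Cootes' system.

```python
from typing import List, Sequence


def scale_factor(estimated_size: float, reference_size: float) -> float:
    """Factor that maps the estimated object size onto the model's size."""
    if estimated_size <= 0:
        raise ValueError("estimated_size must be positive")
    return reference_size / estimated_size


def rescale_nearest(image: Sequence[Sequence[int]],
                    factor: float) -> List[List[int]]:
    """Rescale an image once, using nearest-neighbour sampling."""
    rows, cols = len(image), len(image[0])
    new_rows = max(1, round(rows * factor))
    new_cols = max(1, round(cols * factor))
    return [[image[min(rows - 1, int(r / factor))]
                  [min(cols - 1, int(c / factor))]
             for c in range(new_cols)]
            for r in range(new_rows)]


def apparent_size(true_size: float, estimated_size: float,
                  reference_size: float) -> float:
    """Object size after pre-scaling when the estimate may be wrong."""
    return true_size * scale_factor(estimated_size, reference_size)
```

For example, with a reference size of 100 pixels, an object whose true size is 60 pixels but whose estimated size is 50 pixels ends up at 120 pixels after pre-scaling. A fixed-scale search window would then cover the wrong extent of every feature.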
Therefore, there is a need for a system that can automatically determine a starting search location, with no user intervention, by using an eye detection mechanism. There is also a need for the system to select a best shape model from the ground truth shape space and to vary the scale of the texture model and search windows.