1. Field of Invention
The present invention relates to an active appearance model (AAM) machine and method. More specifically, it relates to an AAM with more robust and accurate image alignment capabilities.
2. Description of Related Art
In the field of computer vision, it is generally desirable that an image not only be captured, but that the computer be able to identify and label various features within the captured image. Basically, a goal of computer vision is for the computer to “understand” the contents of a captured image.
Various approaches to identifying features within a captured image are known in the industry. Many early approaches centered on the concept of identifying shapes. For example, if a goal was to identify a specific item, such as a wrench or a type of wrench, then a library of the different types of acceptable wrenches (i.e. examples of “true” wrenches) would be created. The outline shapes of the true wrenches would be stored, and a search for the acceptable shapes would be conducted on a captured image. This approach of shape searching was successful when one had an exhaustive library of acceptable shapes, the library was not overly large, and the subject of the captured images did not deviate from the predefined true shapes.
For complex searches, however, this approach is not effective. The limitations of this approach become readily apparent when the subject being sought within an image is not static, but is prone to change. For example, a human face has definite characteristics, but does not have an easily definable number of shapes and/or appearances it may adopt. It is to be understood that the term appearance is herein used to refer to color and/or light differences across an object, as well as other surface/texture variances. The difficulties in understanding a human face become even more acute when one considers that it is prone to shape distortion and/or change in appearance within the normal course of human life due to changes in emotion, expression, speech, age, etc. It is self-apparent that compiling an exhaustive library of human faces and their many variations is a practical impossibility.
Recent developments in image recognition of objects that change their shape and appearance, such as the human face, are discussed in “Statistical Models of Appearance for Computer Vision”, by T. F. Cootes and C. J. Taylor (hereinafter Cootes et al.), Imaging Science and Biomedical Engineering, University of Manchester, Manchester M13 9PT, U.K. email: t.cootes@man.ac.uk, http://www.isbe.man.ac.uk, Mar. 8, 2004, which is hereby incorporated in its entirety by reference.
As Cootes et al. explain, in order for a machine to be able to understand what it “sees”, it must make use of models that describe and label the expected structure being imaged. In the past, model-based vision has been applied successfully to images of man-made objects, but its use has proven more difficult in interpreting images of natural subjects, which tend to be complex and variable. The main problem is the variability of the subject being examined. To be useful, a model needs to be specific, that is, it should represent only true examples of the modeled subject. The model, however, also needs to be general and represent any plausible example (i.e. any possible true example) of the class of object it represents.
Recent developments have shown that this apparent contradiction can be handled by statistical models that can capture specific patterns of variability in shape and appearance. It has further been shown that these statistical models can be used directly in image interpretation.
To facilitate the application of statistical models, subjects to be interpreted are typically separated into classes. This permits the statistical analysis to use prior knowledge of the characteristics of a particular class to facilitate its identification and labeling, and even to overcome confusion caused by structural complexity, noise, or missing data.
Additionally, in order to facilitate further processing of identified and labeled subjects within a captured image, it is beneficial for the identified subject to be transformed into (i.e. be fitted onto) a predefined, “model” shape with predefined locations for labeled items. For example, although the human face may take many shapes and sizes, it can be conformed to a standard shape and size. Once conformed to the standard shape and size, the transformed face can then be further processed to determine its expression, determine its gaze direction, identify the individual to whom the face belongs, etc.
A method that uses this type of alignment is the active shape model. With reference to FIG. 1, the active shape model uses a predefined model face 1A and a list of predefined deformation parameters, each having corresponding deformation constraints, to permit the model face to be stretched and moved to attempt to align it with a subject image 2. Alternatively, the list of predefined deformation parameters may be applied to subject image 2, and have it be moved and deformed to attempt to align it with model face 1A. This alternate approach has the added benefit that once subject image 2 has been aligned with model face 1A, it will also be fitted to the shape and size of model face 1A.
For illustrative purposes, FIG. 1 shows model face 1A being fitted to subject face 2. The example of FIG. 1 is an exaggerated case for illustration purposes. It is to be understood that a typical model face 1A would have constraints regarding its permissible deformation points relative to other points within itself. For example, if aligning the model face meant moving its left eye up one inch and moving its right eye down one inch, then the resultant aligned image would likely not be a human face, and thus such a deformation would typically not be permissible.
In the example of FIG. 1, the model face is first placed roughly within the proximity of predefined points of interest, and typically placed near the center of subject face 2, as illustrated in image 3. By comparing the amount of misalignment resulting from moving model face 1A in one direction or another, and the results of adjusting a size multiplier in any of several predefined directions, one can determine how to better align model face 1A, as illustrated in image 4. An objective would be to align as closely as possible predefined landmarks, such as the pupils, nostrils, mouth corners, etc., as illustrated in image 5. Eventually, after a sufficient number of such landmark points have been aligned, the subject image 2 is warped onto model image 1A, resulting in a fitted image 6 with easily identifiable and labeled points of interest that can be further processed to achieve specific objectives.
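The trial-and-compare procedure described above can be illustrated with a minimal sketch. It is not the method of any cited reference; it assumes, for illustration only, that landmarks are given as 2-D point arrays, that misalignment is measured as a sum of squared landmark distances, and that only translation and a single uniform size multiplier are adjusted. The names `misalignment` and `greedy_align` are hypothetical.

```python
import numpy as np

def misalignment(model_pts, subject_pts, tx, ty, scale):
    """Sum of squared distances between moved model landmarks and subject landmarks."""
    moved = model_pts * scale + np.array([tx, ty])
    return float(np.sum((moved - subject_pts) ** 2))

def greedy_align(model_pts, subject_pts, step=1.0, scale_step=0.1, min_step=1e-3):
    """Try small moves in each direction and small size changes; keep the best one.

    Assumption: translation plus one uniform size multiplier is enough for the
    illustration. Real fitting would also handle rotation and shape deformation.
    """
    tx, ty, scale = 0.0, 0.0, 1.0
    best = misalignment(model_pts, subject_pts, tx, ty, scale)
    while step > min_step:
        candidates = [
            (tx + step, ty, scale), (tx - step, ty, scale),
            (tx, ty + step, scale), (tx, ty - step, scale),
            (tx, ty, scale + scale_step), (tx, ty, scale - scale_step),
        ]
        errors = [misalignment(model_pts, subject_pts, *c) for c in candidates]
        i = int(np.argmin(errors))
        if errors[i] < best:
            # A trial move reduced the misalignment, so accept it.
            best = errors[i]
            tx, ty, scale = candidates[i]
        else:
            # No trial move helped, so refine the step sizes.
            step /= 2
            scale_step /= 2
    return tx, ty, scale, best

if __name__ == "__main__":
    # Hypothetical landmarks: a centered square standing in for a model face.
    model = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
    subject = model * 1.5 + np.array([3.0, -2.0])
    print(greedy_align(model, subject))
```

Because the search only accepts moves that reduce the misalignment from its current position, its result depends on where the model is first placed, which is why the rough initial placement near the subject face matters.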
This approach, however, does not take into account changes in appearance, e.g. shadow, color, or texture variations. A more holistic, or global, approach that jointly considers the object's shape and appearance is the Active Appearance Model (AAM). Although Cootes et al. appear to focus primarily on the gray-level (or shade) feature of appearance, they do describe a basic principle that AAM searches for the best alignment of a model face (including both model shape parameters and model appearance parameters) onto a subject face while simultaneously minimizing misalignments in shape and appearance. In other words, AAM applies knowledge of the expected shapes of structures, their spatial relationships, and their gray-level appearance (or more generally color value appearance, such as RGB values) to restrict an automated system to plausible interpretations. Ideally, AAM is able to generate realistic images of sought objects. An example would be a model face capable of generating convincing images of any individual, changing their expression and so on. AAM thus formulates interpretation as a matching problem: given an image to interpret, structures are located and labeled by adjusting the model's parameters in such a way that it generates an ‘imagined image’ that is as similar as possible to the real thing.
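The matching formulation can be illustrated with a deliberately simplified sketch. It covers only the appearance half of the problem and assumes, for illustration, that the subject's gray levels have already been sampled into a shape-normalized vector and that the model is linear (a mean texture plus a weighted sum of modes). A full AAM search also adjusts shape and pose, which is omitted here. The names `synthesize` and `fit_appearance` are hypothetical.

```python
import numpy as np

def synthesize(mean_texture, modes, params):
    """Generate the model's 'imagined image' as a gray-level vector."""
    return mean_texture + modes @ params

def fit_appearance(mean_texture, modes, observed):
    """Choose parameters so the imagined image is as close as possible to the observed one.

    Assumption: a linear model fitted by ordinary least squares.
    Returns the parameters and the remaining squared difference.
    """
    params, *_ = np.linalg.lstsq(modes, observed - mean_texture, rcond=None)
    residual = observed - synthesize(mean_texture, modes, params)
    return params, float(residual @ residual)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean_texture = rng.normal(size=50)   # hypothetical 50-pixel texture
    modes = rng.normal(size=(50, 3))     # three hypothetical modes of variation
    observed = synthesize(mean_texture, modes, np.array([0.5, -1.0, 2.0]))
    params, err = fit_appearance(mean_texture, modes, observed)
    print(params, err)
```

A small remaining difference only indicates that the model reproduced the input well within its own parameters; it does not by itself establish that the result is a true example of the class.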
Although AAM is a useful approach, implementation of AAM still poses several challenges. For example, as long as the AAM machine manages to find a “fit” within its defined parameters, it will assume that a plausible match, or fit, has been found, but there is no guarantee that the closest match within its defined parameters is in fact a true example.
In other words, even if an AAM machine appears to have aligned a subject input image with a model image, the resulting aligned image may not necessarily be a true representation of the subject category. For example, if the initial position of the model image is too far misaligned from the subject input image, the model image may be aligned incorrectly on the subject input image. This would result in a distorted, untrue, representation of the warped output image.
Other limitations of an AAM machine result from the application of statistical analysis of a library of true samples to define distinguishing parameters and the parameters' permissible distortions. By the nature of the statistical analysis, the results will permit alignment only with a fraction of the true samples. If the subject category is prone to a wide range of changes, the model may not be able to properly align itself to an input subject image with characteristics beyond the norm defined by the shape or appearance model. This is true of even sample images within the library from which the model image (i.e. the shape or appearance model) is constructed. Typically, the constructed model image will be capable of being aligned to only 90% to 95% of the true sample images within the library.
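One way such a limitation can arise is sketched below. The construction is an assumption made for illustration and is not stated in the text above: the shape model is built by principal component analysis of the training shapes, only enough modes are kept to explain a chosen share of the total variance (95% here), and each retained parameter is limited to three standard deviations. Under those assumptions, a training shape with unusual characteristics cannot be reproduced exactly by the model built from it. The names `build_shape_model` and `constrain` are hypothetical.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.95):
    """Build a linear shape model from training shapes (one flattened shape per row).

    Assumption: modes are kept until they explain `variance_kept` of total variance.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = sing ** 2 / (len(shapes) - 1)        # variance along each mode
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, variance_kept)) + 1
    return mean, vt[:k].T, eigvals[:k]

def constrain(mean, modes, eigvals, shape, limit=3.0):
    """Project a shape onto the model and clamp each parameter to +/- limit std devs."""
    b = modes.T @ (shape - mean)
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)
    return mean + modes @ b

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spread = np.array([10, 5, 3, 2, 1, 0.5, 0.3, 0.2, 0.1, 0.05])
    shapes = rng.normal(size=(40, 10)) * spread    # hypothetical training library
    mean, modes, eigvals = build_shape_model(shapes)
    errors = [np.linalg.norm(s - constrain(mean, modes, eigvals, s)) for s in shapes]
    print(modes.shape[1], max(errors))
```

The printed maximum error shows how far the worst-represented training shape lies from its closest permissible model shape under these assumptions.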