1. Technical Field
This invention is directed toward a system and process for facial expression mapping. More particularly, this invention is related to a system and process for mapping an expression on a face in one image to another face in another image.
2. Background Art
Facial expressions exhibit not only facial feature motions, but also subtle changes in illumination and appearance (e.g., facial creases and wrinkles). These details are important visual cues, but they are difficult to synthesize.
One class of methods for generating facial expressions with details is the morph-based approach and its extensions [2, 14, 16, 3]. The main limitation of these approaches is that they can only generate expressions in between the given expressions through interpolation. For example, given only a person's neutral, expressionless face, one is not able to generate that person's facial expressions using morph-based methods.
Another popular class of techniques, known as expression mapping (or performance-driven animation) [4, 10, 20, 13], does not have this limitation. It can be used to animate 2D drawings and images, as well as textured or non-textured 3D face models. The concept behind this method is very simple. Given an image of a person's neutral face and another image of the same person's face with an expression, the positions of the face features (eyes, eyebrows, mouth, etc.) in both images are located either manually or through some automatic method. The difference vector between the corresponding feature points in the two images is calculated and added to a new face's feature positions to generate the new expression for that face through geometry-controlled image warping [21, 2, 10]. One problem with this geometric-warping-based approach is that it captures only the geometric changes of the face features and completely ignores illumination changes. The resulting expressions lack expression details such as wrinkles. These details are very important visual cues, and without them the expressions are less expressive and less convincing.
There has been much other work done in the area of facial expression synthesis. For instance, the physically-based approach is an alternative to expression mapping. In one example of this physically-based approach, Badler and Platt [1] used a mass-and-spring model to simulate the skin and muscles. Extensions and improvements to this technique have also been reported [19, 18, 9].
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting "reference [1]" or simply "[1]". A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present invention is directed toward a facial expression mapping system and process that overcomes the aforementioned limitations in prior systems and methods for mapping expressions from one person in an image to another. This system and method for mapping facial expressions uses the illumination change of one person's expression by capturing it in what is called an Expression Ratio Image (ERI). Together with geometric warping, an ERI is mapped to any other person's face image to generate more expressive facial expressions. An ERI can capture subtle but visually important details of facial expressions. The resulting facial expressions are significantly more expressive and convincing than those produced by traditional expression mapping based on geometric warping alone.
More particularly, the present novel technique captures the illumination change of one person's expression and maps it to any other person to generate more expressive facial expressions. The critical observation is that the illumination change resulting from the change of surface normal can be extracted in a skin-color independent manner by using what is called an expression ratio image (ERI). This ERI can then be applied to an image of any other person to generate the correct illumination changes resulting from the geometric deformation of that person's face.
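One way to see why such a ratio is independent of skin color is sketched below. The sketch assumes a Lambertian skin surface lit by m distant point light sources; this model and the symbols (albedo \rho, surface normals \mathbf{n} and \mathbf{n}' before and after the deformation, light directions \mathbf{l}_i and light intensities I_i) are assumptions introduced for illustration and are not taken from the text above.

```latex
\begin{aligned}
I &= \rho \sum_{i=1}^{m} I_i \,(\mathbf{n} \cdot \mathbf{l}_i), &
I' &= \rho \sum_{i=1}^{m} I_i \,(\mathbf{n}' \cdot \mathbf{l}_i), \\
\mathcal{R} &= \frac{I'}{I}
  = \frac{\sum_{i=1}^{m} I_i \,(\mathbf{n}' \cdot \mathbf{l}_i)}
         {\sum_{i=1}^{m} I_i \,(\mathbf{n} \cdot \mathbf{l}_i)}.
\end{aligned}
```

Under these assumptions the albedo \rho cancels, so \mathcal{R} depends only on the change in surface normal and on the lighting. For a second face with a different albedo but approximately the same normals at the corresponding point, the deformed intensity is then approximately \mathcal{R} times the undeformed intensity.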
In operation, a first image depicting a first face without expression, a second image depicting the face in the first image with expression whose details are to be transferred, and a third image depicting a face different from the first face and also without expression, are input. The faces in the first, second and third images are aligned, thereby creating a correspondence between pixels in the first, second and third aligned images. For each corresponding pixel in the first and second images, an Expression Ratio Image (ERI) is computed by dividing the intensity of the face with expression in the second aligned image by the intensity of the face without expression in the first aligned image. A new image of a face with expression is created by multiplying each pixel in the third image by the corresponding ERI value.
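The per-pixel division and multiplication just described can be sketched as follows. This is a minimal illustration, assuming already-aligned grayscale images stored as nested lists of floats. The function names, the `eps` guard against division by zero, and the clamp to 255 are assumptions made for the example and are not part of the described process.

```python
def expression_ratio_image(neutral, expressive, eps=1e-6):
    """Per-pixel ratio: expressive face / neutral face (both aligned)."""
    return [
        [e / max(n, eps) for n, e in zip(n_row, e_row)]  # eps avoids dividing by 0
        for n_row, e_row in zip(neutral, expressive)
    ]


def apply_eri(eri, target, max_value=255.0):
    """Multiply each pixel of the aligned target face by its ERI value."""
    return [
        [min(r * t, max_value) for r, t in zip(r_row, t_row)]  # clamp to valid range
        for r_row, t_row in zip(eri, target)
    ]


# Toy 2x2 images: first face neutral, first face with expression, second face neutral.
neutral_a = [[100.0, 200.0], [50.0, 80.0]]
expressive_a = [[50.0, 200.0], [75.0, 40.0]]
neutral_b = [[120.0, 100.0], [60.0, 160.0]]

eri = expression_ratio_image(neutral_a, expressive_a)
expressive_b = apply_eri(eri, neutral_b)
print(eri)           # [[0.5, 1.0], [1.5, 0.5]]
print(expressive_b)  # [[60.0, 100.0], [90.0, 80.0]]
```

Pixels whose ERI is below 1 darken (for example, inside a crease), pixels above 1 brighten, and pixels equal to 1 are left unchanged.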
As stated above, the system and process according to the present invention aligns the faces in the first, second and third images. The process of performing this alignment in one embodiment of the invention requires finding the face features of the faces in the first, second and third images. A difference vector between the feature positions of the faces in the first and second images is then computed. The features of the face in the third image are moved along the difference vector and the face in the third image is warped accordingly. The faces in the first and second images are then aligned with the warped face in the third image through image warping.
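The feature-moving step of this alignment can be sketched as below. The sketch assumes feature points are given as (x, y) tuples listed in the same order for all three faces. The function name `move_features` is hypothetical, and the subsequent image warp is not shown.

```python
def move_features(neutral_a_pts, expressive_a_pts, neutral_b_pts):
    """Shift each feature of the third face by the first face's difference vector."""
    moved = []
    for (ax, ay), (ex, ey), (bx, by) in zip(
        neutral_a_pts, expressive_a_pts, neutral_b_pts
    ):
        dx, dy = ex - ax, ey - ay          # difference vector for this feature
        moved.append((bx + dx, by + dy))   # new feature position on the third face
    return moved


# Two toy feature points (e.g. mouth corners) per face.
neutral_a_pts = [(10, 20), (30, 20)]
expressive_a_pts = [(10, 18), (32, 21)]
neutral_b_pts = [(12, 25), (35, 25)]

warp_targets = move_features(neutral_a_pts, expressive_a_pts, neutral_b_pts)
print(warp_targets)  # [(12, 23), (37, 26)]
```

The returned positions would serve as the control points toward which the third image is warped.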
Another embodiment of the ERI system and method computes a smoothed version of the Expression Ratio Image. This smoothing is useful in eliminating artifacts that occur due to image warping, especially those that occur in images expressed in red, green, blue (RGB) color space. In this process of computing a smoothed version of the ERI, aligned versions of the faces in the first and second images are input. For each corresponding pixel, the cross correlation of the first and second images is computed. The weight of each corresponding pixel is determined as 1 minus the cross correlation. An adaptive Gaussian filter is then run on the ERI. A threshold is used to determine whether a pixel has a large or small weight. A small window is used with the Gaussian filter for pixels with a large weight. Likewise, a large window is used with the Gaussian filter for pixels with a small weight. It is also possible to discretize the weights using numerous thresholds and then apply a series of windows of different sizes with the Gaussian filter. In practice, however, it has been found that using two different window sizes yields good results.
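A minimal sketch of this two-window adaptive smoothing follows. It assumes grayscale images stored as nested lists of floats. The local patch radius for the normalized cross correlation, the threshold of 0.5, the two window radii, and the choice of sigma are all illustrative assumptions, as none of them is specified above.

```python
import math


def patch(img, y, x, r):
    """Values in the (2r+1)x(2r+1) window around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def ncc(a, b):
    """Normalized cross correlation of two equal-length lists of values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 1.0  # assumption: flat patches count as correlated


def weights(img_a, img_b, r=1):
    """Per-pixel weight = 1 - local cross correlation of the two aligned images."""
    h, w = len(img_a), len(img_a[0])
    return [[1.0 - ncc(patch(img_a, y, x, r), patch(img_b, y, x, r))
             for x in range(w)] for y in range(h)]


def gaussian_at(img, y, x, r, sigma):
    """Gaussian-weighted average of img in a window of radius r around (y, x)."""
    h, w = len(img), len(img[0])
    total = norm = 0.0
    for j in range(max(0, y - r), min(h, y + r + 1)):
        for i in range(max(0, x - r), min(w, x + r + 1)):
            g = math.exp(-((j - y) ** 2 + (i - x) ** 2) / (2.0 * sigma ** 2))
            total += g * img[j][i]
            norm += g
    return total / norm


def smooth_eri(eri, wts, threshold=0.5, small_r=1, large_r=3):
    """Small window where the weight is large, large window where it is small."""
    out = []
    for y in range(len(eri)):
        row = []
        for x in range(len(eri[0])):
            r = small_r if wts[y][x] > threshold else large_r
            row.append(gaussian_at(eri, y, x, r, sigma=r / 2.0))
        out.append(row)
    return out


# A single bright ERI detail survives better under a large weight (small window).
spike = [[0.0] * 7 for _ in range(7)]
spike[3][3] = 1.0
kept = smooth_eri(spike, [[1.0] * 7 for _ in range(7)])[3][3]
blurred = smooth_eri(spike, [[0.0] * 7 for _ in range(7)])[3][3]
print(kept > blurred)  # True
```

The design intent is that pixels where the two images differ strongly (low correlation, large weight) are likely to carry real expression detail and are smoothed only lightly, while well-correlated regions are smoothed more aggressively.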
The three images ideally should be taken under the same, or at least similar, lighting conditions. For images taken under completely different lighting conditions, one of several known relighting techniques may be used to compensate for the lighting difference between the images. For instance, histogram matching can be used to compensate for lighting differences between the images. Alternately, the color ratio between the images taken under different lighting conditions can be used to modify at least one of the images such that the lighting between the images is the same. Similarly, the color difference (instead of ratio) between image pairs can be used to modify one of the original images so that it matches the lighting conditions of the other.
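As one illustration of the relighting options mentioned above, histogram matching can be sketched as follows. This is a basic CDF-based variant, assuming grayscale images with integer levels in the range 0 to 255. The function name `histogram_match` is hypothetical, and the text does not state which variant of histogram matching is intended.

```python
def histogram_match(source, reference, levels=256):
    """Remap gray levels of `source` so its CDF approximates that of `reference`."""

    def cdf(img):
        hist = [0] * levels
        for row in img:
            for v in row:
                hist[v] += 1
        total = sum(hist)
        acc, out = 0, []
        for count in hist:
            acc += count
            out.append(acc / total)
        return out

    src_cdf, ref_cdf = cdf(source), cdf(reference)
    mapping, j = [], 0
    for s in src_cdf:
        # Smallest reference level whose CDF reaches the source CDF at this level.
        while j < levels - 1 and ref_cdf[j] < s:
            j += 1
        mapping.append(j)
    return [[mapping[v] for v in row] for row in source]


dark = [[10, 10], [20, 30]]
bright = [[100, 100], [150, 200]]
matched = histogram_match(dark, bright)
print(matched)  # [[100, 100], [150, 200]]
```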
One method of compensating for dissimilarities in lighting, when the images do not exhibit similar lighting conditions, involves inputting aligned versions of the first image, second image and third image depicted in RGB color space. Component images are then calculated for each of the first, second and third images in YUV color space. For each corresponding pixel in the first and second images, a color correction ratio is calculated by dividing the y component of the first image by the y component of the second image. For each corresponding pixel in the third image, a revised y component in YUV color space is calculated as the color correction ratio multiplied by the previously calculated y component of the third image. The component image of the third image in YUV color space is then converted to RGB color space using a standard conversion.
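The last two steps, revising only the y component of a pixel and converting back to RGB, can be sketched as below. The sketch assumes the common BT.601-style analog YUV coefficients as the "standard conversion" and takes the per-pixel ratio as an input rather than computing it. The function names are hypothetical, and the result is left unclamped.

```python
def rgb_to_yuv(r, g, b):
    """RGB to YUV using BT.601-style coefficients (assumed conversion)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Exact algebraic inverse of rgb_to_yuv above."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


def scale_luma(rgb, ratio):
    """Multiply only the y component by `ratio`; u and v are left untouched."""
    y, u, v = rgb_to_yuv(*rgb)
    return yuv_to_rgb(ratio * y, u, v)


pixel = (200.0, 120.0, 80.0)
revised = scale_luma(pixel, 1.2)  # y component raised by 20 percent
```

Because u and v are left unchanged, the operation adjusts brightness while preserving the color-difference components of the third image. In a full implementation the resulting RGB values would typically need to be clamped to the valid range.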
The mapping of expressions using ERIs can be extended to facial expression mapping for 3D textured face models. A 3D textured face model consists of a 3D geometric shape and a 2D texture image. In this case, ERIs are applied to the texture image. By both deforming the geometric shape and modifying the texture image accordingly, more expressive facial expressions are thus obtained for 3D facial animation. This technique also applies to shaded 3D face models. One can map expressions between synthetic face models, as well as map between real face expressions and synthetic face models.
The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the U.S. Patent and Trademark Office upon request and payment of the necessary fee.