The appearance and expressiveness of facial performances are greatly influenced by complex deformations of the face at several scales. Large-scale deformations are driven by muscles and determine the overall shape of the face. Medium-scale deformations are mainly caused by skin wrinkling, and produce many of the expressive qualities in facial expressions. Finally, at the level of skin mesostructure, fine-scale stretching and compression produce subtle but perceptually significant cues. This complex behavior is challenging to reproduce in virtual characters with any combination of artistry and simulation.
Currently, creating realistic virtual faces often involves capturing textures, geometry, and facial motion of real people. It has proven difficult, however, to capture and represent facial dynamics accurately at all scales. Face scanning systems can acquire high-resolution facial textures and geometry, but typically only for static poses. Motion capture techniques record continuous facial motion, but only at a coarse level of detail. Straightforward techniques for driving high-resolution character models with relatively coarse motion capture data often fail to produce realistic motion at medium and fine scales. This limitation has motivated techniques such as wrinkle maps, blend shapes, and real-time 3D scanning. However, these prior art methods either fail to reproduce the non-linear nature of skin deformation, are labor-intensive, or do not capture and represent all scales of skin deformation faithfully.
Several prior art real-time 3D scanning systems/methods exist that are able to capture dynamic facial performances. These systems/methods rely on structured light, photometric stereo, or a combination of both. These prior art systems/methods are not suited for acquiring data for facial deformation synthesis, either because they do not attain the acquisition rate necessary to capture the temporal deformations faithfully, or they are too data-intensive, or they do not provide sufficient resolution to model facial details.
Modeling and capturing fine wrinkle details is a challenging problem for which a number of specialized prior art acquisition and modeling techniques have been developed. For instance, some prior art techniques have modeled static pore detail using texture synthesis; these techniques can be suitable for enhancing static geometry, but they do not model wrinkle or pore deformations over time. Some other prior art techniques have demonstrated how linear interpolation of artist-modeled wrinkle maps can be used for real-time rendering. These techniques, however, model wrinkle and pore detail either statistically or artistically, making the creation of an exact replica of a subject's skin detail difficult.
A different prior art approach has been to model skin detail by measuring it from live subjects. Some prior art techniques have relied on normal maps to model skin meso-structure, captured using photometric stereo from a few static expressions. Dynamic normal variation in skin meso-structure for intermediate facial poses can be obtained using trilinear interpolation. Certain prior art techniques record dynamic facial wrinkle behavior from motion capture and video of an actor. A pattern of colored makeup is employed to improve shape-from-shading in order to detect wrinkle indentations in the tracked regions. A non-linear thin shell model can be used to recreate the buckling of skin surrounding each wrinkle. While these systems estimate realistic facial geometry, they are mostly limited to larger-scale wrinkles, and rely on (a form of) linear data interpolation to generate intermediate expressions.
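The trilinear interpolation of normal maps mentioned above can be illustrated by a minimal sketch: eight captured normal maps are treated as the corners of a cube in a three-parameter expression space, and an intermediate map is formed as a weighted combination. The function name, array layout, and parameters here are hypothetical, chosen only for illustration; they do not correspond to any specific prior art implementation.

```python
import numpy as np

def trilinear_normals(corner_maps, u, v, w):
    """Trilinearly interpolate eight normal maps at pose parameters (u, v, w).

    corner_maps : hypothetical (2, 2, 2, H, W, 3) array holding one normal
                  map at each corner of a 3-parameter expression cube.
    u, v, w     : interpolation parameters in [0, 1].
    """
    n = np.zeros_like(corner_maps[0, 0, 0], dtype=float)
    for i, fu in enumerate((1.0 - u, u)):
        for j, fv in enumerate((1.0 - v, v)):
            for k, fw in enumerate((1.0 - w, w)):
                n += fu * fv * fw * corner_maps[i, j, k]
    # Re-normalize: a linear combination of unit normals is generally
    # not unit length.
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

Because the combination is linear in the corner maps, such a scheme can only blend between captured states; it cannot introduce new wrinkle structure at intermediate poses, which is the limitation the present discussion is concerned with.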
Performance capture techniques use the recorded motion of an actor to drive a performance of a virtual character, most often from a set of tracked motion capture markers attached to the actor's face. Mapping the set of tracked markers to character animation controls is a complex but well-studied problem. Prior art techniques have introduced linear expression blending models. Blend shapes have become an established method for animating geometric deformation, and can be either defined by an artist or estimated automatically. Several techniques have used blend shapes to simulate detailed facial performances by linearly interpolating between a set of images or geometric exemplars with different facial expressions. A drawback of this approach is that it can be difficult to use linear blend shapes to reproduce the highly non-linear nature of skin deformation. Skin tends to stretch smoothly up to a point and then buckle nonlinearly into wrinkles. Furthermore, relating blend shapes to motion capture data is a non-trivial task.
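The linear blend-shape model described above can be sketched as follows. Each exemplar expression is stored as a per-vertex offset from a neutral mesh, and an animated pose is a weighted sum of those offsets. The names and array shapes are hypothetical, for illustration only.

```python
import numpy as np

def blend_shapes(neutral, deltas, weights):
    """Linear blend-shape evaluation (illustrative sketch).

    neutral : (V, 3) array of rest-pose vertex positions.
    deltas  : (K, V, 3) array; each entry is an exemplar expression
              expressed as an offset from the neutral pose.
    weights : (K,) array of blend weights.
    """
    deltas = np.asarray(deltas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum of exemplar offsets added to the neutral shape.
    # The map from weights to geometry is purely linear, which is why
    # this model cannot reproduce nonlinear skin buckling on its own.
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: a three-vertex "face" with two exemplar shapes whose
# offsets cancel at these particular weights.
neutral = np.zeros((3, 3))
deltas = np.stack([np.full((3, 3), 1.0), np.full((3, 3), -0.5)])
blended = blend_shapes(neutral, deltas, [0.5, 1.0])  # -> all zeros
```

The sketch makes the drawback concrete: every intermediate shape lies on a straight line between exemplars, whereas real skin stretches smoothly up to a point and then buckles into wrinkles, a behavior no choice of linear weights can reproduce.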
Physically based simulation models use underlying bio-mechanical behavior of the human face to create realistic facial animations. Certain prior art techniques have determined individual muscle activations from sparse motion capture data using an anatomical model of the actor. Synthesizing detailed animations from such performance capture data would require very detailed models of facial structure and musculature, which are difficult to accurately reconstruct for a specific performer.
Thus, while prior art techniques may be suitable for certain situations and applications, they have exhibited limitations for creating realistic virtual faces, including limitations in capturing the textures, geometry, and facial motion of real people. What is needed, therefore, are new techniques that more accurately model and reproduce natural-looking facial movements.