One of the most interesting and difficult problems in computer graphics is the effortless generation of realistic-looking, animated human face models. Animated face models are essential to computer games, filmmaking, online chat, virtual presence, and video conferencing. So far, the most popular commercially available tools have relied on laser scanners. Not only are these scanners expensive, but the data they produce are usually quite noisy, requiring hand touch-up and manual registration before the model can be animated. Because inexpensive computers and cameras are widely available, there is great interest in producing face models directly from images. Despite progress toward this goal, the available techniques are either manually intensive or computationally expensive.
Facial modeling and animation have been computer graphics research topics for over 25 years [6, 16, 17, 18, 19, 20, 21, 22, 23, 27, 30, 31, 33]. The reader is referred to Parke and Waters' book [23] for a comprehensive overview.
Lee et al. [17, 18] developed techniques to clean up and register data generated from laser scanners. The resulting model is then animated using a physically based approach.
DeCarlo et al. [5] proposed a method to generate face models based on face measurements randomly generated according to anthropometric statistics. They showed that they were able to generate a variety of face geometries using these face measurements as constraints.
A number of researchers have proposed creating face models from two views [1, 13, 4]. These methods all require two cameras that must be carefully set up so that their viewing directions are orthogonal. Zheng [37] developed a system to construct geometric object models from image contours, but it requires a turntable setup.
Pighin et al. [26] developed a system that allows a user to manually specify correspondences across multiple images, and uses vision techniques to compute 3D reconstructions. A 3D mesh model is then fitted to the reconstructed 3D points. They were able to generate highly realistic face models, but the procedure is manually intensive.
Blanz and Vetter [3] demonstrated that linear classes of face geometries and images are very powerful in generating convincing 3D human face models from images. Blanz and Vetter used a large image database to cover every skin type.
Kang et al. [14] also use linear spaces of geometric models to construct 3D face models from multiple images. However, their approach requires manually aligning the generic mesh to one of the images, which is in general a tedious task for the average user.
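The linear-space idea underlying the methods of Blanz and Vetter [3] and Kang et al. [14] can be sketched as follows: each face mesh is represented as a vector of stacked vertex coordinates, and new face geometries are generated as a mean shape plus a linear combination of basis deformations. The mesh size, number of basis modes, and random data below are all toy assumptions for illustration, not values from any of the cited systems.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a learned morphable model (assumptions, not real data):
n_vertices = 500
mean_shape = rng.standard_normal((n_vertices, 3))   # mean face geometry
basis = rng.standard_normal((10, n_vertices, 3))    # 10 deformation modes

def synthesize_face(coefficients):
    """Return a new face mesh: mean + sum_i c_i * basis_i.

    Every synthesized face shares the topology of the generic mesh,
    so only the coefficient vector needs to be estimated from images.
    """
    coefficients = np.asarray(coefficients)
    return mean_shape + np.tensordot(coefficients, basis, axes=1)

face = synthesize_face(0.1 * rng.standard_normal(10))
print(face.shape)  # (500, 3)
```

The key benefit, as the cited works exploit, is that fitting a face to image data reduces from estimating thousands of free vertex positions to estimating a handful of linear coefficients.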
Fua et al. [8] deform a generic face model to fit dense stereo data. However, their face model contains many more parameters to estimate because essentially all of the vertices are independent parameters; moreover, reliable dense stereo data are in general difficult to obtain with a single camera. Their method usually takes 30 minutes to an hour, while ours takes 2 to 3 minutes.
Guenter et al. [9] developed a facial animation capture system that records both the 3D geometry and a texture image for each frame and reproduces high-quality facial animations. The problem they solved differs from the one addressed here in that they assumed the person's 3D model was already available and the goal was to track the subsequent facial deformations.