1. Field of the Invention
The present invention generally relates to 3-dimensional (3D) scanning of the human face and more particularly relates to a method and apparatus for generating a fully textured 3D model of a human face using a single camera.
2. Description of the Related Art
The generation of a fully textured 3-D model of a person's face presents difficult technical challenges, but it has applications in several fields, such as video games, immersive telepresence, and medicine. For instance, players in an interactive game may want to see their own face on the body of their hero. Facial animation combined with e-mail delivery by an avatar is another interesting use of a 3-D face model (see Reference 1 or Reference 2). A further example is the “Virtual Try-On” of eyeglasses in 3-D (see Reference 3).
One of the critical components for these applications is the 3-D model acquisition phase. Active methods project laser (see Reference 4), infrared, or other patterns onto the face and produce very good results, but the hardware they require reduces their operational flexibility. The space carving methodology (see Reference 5) previously emerged from the use of regular cameras. It appears well suited to the task, but it requires many sensors.
Methods using only two cameras (see Reference 6) have recently become popular and were seen at various trade shows, such as SIGGRAPH 2000. In addition, Pascal Fua (see Reference 7) has built a system that reconstructs faces from video sequences captured with an uncalibrated camera. The approach is based on a regularized bundle adjustment and makes extensive use of a generic 3-D face model, which enables the recovery of the motion information. The final model is built by deforming the generic model. Zhengyou Zhang (see Reference 8) has also demonstrated a system that builds a 3-D model using a single webcam to capture images. The 3-D model has further been integrated with a number of other elements, such as a text-to-speech animation module, to produce a complete animation-ready head. Zhang extracts 3-D information from only one stereo pair and then deforms a generic face model. Camera poses are computed for the rest of the sequence and used to generate a cylindrical texture.
The major drawbacks of such systems are the cost of the hardware they require, their lack of operational flexibility, or the generic appearance of the reconstructed models when these are computed by deforming a generic model. In addition, these methods may fail to reconstruct facial features such as beards or moustaches. There is therefore a need for generic approaches for generating a fully textured 3-D face model using a single camera.