Conventional face reconstruction techniques often use one or more two-dimensional images (e.g., digital photographs) of a face to create a three-dimensional representation of the face. The representation that is created may be a file, such as an electronic file, indicative of the individual characteristics that distinguish that face from other faces. The file can then be used, e.g., for facial recognition, animation, or rendering.
The images, once obtained, are often processed based on prior knowledge or assumptions about what faces usually look like. This knowledge is often called "domain knowledge", a "prior model", or, more specifically, a "generic face". For example, the prior face knowledge may indicate the presence or likely locations of different kinds of facial features, such as the eyes, nose, etc. The prior face knowledge may assume that the face is formed of a linear combination of basis face shapes and appearances, camera parameters, lighting parameters, and other elements that are either known or can be estimated. These elements can be combined to estimate the likely appearance of a face. More specifically, the domain knowledge may come in the form of a generic face shape defined by an artist, or an average face shape computed from a plurality of known face shapes.
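As an illustration of the prior model described above, the following minimal Python sketch computes an average face shape from several known shapes and then forms a new shape as that average plus a weighted sum of basis shapes. The function names, the representation of a shape as a list of (x, y, z) vertices, and the sample values are assumptions made for this example only; appearance, camera, and lighting parameters are not modeled.

```python
def average_face(shapes):
    """Average several face shapes vertex by vertex.

    Each shape is a list of (x, y, z) vertices, and all shapes are assumed
    to list corresponding vertices in the same order (a hypothetical layout).
    """
    count = len(shapes)
    return [
        tuple(sum(shape[i][axis] for shape in shapes) / count for axis in range(3))
        for i in range(len(shapes[0]))
    ]


def combine_face(mean_shape, basis_shapes, weights):
    """Return the mean shape plus a weighted sum of basis shapes.

    basis_shapes holds one per-vertex offset list per weight; the weights
    play the role of the deformation parameters of the prior model.
    """
    result = []
    for i, vertex in enumerate(mean_shape):
        result.append(tuple(
            vertex[axis]
            + sum(w * basis[i][axis] for w, basis in zip(weights, basis_shapes))
            for axis in range(3)
        ))
    return result


# Two tiny two-vertex "faces" stand in for a set of known face shapes.
known_shapes = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(2.0, 2.0, 2.0), (4.0, 0.0, 0.0)],
]
generic = average_face(known_shapes)

# One basis shape that moves vertex 0 along x and vertex 1 along y.
basis = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
estimate = combine_face(generic, basis, [0.5])
```

In a sketch of this kind, reconstruction amounts to choosing the weights so that the combined shape best explains the observed images, which is why the result stays close to the generic face.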
One common technique for face reconstruction uses prior face knowledge of a generic face, and possibly a set of face metrics or deformation parameters, throughout the reconstruction process. Another common technique attempts to eschew the use of prior face knowledge and instead uses a purely data-driven approach to reconstruct the face. This can be done, for example, using triangulation of two-dimensional points in multiple images from multiple calibrated cameras. Unfortunately, the first, knowledge-based approach may produce unrealistic results, because the generic face is used throughout the process. The second, multi-camera approach requires additional hardware infrastructure that is difficult to implement in practice at a reasonable cost. A single-camera, purely data-driven approach alleviates some of the hardware constraints of multi-view stereo methods, but may itself be unstable due to the lack of constraints at stages of the process.
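To make the multi-camera, data-driven approach concrete, the following Python sketch triangulates a single three-dimensional point from two viewing rays using the midpoint method, one of several possible triangulation techniques and not necessarily the one used in any particular system. It assumes that each matched two-dimensional image point has already been back-projected, using the known calibration of its camera, into a ray with a camera center and a direction. The function names and sample values are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(center1, direction1, center2, direction2):
    """Estimate a 3D point as the midpoint of the shortest segment
    joining two viewing rays (center + s * direction)."""
    w = [p - q for p, q in zip(center1, center2)]
    a = dot(direction1, direction1)
    b = dot(direction1, direction2)
    c = dot(direction2, direction2)
    d = dot(direction1, w)
    e = dot(direction2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information.
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(center1, direction1)]
    p2 = [ci + t * di for ci, di in zip(center2, direction2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two cameras one unit apart observe the same point on a face.
target = (0.5, 0.2, 4.0)
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray1 = [t_ - c_ for t_, c_ in zip(target, cam1)]
ray2 = [t_ - c_ for t_, c_ in zip(target, cam2)]
point = triangulate_midpoint(cam1, ray1, cam2, ray2)
```

Because the estimate depends only on the rays, no generic face is involved; the cost is that at least two calibrated views of each point are needed, which is the hardware burden noted above.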