In computer vision, one problem is to recover the three-dimensional shape of an object from a sequence of two-dimensional images acquired by a camera. This is especially difficult when both the camera parameters and point correspondences across the image sequence are unknown.
There is a large body of work on the recovery of raw 3-D data from multiple images. Examples include multibaseline stereo, trinocular stereo that combines the constant brightness constraint with the trilinear tensor, stereo with interpolation, and shape from rotation.
Virtually all stereo approaches assume a fixed disparity throughout once the disparity has been established, e.g., through a separate feature tracker or image registration technique. Most techniques assume that the camera parameters, intrinsic and extrinsic, are known. For 3-D facial modeling, the following techniques are generally known.
From Range Data:
Range acquisition equipment includes light-stripe rangefinders and laser rangefinders. Rangefinders, when compared to video cameras, are relatively expensive, and considerable post-processing is still required. For example, in one method, feature-based matching for facial features, such as the nose, chin, ears, and eyes, is applied to dense 3-D data to initialize an adaptable facial mesh. Subsequently, a dynamic model of facial tissue controlled by facial muscles is generated. In another method, a range image with a corresponding color image of a face is used. The 2-D color image is used to locate the eyes, eyebrows, and mouth. Edges in color space are determined, and contour smoothing is achieved by dilation and shrinking.
From Two 2-D Images:
Two orthogonal views of a face are normally used. The profiles are extracted and analyzed; this is followed by facial feature extraction. A 3-D face template is then adjusted by interpolation, based on the extracted information.
From a Sequence of Temporally Related 2-D Images:
In one approach, multiple 2-D images are used to reconstruct both the shape and the reflectance properties of surfaces. The surface shape is initialized by conventional stereo image processing. An objective function uses the weighted sum of stereo, shading, and smoothness constraints. The combination of weights depends on local texture, favoring stereo where texture is high. The light source direction and the camera parameters are assumed to be known.
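The weighted-sum objective described above can be illustrated with a minimal sketch. The function names, the linear weighting rule, and the fixed smoothness weight are assumptions made for illustration only; the approach described does not specify how the weights are derived from local texture.

```python
def texture_weights(texture, smooth_weight=0.2):
    """Return (w_stereo, w_shading, w_smooth) for a local texture score.

    Hypothetical rule: `texture` is a score in [0, 1]; the smoothness
    weight is fixed and the remainder is split linearly, so highly
    textured regions favor the stereo term.
    """
    t = min(max(texture, 0.0), 1.0)   # clamp the score to [0, 1]
    data_weight = 1.0 - smooth_weight  # share left for the two data terms
    return (data_weight * t, data_weight * (1.0 - t), smooth_weight)


def combined_energy(e_stereo, e_shading, e_smooth, texture):
    """Weighted sum of the stereo, shading, and smoothness terms."""
    w_stereo, w_shading, w_smooth = texture_weights(texture)
    return w_stereo * e_stereo + w_shading * e_shading + w_smooth * e_smooth
```

Under this assumed rule, a texture score of 1 removes the shading term entirely, and a score of 0 removes the stereo term.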
A calibrated stereo pair of images has also been used. There, a disparity map is determined, followed by interpolation. In one implementation, three-dimensional deformation is guided by differential features that have high curvature values, for example, the nose and the eye orbits. If the motion between images in a sequence is small, then the optical flow can be used to move and deform a face model to track facial expressions. Fixed point correspondences are defined by the optical flow. The deformation of the face model is constrained and specific to faces. Facial anthropometric data are used to limit facial model deformations during initialization and tracking, and the camera's focal length is assumed to be approximately known.
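For the calibrated stereo pair mentioned above, a disparity map is typically converted to depth using the standard relation for a rectified pair, Z = f B / d. The following is a minimal sketch under that assumption; the function name and parameter names are hypothetical and are not taken from any of the methods described.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth of a point from its disparity in a rectified stereo pair.

    disparity_px: horizontal disparity in pixels
    focal_px:     focal length in pixels
    baseline:     distance between the camera centers (depth is
                  returned in the same unit)
    Returns None where the disparity is not positive, since depth is
    undefined there.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline / disparity_px
```

Because depth is inversely proportional to disparity, small disparity errors on distant points produce large depth errors, which is one reason interpolation of the disparity map is commonly applied.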
In a different approach, facial features such as the eyes, nose and mouth are tracked using recursive Kalman filtering to estimate structure and motion. The filter output is used to deform the shape of the face subject to predefined constraints specified by a linear subspace of eigenvectors.
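One common way to impose a constraint defined by a linear subspace of eigenvectors is to project the deviation of the shape from a mean shape onto that subspace. The sketch below illustrates this under stated assumptions: the basis vectors are taken to be orthonormal, and the function name, the mean shape, and the flat list representation of a shape are hypothetical. The approach described above does not specify how its constraints are applied.

```python
def project_to_subspace(shape, mean, basis):
    """Constrain a shape vector to mean + span(basis).

    shape, mean: flat lists of coordinates of equal length
    basis:       list of orthonormal eigenvectors (assumed), each the
                 same length as `shape`
    Returns (constrained_shape, coefficients).
    """
    # Deviation of the observed shape from the mean shape.
    deviation = [s - m for s, m in zip(shape, mean)]
    # Coefficient along each eigenvector (dot product with the deviation).
    coeffs = [sum(b_i * d_i for b_i, d_i in zip(b, deviation)) for b in basis]
    # Rebuild the shape using only the components inside the subspace.
    constrained = list(mean)
    for c, b in zip(coeffs, basis):
        for i, b_i in enumerate(b):
            constrained[i] += c * b_i
    return constrained, coeffs
```

Any component of the deviation that lies outside the subspace is discarded, so the constrained shape can only vary in the directions spanned by the eigenvectors.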