Reconstructing the 3D shape of a human face (or other object) from a photo is an ill-posed problem and therefore requires prior knowledge, which is usually introduced in the form of a statistical model. Statistical shape models represent object classes by parameters describing the variability of the elements within the class. In contrast to models that are purely descriptive, statistical models are generative, i.e. the shape is a function of the parameters.
As but a few non-limiting examples (all examples herein—above or below—being of the non-limiting type), reconstructed 3D surfaces can be used for computer animation of faces and whole bodies, e.g., for movies or computer games or to customize 3D avatars. Such avatars are used by computer users as an alter ego in computer games or online communities. The shape can also be used indirectly to manipulate the expression or attributes of faces in photographs, e.g., by transferring the expression from one photograph to another, to exchange faces in photographs, to generate stimuli for psychological experiments, and/or for automatic face recognition.
Currently there exist two types of face recognition systems. Firstly, there are 3D face recognition systems which use a 3D scan. These systems are typically used for access control. The drawback of these systems, however, is that they require a 3D scan and can therefore only be used in a cooperative scenario, e.g., one in which a person's face is being voluntarily scanned. Secondly, there are known 2D face recognition systems which use a single image or a video stream. The advantage of these systems is their potential use in uncooperative scenarios like video surveillance. However, in practice, their application is prevented by the insufficient recognition rates of the currently available commercial systems.
Automated 2D face recognition is still one of the most challenging research topics in computer vision, and it has been demonstrated that variations in pose and light are major problems. Other problems are hair, beards, and partial occlusion, e.g., from eyeglasses. Most of these systems use 2D images or photos to represent the subjects in the gallery. Hence, these methods are limited in their expressiveness, and therefore most face recognition systems show good results only for faces in frontal or near-frontal pose. Methods that are based on fitting 3D statistical models have been proposed to overcome this issue. They are helpful for recognizing people in non-frontal poses or from images with unknown illumination conditions.
Most statistical models are based on a Principal Component Analysis (PCA) of a set of training data. Turk and Pentland use a 2D PCA model to represent faces, the Eigenfaces [Turk91]. As a training data set, they use a set of photographs of human faces. The pictures are coarsely aligned, but not registered with dense correspondence. Their system works well only on adequate images, i.e., pictures in frontal view with controlled illumination and without expression, e.g., passport or driver's license photographs.
Cootes and Taylor represent the face as a set of 2D points (Active Shape Models, ASM [Coop95]). As training data they also use a set of face images. In contrast to the Eigenface approach, however, they manually register the face images by labeling 152 2D landmark points. The ASM is a PCA model of these face shapes in frontal view. Later, they combine this model with a PCA model of pixel intensities. This combined model is called the Active Appearance Model (AAM). The ASM/AAM separates shape from appearance; however, it does not separate 3D pose changes from shape, or illumination from the inherent color of the face. In contrast to these 2D approaches, the Morphable Model represents a face as a 3D shape with per-vertex color. The Morphable Model is a statistical model for shape and per-vertex color that is trained from a dataset of 200 densely registered 3D scans.
Blanz and Vetter used their Morphable Model to reconstruct the 3D shape of a face from a photograph. This fitting is done in an Analysis-by-Synthesis approach by optimizing a cost function that consists of the difference between the rendered model and the input photo and a term that controls the probability within the model. Romdhani and Vetter [Romd05] later improved the fitting by using a cost function that included several features extracted from the image, such as the contours, landmark points, and shading information.
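As a non-limiting illustration, the structure of such a cost function (a data term plus a term penalizing improbable coefficients) can be sketched as follows. This is a simplified sketch and not the fitting procedure of Blanz and Vetter or of [Romd05]: the rendering step is replaced by a purely linear model, and the names `fitting_cost`, `reg_weight`, and all numeric values are assumptions introduced only for this example.

```python
import numpy as np

def fitting_cost(coeffs, target, mean, components, std, reg_weight):
    """Data term plus prior term for a toy linear PCA model.

    Assumption: a linear synthesis step stands in for rendering the model
    into an image; coefficients are expressed in units of standard deviation.
    """
    synthesized = mean + (coeffs * std) @ components
    data_term = np.sum((synthesized - target) ** 2)
    # Negative log of a Gaussian prior on the coefficients, up to constants
    prior_term = np.sum(coeffs ** 2)
    return data_term + reg_weight * prior_term

rng = np.random.default_rng(0)
n_dims = 30                                      # hypothetical data dimension
q, _ = np.linalg.qr(rng.normal(size=(n_dims, 3)))
components = q.T                                 # three orthonormal components (rows)
std = np.array([3.0, 2.0, 1.0])                  # hypothetical standard deviations
mean = rng.normal(size=n_dims)

true_coeffs = np.array([1.0, -0.5, 2.0])
target = mean + (true_coeffs * std) @ components

cost_exact_no_prior = fitting_cost(true_coeffs, target, mean, components, std, 0.0)
cost_exact_with_prior = fitting_cost(true_coeffs, target, mean, components, std, 1.0)
cost_mean_with_prior = fitting_cost(np.zeros(3), target, mean, components, std, 1.0)
print(cost_exact_no_prior, cost_exact_with_prior, cost_mean_with_prior)
```

In this toy setting, the exact coefficients give a data term of zero but a non-zero prior term, whereas the mean (all coefficients zero) gives a prior term of zero but a non-zero data term. The weight `reg_weight` sets the balance between the two.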
PCA models are widely used, but they have some major drawbacks. PCA is focused on dimensionality reduction. The principal components are holistic, i.e., each component has global support. Hence the influence of each coefficient is not localized and affects the whole shape. As a result, there is, in general, no meaningful interpretation of the components, as can be seen in FIG. 1. Each component encodes some details of the nose and some details of the forehead and some details of the ears, etc., at the same time. This is counter-intuitive when the model is used in an interactive tool. In the context of human faces, we would expect to be able to change, e.g., the shape of the nose independently of the shape of the ear or the chin, but this is not possible with PCA models, i.e., it is not possible to change one facial feature and keep all other vertices fixed. Holistic Morphable Models are not flexible enough to locally adapt to several features at the same time.
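As a non-limiting illustration of this holistic behavior, the following sketch builds synthetic one-dimensional "shapes" in which two disjoint regions vary with correlated amplitudes, and then inspects the first principal component. The data are synthetic, and the region names, region sizes, and the correlation of 0.6 are assumptions chosen only for the example; they are not taken from any face database.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vertices = 200, 60

# Two disjoint, hypothetical regions of the shape vector
nose = np.zeros(n_vertices)
nose[0:10] = 1.0
ear = np.zeros(n_vertices)
ear[40:50] = 1.0

# Correlated amplitudes (correlation 0.6 by construction)
z = rng.normal(size=(n_samples, 2))
nose_amp = z[:, 0]
ear_amp = 0.6 * z[:, 0] + 0.8 * z[:, 1]
shapes = np.outer(nose_amp, nose) + np.outer(ear_amp, ear)

# PCA via SVD of the centered data matrix; rows of vt are the components
centered = shapes - shapes.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
first = vt[0]

# Fraction of the (unit-norm) first component located in each region
nose_energy = float(np.sum(first[0:10] ** 2))
ear_energy = float(np.sum(first[40:50] ** 2))
print(round(nose_energy, 2), round(ear_energy, 2))
```

Although the two regions do not overlap, the first component carries substantial weight in both, so changing its coefficient deforms the "nose" and the "ear" together.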
It would be desirable if the face space spanned by the model included all reasonable human faces and excluded all non-faces. However, the space spanned by the principal components is too limited and too flexible at the same time. On the one hand, it does not span the space of all faces. Every face in this space is an affine combination of the training samples. As a result, the model represents novel faces, i.e., those which are not in the database used to train the model, only poorly. Hence, the PCA model is not able to represent all possible human faces. On the other hand, overfitting occurs when the model is used for generalization from partial information and is forced to adapt locally to features [Blanz02]. In particular, overfitting is a practical problem when a PCA Morphable Model is fitted to a photograph. Hence, the model is too flexible, that is, it is able to generate things that are not faces. This overfitting can be suppressed by regularization, at the cost of a poorer fit to the partial information. One has to choose a trade-off between accuracy of the reconstruction and likelihood of the result being a face [Blanz02].
PCA-based models have other severe disadvantages. For example, the number of coefficients is limited by the number of training samples n. For instance, with a training set that contains n=100 scans, the model is limited to 99 components. As a result, all possible faces have to be represented by a vector of length 99. The PCA model is not flexible enough to represent novel faces and performs well only on the training data, but not on test data.
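As a non-limiting illustration of this limit, the following sketch computes a PCA of 100 synthetic samples and counts the components with non-zero standard deviation. Random data stand in for registered scans, and the dimension of 3000 (e.g., 1000 vertices with three coordinates each) is an assumption made only for the example.

```python
import numpy as np

def pca(samples):
    """Return mean, principal components (rows) and standard deviations."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # SVD of the centered data matrix; rows of vt are principal directions
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    std = singular_values / np.sqrt(samples.shape[0] - 1)
    return mean, vt, std

rng = np.random.default_rng(0)
n_samples, n_dims = 100, 3000        # hypothetical sizes
scans = rng.normal(size=(n_samples, n_dims))

mean, components, std = pca(scans)

# Subtracting the mean removes one degree of freedom, so one of the
# n_samples standard deviations is zero up to rounding error.
n_nonzero = int(np.sum(std > 1e-10 * std.max()))
print(n_nonzero)  # 99
```

However high the dimension of the data, the centered data matrix has rank at most n-1, which is why only 99 components carry any variance here.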
The very last components only encode noise in the training set. When one tries to adapt the PCA model locally to facial features, this is either not possible at all, or, if possible, only with severe overfitting. Overfitting occurs when an unsuitable statistical model with too many parameters is fitted to novel data. The data is matched well and the function or the surface passes through the points or vertices. However, in between, the model generates arbitrary results. Overfitting can be prevented by reducing the number of parameters of the model or by regularization. Large coefficients lead to a low prior probability. Regularization penalizes large coefficients, thereby preventing overfitting. Both approaches work in a similar way, since the most relevant components with large standard deviation are affected less by regularization than those with small standard deviation.
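As a non-limiting illustration of why regularization acts much like discarding the last components, consider a linear model with orthonormal components and a quadratic penalty on the coefficients (a ridge-type regularizer, assumed here only for the sake of the example). Minimizing the squared reconstruction error plus `reg_weight` times the squared coefficient norm has a closed-form solution in which each coefficient is scaled relative to its unregularized value by the factor σ²/(σ² + `reg_weight`). The function names and numeric values below are hypothetical.

```python
import numpy as np

def regularized_coeffs(projections, std, reg_weight):
    """Closed-form minimizer of sum((p - c*std)**2) + reg_weight * sum(c**2).

    projections: target offset from the mean, projected onto each component.
    std: standard deviation of each component (coefficients in units of std).
    """
    return std * projections / (std ** 2 + reg_weight)

def shrinkage_factors(std, reg_weight):
    """Ratio of regularized to unregularized coefficient for each component."""
    return std ** 2 / (std ** 2 + reg_weight)

std = np.array([10.0, 3.0, 1.0, 0.1])      # hypothetical, from large to small
projections = np.array([5.0, 5.0, 5.0, 5.0])

unregularized = regularized_coeffs(projections, std, 0.0)
regularized = regularized_coeffs(projections, std, 1.0)
factors = shrinkage_factors(std, 1.0)
print(np.round(factors, 3))  # approximately 0.99, 0.9, 0.5, 0.01
```

With these values, the component with the largest standard deviation is left almost unchanged, while the one with the smallest is reduced to about one percent of its unregularized value, which is close to removing it from the model.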
In view of the above enumerated drawbacks and/or desires for improvements in the art, it is a purpose of the herein described invention to address one or more of such drawbacks and/or desires as well as, or in the alternative, other needs which will become more apparent to the skilled artisan once given the present disclosure.