This invention relates to a method for image processing, in particular to the manipulation (detecting, recognizing and/or synthesizing) of images of three-dimensional objects, such as human faces, on the basis of a morphable model for image synthesis. Furthermore, the invention relates to an image processing system for implementing such a method.
One field of image manipulation concerns particularly the manipulation of human faces. Modeling human faces has challenged researchers in computer graphics since its beginning. Since the pioneering work of Parke [see ref. numbers 23 and 24 and the list of numbered references at the end of this specification], various techniques have been reported for modeling the geometry of faces [ref. numbers 9, 10, 20, 31, 19] and for animating them [ref. numbers 26, 13, 17, 29, 20, 35, 27]. A detailed overview can be found in the book of Parke and Waters [22].
The techniques developed for the animation of faces can be roughly separated into those that rely on physical modeling of facial muscles [35] and those that apply previously captured facial expressions to a face [23, 2]. These performance-based animation techniques compute the correspondence between the different facial expressions of a person by tracking markers glued to the face from image to image. To obtain photo-realistic face animations, a high number of markers (e.g. up to 182 markers) has to be used [13].
Computer aided modeling of human faces still requires a great deal of expertise and manual control to avoid unrealistic, non-face-like results. Most limitations of automated techniques for face synthesis, face animation or for general changes in the appearance of an individual face can be described either as the problem of finding corresponding feature locations in different faces or as the problem of separating realistic faces from faces that could never appear in the real world. The correspondence problem is crucial for all morphing techniques, both for the application of motion-capture data to pictures or 3D face models, and for most 3D face reconstruction techniques from images. A limited number of labeled feature points marked in one face, e.g., the tip of the nose, the corner of the eye and less prominent points on the cheek, must be located precisely in another face. The number of manually labeled feature points varies from application to application, but usually ranges from 50 to 300. Only a correct alignment of all these points allows acceptable intermediate morphs, a convincing mapping of motion data from the reference to a new model, or the adaptation of a 3D face model to 2D images for 'video cloning'. Human knowledge and experience are necessary to compensate for the variations between individual faces and to guarantee a valid location assignment in the different faces. At present, automated matching techniques can be utilized only for very prominent feature points such as the corners of eyes and mouth.
A second type of problem in face modeling is the separation of natural faces from non-faces. For this, human knowledge is even more critical. Many applications involve the design of completely new natural looking faces that can occur in the real world but which have no "real" counterpart. Others require the manipulation of an existing face according to changes in age, body weight or simply to emphasize the characteristics of the face. Such tasks usually require time-consuming manual work combined with the skills of an artist.
It is accordingly an object of the invention to provide improved image processing methods and systems capable of meeting the above problems, which particularly process images of three-dimensional objects in a more flexible and effective manner.
According to the invention, a parametric face modeling technique assists in solving both of the above problems. First, arbitrary human faces can be created while simultaneously controlling the likelihood of the generated faces. Second, the system is able to compute correspondence between new faces. Exploiting the statistics of a large data set of 3D face scans (geometric and textural data, Cyberware™), a morphable face model has been built which allows domain knowledge about face variations to be recovered by applying pattern classification methods. The morphable face model is a multidimensional 3D morphing function that is based on the linear combination of a large number of 3D face scans. By computing the average face and the main modes of variation in the dataset, a probability distribution is imposed on the morphing function to avoid unlikely faces. Also, parametric descriptions of face attributes such as gender, distinctiveness, "hooked" noses or the weight of a person have been derived by evaluating the distribution of exemplar faces for each attribute within the face space.
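The linear combination of face scans and the probability distribution imposed on it can be illustrated by a minimal sketch. The sketch is not the implementation of the invention: the function names `average_face`, `morph` and `log_prior`, the toy two-vertex "scans", and the assumption of independent Gaussian coefficients with given standard deviations are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# A "scan" is a flattened list of vertex coordinates (x1, y1, z1, x2, ...).
# All scans are assumed to be in full correspondence already.
Shape = List[float]


def average_face(scans: Sequence[Shape]) -> Shape:
    """Component-wise mean of the exemplar scans."""
    n = len(scans)
    return [sum(vals) / n for vals in zip(*scans)]


def morph(scans: Sequence[Shape], weights: Sequence[float]) -> Shape:
    """Linear combination of exemplar scans; weights are normalised to sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]
    return [sum(w * v for w, v in zip(norm, vals)) for vals in zip(*scans)]


def log_prior(coeffs: Sequence[float], sigmas: Sequence[float]) -> float:
    """Unnormalised Gaussian log-probability of model coefficients.

    Assumption: coefficients are independent with standard deviations `sigmas`.
    Values closer to zero correspond to more likely (more face-like) settings.
    """
    return -0.5 * sum((c / s) ** 2 for c, s in zip(coeffs, sigmas))


# Toy data: two "scans" with two vertices each (hypothetical values).
scans = [
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 1.0, 2.0, 0.0],
]
mean_shape = average_face(scans)
blend = morph(scans, [3.0, 1.0])  # 75 % of the first scan, 25 % of the second
```

In this sketch, equal weights reproduce the average face, and `log_prior` decreases as coefficients move away from zero, which is one simple way in which unlikely faces could be penalised.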
Having constructed a parametric face model that is able to generate almost any face, the correspondence problem turns into a mathematical optimization problem. New faces, images or 3D face scans, can be registered by minimizing the difference between the new face and its reconstruction by the face model function. An algorithm has been developed that adjusts the model parameters automatically for an optimal reconstruction of the target, requiring only a minimum of manual initialization. The output of the matching procedure is a high quality 3D face model that is in full correspondence with the morphable face model. Consequently, all face manipulations parameterized in the model function can be mapped to the target face. The prior knowledge about the shape and texture of faces in general that is captured in our model function is sufficient to make reasonable estimates of the full 3D shape and texture of a face even when only a single picture is available. When applying the method to several images of a person, the reconstructions reach almost the quality of laser scans.
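The registration of a new face by minimizing the difference between the face and its model reconstruction can be sketched as follows. This is a simplified illustration under stated assumptions and not the algorithm of the invention: it uses plain gradient descent on a squared-error cost with an optional Gaussian penalty on the coefficients, and the names `reconstruct` and `fit`, the parameters `lam`, `step` and `iters`, and the toy basis vectors are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def reconstruct(mean: Vector, basis: Sequence[Vector], coeffs: Sequence[float]) -> Vector:
    """Model function: mean face plus a weighted sum of basis vectors."""
    out = list(mean)
    for c, b in zip(coeffs, basis):
        for k, bk in enumerate(b):
            out[k] += c * bk
    return out


def fit(target: Vector, mean: Vector, basis: Sequence[Vector],
        sigmas: Sequence[float], lam: float = 0.0,
        step: float = 0.05, iters: int = 500) -> List[float]:
    """Adjust coefficients by gradient descent on the cost

        sum_k (target_k - recon_k)^2 + lam * sum_j (c_j / sigma_j)^2

    where the second term (assumed Gaussian prior) keeps the result plausible.
    """
    coeffs = [0.0] * len(basis)
    for _ in range(iters):
        recon = reconstruct(mean, basis, coeffs)
        residual = [t - r for t, r in zip(target, recon)]
        for j, b in enumerate(basis):
            # Gradient of the data term plus gradient of the prior term.
            grad = -2.0 * sum(r * bk for r, bk in zip(residual, b))
            grad += 2.0 * lam * coeffs[j] / sigmas[j] ** 2
            coeffs[j] -= step * grad
    return coeffs


# Toy example (hypothetical values): a target synthesised from known coefficients.
mean = [0.0, 0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
sigmas = [1.0, 1.0]
true_coeffs = [0.5, -1.0]
target = reconstruct(mean, basis, true_coeffs)

recovered = fit(target, mean, basis, sigmas)              # no prior
regularised = fit(target, mean, basis, sigmas, lam=1.0)   # with prior
```

Without the penalty the sketch recovers the coefficients used to synthesise the target; with the penalty the coefficients are pulled toward the average face, trading reconstruction accuracy for plausibility.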
A key part of the invention is a generalized model of human faces. Similar to the approach of DeCarlos et al. [9], the range of allowable faces is restricted according to constraints derived from prototypical human faces. However, instead of using a limited set of measurements and proportions between a set of facial landmarks, the densely sampled geometry of the exemplar faces obtained by laser scanning (Cyberware™) is directly used. The dense modeling of facial geometry (several thousand vertices per face) leads directly to a triangulation of the surface. Consequently, there is no need for variational surface interpolation techniques [9, 21, 30]. The inventors also added a model of texture variations between faces. The morphable 3D face model is a consistent extension of the interpolation technique between face geometries, as introduced by Parke [24]. By computing correspondence between individual 3D face data automatically, the invention enables increasing the number of vertices used in the face representation from a few hundred to tens of thousands.
Moreover, a larger number of faces can be used, so that interpolation can be performed between hundreds of 'basis' faces rather than just a few. The goal of such an extended morphable face model is to represent any face as a linear combination of a limited basis set of face prototypes. Representing the face of an arbitrary person as a linear combination (morph) of "prototype" faces was first formulated for image compression in telecommunications [7]. Image-based linear 2D face models that exploit large data sets of prototype faces were developed for face recognition and image coding [3, 16, 34].
Different approaches have been taken to automate the matching step necessary for building up morphable models. One class of techniques is based on optical flow algorithms [4, 3] and another on an active model matching strategy [11, 15]. Combinations of both techniques have been applied to the problem of image matching [33]. According to the invention, an extension of this approach to the problem of matching 3D faces has been obtained.
The correspondence problem between different three-dimensional face data has been addressed previously by Lee et al. [18]. Their shape-matching algorithm differs significantly from the invention in several respects. First, according to the invention, the correspondence is computed in high resolution, considering shape and texture data simultaneously. Second, instead of using a physical tissue model to constrain the range of allowed mesh deformations, the statistics of example faces are used to keep deformations plausible. Third, the system of the invention does not rely on routines that are specifically designed to detect the features exclusively found in human faces, e.g., eyes, nose and the like.
The matching strategy of the invention can be used not only to adapt the morphable model to a 3D face scan, but also to 2D images of faces. Unlike a previous approach [32], the morphable 3D face model is now directly matched to images, avoiding the detour of generating intermediate 2D morphable image models. As an advantageous consequence, head orientation, illumination conditions and other parameters can be free variables subject to optimization. It is sufficient to use rough estimates of their values as a starting point of the automated matching procedure.
Most techniques for 'face cloning', the reconstruction of a 3D face model from one or more images, still rely on manual assistance for matching a deformable 3D face model to the images [24, 1, 28]. The approach of Pighin et al. [26] demonstrates the high realism that can be achieved for the synthesis of faces and facial expressions from photographs where several images of a face are matched to a single 3D face model. The automated matching procedure of the invention can be used to replace the manual initialization step, where several corresponding features have to be labeled in the presented images.
One particular advantage of the invention is that it works directly on faces without manual markers. In the automated approach the number of markers is extended to its limit. It matches the full number of vertices available in the face model to images. The resulting dense correspondence fields can even capture changes in wrinkles and map these from one face to another.
The invention teaches a new technique for modeling textured 3D faces. 3D faces can either be generated automatically from one or more photographs, or modeled directly through an intuitive user interface. Users are assisted in two key problems of computer aided face modeling. First, new face images or new 3D face models can be registered automatically by computing dense one-to-one correspondence to an internal face model. Second, the approach regulates the naturalness of modeled faces, avoiding faces with an "unlikely" appearance.
Applications of the invention are in particular in the fields of facial modeling, registration, photogrammetry, morphing, facial animation, computer vision and the like.