Existing approaches to expression transfer can be broadly grouped into two categories: direct transfer methods and learning-based transfer methods.
Direct transfer methods copy the shape and/or appearance changes of the source person to the target face image. Some prior art methods represent the face by a densely partitioned triangle mesh, usually containing up to 10^4 triangles. The shape changes of a given expression are transferred to the target face as a set of local affine transformations while preserving the connectivity of the target triangles. However, these methods do not transfer appearance changes. Other prior art approaches propose a geometric warping algorithm in conjunction with the Expression Ratio Image (the ratio between the appearance of the neutral image and the image of a given expression) to copy subtle appearance details such as wrinkles and cast shadows to the target. However, this approach tends to produce artifacts on the target face image since the transferred appearance details are not adapted to the target face.
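The Expression Ratio Image idea can be illustrated with a minimal sketch. It assumes that the source and target images have already been geometrically warped into pixel-wise correspondence, that intensities lie in [0, 1], and that the ratio is taken as expression over neutral. The function names and the tiny 2x2 images are hypothetical and serve only as an illustration, not as the implementation of any particular prior art method.

```python
import numpy as np

def expression_ratio_image(neutral, expression, eps=1e-6):
    """Per-pixel ratio of the expression image to the neutral image.

    Assumes both images are aligned; eps guards against division by zero.
    """
    neutral = np.asarray(neutral, dtype=np.float64)
    expression = np.asarray(expression, dtype=np.float64)
    return expression / np.maximum(neutral, eps)

def apply_ratio_image(target_neutral, ratio, max_value=1.0):
    """Multiply the target's neutral image by the source's ratio image."""
    target_neutral = np.asarray(target_neutral, dtype=np.float64)
    return np.clip(target_neutral * ratio, 0.0, max_value)

# Hypothetical 2x2 grayscale patches (already aligned).
source_neutral = np.array([[0.8, 0.8], [0.6, 0.6]])
source_smile = np.array([[0.8, 0.4], [0.6, 0.9]])   # darker wrinkle, brighter cheek
target_neutral = np.array([[0.5, 0.5], [0.4, 0.4]])

ratio = expression_ratio_image(source_neutral, source_smile)
target_smile = apply_ratio_image(target_neutral, ratio)
print(target_smile)
```

Because the ratio is computed only from the source, the same multiplicative change is imposed on every target regardless of that target's own skin texture, which is consistent with the artifacts noted above.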
Learning-based expression transfer methods learn a transformation from a training set of face images that have been labeled across expressions. The correspondence is determined manually or semi-automatically. Existing learning-based expression transfer methods can be broadly classified into two major approaches: regression-based and tensor-based methods.
Regression-based methods include two modalities. The first modality is regression between expressions, which learns a mapping from a reference expression (e.g., neutral) to the expression to be transferred (e.g., a smile). Given a reference face of a target person, the smile face of the target person can be predicted with a regression specifically learned for the smile expression. However, a major limitation of this method is its inability to represent untrained expressions.
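Regression between expressions can be sketched as a linear map fitted on paired training data. The example below assumes each face is a fixed-length feature vector (e.g., stacked landmark coordinates), uses ridge-regularized least squares as one possible regressor, and generates synthetic data. All names and the data are hypothetical; the prior art methods may use different features and regressors.

```python
import numpy as np

def fit_expression_regression(neutral, expression, ridge=1e-3):
    """Fit a linear map (with bias) from neutral faces to one expression.

    neutral, expression: arrays of shape (n_subjects, n_features),
    where row i of each array belongs to the same subject.
    """
    X = np.hstack([neutral, np.ones((neutral.shape[0], 1))])  # append bias column
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ expression)

def predict_expression(weights, neutral_face):
    """Predict the trained expression for one unseen neutral face."""
    return np.append(neutral_face, 1.0) @ weights

# Synthetic training pairs: the "smile" is a linear function of the neutral face.
rng = np.random.default_rng(0)
n_subjects, n_features = 40, 6
true_map = rng.normal(size=(n_features, n_features))
true_offset = rng.normal(size=n_features)
neutral_train = rng.normal(size=(n_subjects, n_features))
smile_train = neutral_train @ true_map + true_offset

weights = fit_expression_regression(neutral_train, smile_train, ridge=1e-8)

new_neutral = rng.normal(size=n_features)
predicted_smile = predict_expression(weights, new_neutral)
expected_smile = new_neutral @ true_map + true_offset
print(np.max(np.abs(predicted_smile - expected_smile)))
```

The fitted weights encode a single expression only. Predicting a different expression would require a separate regressor trained on pairs for that expression, which is the limitation described above.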
The second modality is regression between subjects, which learns a mapping between multiple pairs of corresponding expressions performed by both the source and target subjects, and then uses the learned regression to transfer new expressions. In this case, there are no corresponding images between expressions of different people. One prior art method generates the corresponding images by learning a regression from a neutral face to the predefined expression and applying this mapping to the neutral face of the target subject. In addition, the method learns a generic regressor from the shape to the appearance.
Another prior art method learns two Active Appearance Models (AAMs), one for the source and one for the target. It performs expression transfer by learning a mapping between the AAMs' coefficients. This method also requires solving for the correspondence between the expressions of the target and source, which is not possible in many realistic applications.
Prior art tensor-based approaches perform Higher-Order Singular Value Decomposition (HOSVD) to factorize the facial appearance into identity, expression, pose, and illumination. Given the factorization, expression transfer is done by first computing the identity coefficients of the new testing person, and then reassembling the identity factor with expression factors learned by the HOSVD. A major drawback of tensor-based approaches is the need to carefully label correspondences across expression, pose, and illumination. Prior art methods have generalized the tensor-based approaches by building non-linear manifolds of human body actions and facial expressions. Similar to the standard tensor-based approaches, these methods require solving for the correspondence of states on the manifold (content) across different subjects (style).
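The tensor-based pipeline can be sketched on a small synthetic example. The sketch assumes a three-mode data tensor (identity x expression x pixels), omitting pose and illumination for brevity, and assumes the new person is observed in one labeled expression. The helper names (`unfold`, `mode_product`, `hosvd`) and the random data are hypothetical and do not reproduce any specific prior art implementation.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize the tensor so that `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `matrix` (J x I_mode) into `tensor` along `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor per mode."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, factor in enumerate(factors):
        core = mode_product(core, factor.T, m)
    return core, factors

# Synthetic data: 4 identities, 3 labeled expressions, 20 pixels per face.
rng = np.random.default_rng(0)
faces = rng.normal(size=(4, 3, 20))
core, (u_identity, u_expression, u_pixels) = hosvd(faces)

# Fold the expression and pixel factors back into the core, leaving only
# the identity coefficients free: bases[:, k, :] spans expression k.
bases = mode_product(mode_product(core, u_pixels, 2), u_expression, 1)

# "New" observation: here subject 2 in expression 0, used as a sanity check.
observed = faces[2, 0]
identity_coeffs = np.linalg.lstsq(bases[:, 0, :].T, observed, rcond=None)[0]

# Reassemble the identity coefficients with the bases of expression 1.
synthesized = identity_coeffs @ bases[:, 1, :]
print(np.max(np.abs(synthesized - faces[2, 1])))
```

The sanity check succeeds only because every training face is indexed by a known identity and expression. Building the tensor in the first place requires that labeling, which is the correspondence burden noted above.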
The existing learning-based expression transfer methods rely on the availability and labeling accuracy of similar expressions in faces of different subjects. However, labeling expressions is time-consuming and error-prone (i.e., it is hard to capture and solve correspondence for expressions under different intensities). In addition, in many applications it is not possible to have labeled training samples for the target.
The limitations and ineffectiveness of the prior art are overcome by the present invention as described below.