The background pertaining to the present invention is as follows:
1. Face Motion Capture
Facial expression capture is an important component of realistic computer graphics and is widely applied in films, animation, games, internet chat, education and other fields. A face animation system based on face motion capture estimates the expressions and motions of a user and maps them onto another object model. Many techniques currently exist for this purpose. To interact directly with users, active sensing methods are usually adopted, which include placing markers on the face (Williams, L. 1990. Performance driven facial animation. In Proceedings of SIGGRAPH, 234-242; Huang, H., Chai, J., Tong, X., and Wu, H., T., 2011. Leveraging motion capture and 3d scanning for high-fidelity facial performance acquisition. ACM Trans. Graph. 30, 4, 74:1-74:10.), or projecting structured light (Zhang, L., Snavely, N., Curless, B., and Seitz, S. M. 2004. Space time faces: high resolution capture for modeling and animation. ACM Trans. Graph. 23, 3, 548-558; Weise, T., Li, H., Gool, L. V., and Pauly, M. 2009. Face/off: Live facial puppetry. In Eurographics/Siggraph Symposium on Computer Animation.). These methods can acquire accurate, high-resolution face geometry, but they usually require expensive equipment. Moreover, because the facial markers or structured light interfere with the user, they are not user-friendly and thus cannot be widely adopted by ordinary users.
Another kind of system is the passive system, which neither actively emits signals into its environment nor places markers on the face, but analyzes and captures face motions solely from received information such as color. Some of these methods use only a single video camera to capture face motions, including “Essa, I., Basu, S., Darrell, T., and Pentland, A. 1996. Modeling, tracking and interactive animation of faces and heads: Using input from video. In Computer Animation, 68-79; Pighin, F., Szeliski, R., and Salesin, D. 1999. Resynthesizing facial animation through 3d model-based tracking. In International Conference on Computer Vision, 143-150; CHAI, J.-X., XIAO, J., AND HODGINS, J. 2003. Vision-based control of 3d facial animation. In Eurographics/SIGGRAPH Symposium on Computer Animation, 193-206; Vlasic, D., Brand, M., Pfister, H. and Popovic, J. 2005. Face transfer with multilinear models.” and other work. A drawback of these methods is that the precision of their results is poor, so they cannot handle large rotations or exaggerated expressions of the face. In addition, they impose requirements on the environment; for example, they can be used only in an indoor environment with uniform illumination and without interference from shadows and highlights.
Some methods use a camera array, which captures face data from a plurality of viewing angles and converts it into stereo data for 3D reconstruction. Such work includes “BEELER, T., BICKEL, B., BEARDSLEY, P., SUMNER, R., AND GROSS, M. 2010. High-quality single-shot capture of facial geometry. ACM Trans. Graph. 29, 4, 40:1-40:9; BRADLEY, D., HEIDRICH, W., POPA, T., AND SHEFFER, A. 2010. High resolution passive facial performance capture. ACM Trans. Graph. 29, 4, 41:1-41:10; BEELER, T., HAHN, F., BRADLEY, D., BICKEL, B., BEARDSLEY, P., GOTSMAN, C., SUMNER, R. W., AND GROSS, M. 2011. High-quality passive facial performance capture using anchor frames. ACM Trans. Graph. 30, 4, 75:1-75:10.” and others. These methods can obtain relatively accurate 3D face expressions, but they also require expensive equipment and place high demands on the environment.
2. Vision-Based Face Feature Point Tracking
Capturing face expressions usually requires tracking feature points of the face in the input images, such as the corners of the eyes and the corners of the mouth. For an ordinary input video, an optical flow method is generally adopted. However, owing to noise in the input data, optical flow localization is not very reliable for inconspicuous face feature points (such as points on the cheeks), and the accumulation of errors between frames often produces a drift error. In addition, the optical flow method may produce relatively large errors when handling fast motions, illumination changes and similar conditions.
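The drift described above can be illustrated with a minimal sketch. It is not an optical flow implementation: it only assumes that a tracker updates a feature position by adding a per-frame displacement estimate, and that each estimate carries a small error. The function names and the 0.1-pixel bias are hypothetical values chosen for illustration.

```python
def track_by_chaining(true_positions, step_errors):
    """Track one coordinate by summing per-frame displacement estimates.

    true_positions: ground-truth coordinate of the feature in each frame.
    step_errors: hypothetical error of each frame-to-frame displacement
                 estimate (one entry per pair of consecutive frames).
    """
    estimate = true_positions[0]  # assume the first frame is initialized correctly
    estimates = [estimate]
    for k in range(1, len(true_positions)):
        true_step = true_positions[k] - true_positions[k - 1]
        measured_step = true_step + step_errors[k - 1]
        estimate += measured_step  # earlier errors are never corrected
        estimates.append(estimate)
    return estimates


def drift(true_positions, estimates):
    """Absolute tracking error in each frame."""
    return [abs(e - t) for e, t in zip(estimates, true_positions)]


# A feature moving 2 px per frame, with a 0.1 px bias in every estimate.
truth = [2.0 * k for k in range(101)]
errors = [0.1] * 100
d = drift(truth, track_by_chaining(truth, errors))
print(round(d[1], 3), round(d[50], 3), round(d[100], 3))
```

Under these assumptions the per-frame error is small, yet the accumulated error grows with the number of frames and reaches about 10 pixels after 100 frames, because no step re-anchors the estimate to the image.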
To track the feature points more accurately, some work uses geometric constraints between the feature points. In this way, the position of each feature point is computed not only from its local information but is also affected by the other feature points. Different types of geometric constraints are widely used, including limiting the drift of the feature points when expressions change (CHAI, J.-X., XIAO, J., AND HODGINS, J. 2003. Vision-based control of 3d facial animation. In Eurographics/SIGGRAPH Symposium on Computer Animation, 193-206.), satisfying a physics-based deformable model (ESSA, I., BASU, S., DARRELL, T., AND PENTLAND, A. 1996. Modeling, tracking and interactive animation of faces and heads: Using input from video. In Computer Animation, 68-79; DECARLO, D., AND METAXAS, D. 2000. Optical flow constraints on deformable models with applications to face tracking. Int. Journal of Computer Vision 38, 2, 99-127.), and exploiting correspondences in face models constructed from large sample sets (PIGHIN, F., SZELISKI, R., AND SALESIN, D. 1999. Resynthesizing facial animation through 3d model-based tracking. In International Conference on Computer Vision, 143-150; BLANZ, V., AND VETTER, T. 1999. A morphable model for the synthesis of 3d faces. In Proceedings of SIGGRAPH, 187-194; VLASIC, D., BRAND, M., PFISTER, H., AND POPOVIC, J. 2005. Face transfer with multilinear models. ACM Trans. Graph. 24, 3(July), 426-433.). These methods can track face feature points in images and videos to some extent, but because they obtain only 2D feature points in the image, they are limited in handling rotations.
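One simple form of geometric constraint, given here only as an illustrative sketch and not as the method of any cited work, is to project the tracked 2D points onto a linear shape model (a mean shape plus a few deformation modes). The three-point face, the single "smile" mode and the function name below are hypothetical, and the basis is assumed to be orthonormal.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def constrain_to_shape_model(observed, mean_shape, basis):
    """Project stacked 2D feature coordinates onto mean_shape + span(basis).

    The basis vectors are assumed orthonormal, so each model coefficient
    is a plain dot product with the offset from the mean shape.
    """
    offset = [o - m for o, m in zip(observed, mean_shape)]
    coeffs = [dot(offset, b) for b in basis]
    constrained = list(mean_shape)
    for c, b in zip(coeffs, basis):
        constrained = [p + c * x for p, x in zip(constrained, b)]
    return constrained, coeffs


# Hypothetical 3-point face: left mouth corner, right mouth corner, chin,
# stored as [x0, y0, x1, y1, x2, y2].
mean_shape = [-1.0, 0.0, 1.0, 0.0, 0.0, -1.0]

# One deformation mode: the mouth corners move apart symmetrically.
s = 1.0 / math.sqrt(2.0)
basis = [[-s, 0.0, s, 0.0, 0.0, 0.0]]

# Tracker output: a plausible smile plus a spurious sideways jump of the chin.
observed = [-1.5, 0.0, 1.5, 0.0, 0.4, -1.0]
constrained, coeffs = constrain_to_shape_model(observed, mean_shape, basis)
print([round(v, 3) for v in constrained])
```

In this toy case the widening of the mouth is kept because the model can explain it, while the chin displacement is removed because no deformation mode moves the chin. The result is still a set of 2D points, which is consistent with the limitation on rotations noted above.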
3. 3D Face Model
In our work, 3D information is obtained from 2D images during preprocessing by means of a 3D face model.
Various 3D face models are widely used in existing graphics and vision applications. In face animation applications, the expression blendshape model is widely used. It is a subspace representation of face motions comprising a series of basic face expressions that span a linear space of face expressions. With the blendshape model, various face animation effects can be computed, for example by morphing between the basic face motions (PIGHIN, F., HECKER, J., LISCHINSKI, D., SZELISKI, R., AND SALESIN, D. H. 1998. Synthesizing realistic facial expressions from photographs. In Proceedings of SIGGRAPH, 75-84.) or by linear combinations of the basic face motions (LEWIS, J. P., AND ANJYO, K. 2010. Direct manipulation blendshapes. IEEE CG&A 30, 4, 42-50; SEO, J., IRVING, G., LEWIS, J. P., AND NOH, J. 2011. Compression and direct manipulation of complex blendshape models. ACM Trans. Graph. 30, 6.).
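The linear-combination use of a blendshape model can be sketched as follows. The sketch assumes the common delta formulation, in which each basic expression is stored as a full mesh and blended as an offset from a neutral face; the text above does not specify this formulation. The two-vertex "face" and all names are hypothetical.

```python
def blend(neutral, blendshapes, weights):
    """Evaluate a delta-blendshape model.

    neutral: list of (x, y, z) vertices of the neutral face.
    blendshapes: list of meshes with the same vertex order as `neutral`.
    weights: one coefficient per blendshape, typically in [0, 1].
    """
    if len(blendshapes) != len(weights):
        raise ValueError("one weight per blendshape is required")
    result = [list(v) for v in neutral]
    for shape, w in zip(blendshapes, weights):
        for i, (vertex, rest) in enumerate(zip(shape, neutral)):
            for axis in range(3):
                # Add the weighted offset of this blendshape from the neutral face.
                result[i][axis] += w * (vertex[axis] - rest[axis])
    return [tuple(v) for v in result]


# Hypothetical two-vertex face: upper lip and lower lip.
neutral = [(0.0, 0.1, 0.0), (0.0, -0.1, 0.0)]
jaw_open = [(0.0, 0.1, 0.0), (0.0, -0.5, 0.0)]
smile = [(0.2, 0.2, 0.0), (0.2, -0.1, 0.0)]

half_open_smile = blend(neutral, [jaw_open, smile], [0.5, 1.0])
print(half_open_smile)
```

With all weights at zero the function returns the neutral face, and with a single weight at one it reproduces the corresponding basic expression, so the weights act as coordinates in the linear space of expressions.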
Multilinear models represent a decomposition of the blendshape model with multiple kinds of control attributes (such as identity, expression and mouth articulation). An important property of the expression blendshape model is that the expressions of different identities correspond to similar basic motion coefficients in the model. By virtue of this property, many face animation applications use the expression blendshape model and transfer the face motions of a user to a virtual avatar by transferring the basic motion coefficients.
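As a rough illustration of these two ideas, the sketch below evaluates a toy bilinear model with one identity mode and one expression mode, then reuses the same expression coefficients for a second identity. The core tensor values, the two measured quantities (mouth width and mouth opening) and the function names are hypothetical; real models have many more vertices and modes.

```python
def contract(core, id_weights, exp_weights):
    """Sum core[c][i][e] * id_weights[i] * exp_weights[e] over i and e.

    core: nested list indexed [coordinate][identity][expression].
    Returns one value per coordinate.
    """
    return [
        sum(
            coord[i][e] * wi * we
            for i, wi in enumerate(id_weights)
            for e, we in enumerate(exp_weights)
        )
        for coord in core
    ]


def blendshapes_for(core, id_weights):
    """Fix the identity weights to obtain that identity's expression blendshapes."""
    n_exp = len(core[0][0])
    shapes = []
    for e in range(n_exp):
        one_hot = [1.0 if k == e else 0.0 for k in range(n_exp)]
        shapes.append(contract(core, id_weights, one_hot))
    return shapes


# Toy core tensor: 2 coordinates x 2 identities x 2 expressions (neutral, open).
core = [
    [[4.0, 4.0], [6.0, 6.0]],  # mouth width: 4 for identity A, 6 for identity B
    [[0.0, 2.0], [0.0, 3.0]],  # mouth opening: 0 when neutral, 2 (A) or 3 (B) when open
]

user = [1.0, 0.0]       # identity A
avatar = [0.0, 1.0]     # identity B
half_open = [0.5, 0.5]  # expression coefficients estimated for the user

print(contract(core, user, half_open))    # the user's half-open mouth
print(contract(core, avatar, half_open))  # same coefficients applied to the avatar
```

Fixing the identity weights yields a set of expression blendshapes for that identity, and the same expression coefficients produce a half-open mouth on both faces while each keeps its own mouth width and range of opening.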