1. Field
The present disclosure relates to computer-generated animation and, more specifically, to repurposing existing animated content of a character to synthesize new animated content of the character.
2. Related Art
A computer-generated feature animation is typically a labor- and time-intensive process that results in the creation of characters with compelling and unique personalities. However, because of the labor- and time-intensive nature of computer-generated feature animation, synthesizing new content that provides compelling and expressive animation of one of these existing characters presents a challenge.
Statistical methods have been used to analyze and synthesize new motion data (e.g., as described in Brand and Hertzmann 2000, Mukai and Kuriyama 2005, and Lau et al. 2009). In particular, the Gaussian Process Latent Variable Model (GPLVM) (e.g., as described in Lawrence 2006) has been used for a number of applications directed to animation, such as satisfying constraints, tracking human motion (e.g., as described in Grochow et al. 2004, Urtasun et al. 2005, and Wang et al. 2008), or providing interactive control (e.g., as described in Ye and Liu 2010 and Levine et al. 2012). The GPLVM is used to reduce the dimensionality of the motion data and to create a statistical model of the animation.
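The core quantity behind the GPLVM described above is the Gaussian-process marginal likelihood of the high-dimensional data given a low-dimensional latent embedding, which the model maximizes with respect to the latent points. The following is a minimal illustrative sketch, not any particular publication's implementation: the kernel hyperparameters, the PCA initialization, and the toy data dimensions are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel over latent points X (N x q).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gplvm_log_likelihood(Y, X, noise=1e-2):
    # GP marginal log-likelihood of high-dimensional data Y (N x D)
    # given a low-dimensional latent embedding X (N x q).  A GPLVM
    # optimizes this objective with respect to X.
    N, D = Y.shape
    K = rbf_kernel(X) + noise * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    Kinv_Y = np.linalg.solve(K, Y)
    return (-0.5 * D * logdet
            - 0.5 * np.sum(Y * Kinv_Y)
            - 0.5 * N * D * np.log(2.0 * np.pi))

# Toy data: 20 frames of 60-dimensional "animation" data,
# with a 2-D latent space initialized by PCA.
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 60))
Yc = Y - Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Yc, full_matrices=False)
X = Yc @ Vt[:2].T  # PCA initialization of the latent coordinates
print(gplvm_log_likelihood(Y, X))
```

In a full GPLVM, the latent coordinates (and kernel hyperparameters) would be optimized by gradient ascent on this objective rather than left at their PCA initialization.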
However, while the GPLVM tends to keep dissimilar data points separated in the reduced-dimensional space, it makes no effort to keep similar data points close together. Modifications to the GPLVM have therefore been proposed to address this limitation and make the model better suited for modeling motion data. For example, back constraints (e.g., as described in Lawrence and Quiñonero-Candela 2006) have been applied to the GPLVM to preserve local distances. For another example, dynamical models (e.g., as described in Wang et al. 2006 and Lawrence 2007) have been introduced to model the time dependencies in animation data. For another example, a connectivity prior (e.g., as described in Levine et al. 2012) has been proposed to ensure a high degree of connectivity among the animation data embedded in the low-dimensional latent space.
Another shortcoming is that prior methods that model animation data using a GPLVM have only been applied to full-body motion capture data. Similar techniques have not been applied to manually created animation for a film-quality character. A key difference between motion capture data and manually created film-quality animation is that the manually created animation lies in a significantly higher-dimensional space than the motion capture data.
Furthermore, data-driven approaches to character control and animation synthesis have focused only on full-body tasks and are based on motion graphs (e.g., as described in Kovar et al. 2002, Lee et al. 2002, Treuille et al. 2007, Lo and Zwicker 2008, and Lee et al. 2009). These methods use a graph structure to describe how motion clips from a library can be connected and reordered to accomplish a task. However, while these approaches perform well with a large training set, smaller data sets are not well-suited for motion graphs because of a lack of variety and transitions in the motions.
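The graph structure described above can be sketched in a few lines. This is an illustrative toy, not any cited system: the clip names and allowed transitions are hypothetical, standing in for edges that would in practice be derived by comparing the end pose of one clip with the start pose of another.

```python
# Hypothetical motion graph: nodes are clip names, edges mark clips whose
# end pose matches another clip's start pose closely enough to transition.
motion_graph = {
    "idle": ["walk", "idle"],
    "walk": ["run", "idle", "walk"],
    "run":  ["walk", "run"],
}

def synthesize(graph, start, length, choose=lambda options: options[0]):
    # Reorder clips into a new sequence by walking the transition graph.
    # `choose` would normally score candidate transitions against a task;
    # here it simply picks the first allowed successor.
    sequence = [start]
    while len(sequence) < length:
        sequence.append(choose(graph[sequence[-1]]))
    return sequence

print(synthesize(motion_graph, "idle", 5))
```

The sparsity problem noted above shows up directly in this structure: with few clips, each node has few outgoing edges, so the walk has little variety to draw on.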
Similarly, other existing methods for character control, including data-driven and physics-based approaches (e.g., as described in Coros et al. 2009, Muico et al. 2009, Levine et al. 2012, and Tan et al. 2014), are applied to full-body human motion or hand motion (e.g., as described in Andrews and Kry 2013). Thus, the tasks for which the controllers are trained can be quantifiably measured, such as locomotion or reaching tasks. However, existing methods do not animate a non-human character's face because tasks for facial animation are difficult to quantify.
Facial animation of non-human characters can be controlled by re-targeting recorded expressions. A commonly used method is blendshape mapping (e.g., as described in Buck et al. 2000, Chai et al. 2003, Seol et al. 2011, Bouaziz et al. 2013, and Cao et al. 2013), which maps expressions from an input model onto corresponding expressions of the target character. Motion is then generated by blending between the different facial shapes of the character. This approach uses an input model, such as a video recording of a human, to drive the animation of the character. Blendshape mapping approaches, however, require recordings of an input model to control the facial animation. In addition, blendshape mapping approaches require that the character's face be animated with blendshapes.
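The blending step described above is, at its core, a weighted linear combination of per-vertex offsets from a neutral face. The following sketch is purely illustrative; the shape names, vertex counts, and offset values are invented for the example.

```python
import numpy as np

# Hypothetical blendshape setup: each shape stores per-vertex offsets
# from a neutral face (4 vertices, xyz coordinates).
neutral = np.zeros((4, 3))
shapes = {
    "smile": np.array([[0, 1, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]], float),
    "blink": np.array([[0, 0, 0], [0, 0, 0], [0, -1, 0], [0, -1, 0]], float),
}

def blend(weights):
    # Final pose = neutral + sum_i w_i * shape_i.  A blendshape-mapping
    # system would produce the weights by re-targeting expressions
    # tracked from an input model (e.g., a video of a human face).
    pose = neutral.copy()
    for name, w in weights.items():
        pose += w * shapes[name]
    return pose

pose = blend({"smile": 0.5, "blink": 0.25})
print(pose)
```

The limitation noted above is visible here: the character is only animatable in this way if its face is rigged as a set of such shapes, and the weights must come from a recorded performance.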
Lastly, as is well-known in the field, animated characters are controlled through an underlying rig, which deforms a surface mesh that defines a character. A variety of methods exist to map a character's rig controls to deformations of the surface mesh (e.g., as described in Barr 1984, Sederberg and Parry 1986, Magnenat-Thalmann et al. 1988, Singh and Fiume 1998, and Lewis et al. 2000). However, a technique that does not make assumptions about rig controls, and thus does not depend on an implementation of a particular type of mapping method, is needed.
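The rig abstraction described above can be viewed as an opaque function from control values to a deformed surface mesh, which is the property a mapping-agnostic technique would rely on. The sketch below is a toy illustration of that interface only; the rest mesh, control names, and the trivial scale-and-lift deformer are all invented stand-ins for whatever deformation method a production rig actually uses.

```python
import numpy as np

# Rest (undeformed) surface mesh: 3 vertices, xyz coordinates.
rest_mesh = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])

def rig(controls):
    # Treat the rig as a black box: controls in, deformed mesh out.
    # The deformer here (uniform scale plus vertical translation) is
    # purely illustrative; free-form deformation, skinning, or any
    # other mapping method could sit behind the same interface.
    scale = controls.get("scale", 1.0)
    lift = controls.get("lift", 0.0)
    return rest_mesh * scale + np.array([0.0, lift, 0.0])

mesh = rig({"scale": 2.0, "lift": 0.5})
print(mesh)
```

A technique that only ever evaluates the rig through such an interface makes no assumptions about the rig controls themselves, and so does not depend on any particular mapping method.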