Animation of virtual characters is a popular storytelling medium across many domains, but traditional animation workflows are labor intensive. For example, animators often draw every frame by hand, or manually specify how a character's lips move in accordance with the character's speech: when a character utters the syllable “a,” the character's mouth makes the same shape that a human's mouth would make when speaking that syllable.
Automated animation removes the burden of hand-animating every mouth movement. In live or performance animation, for example, a computing system controls cartoon characters in response to an animator's input or speech. But existing solutions either cannot operate in real time, i.e., cannot perform live animation, or cannot produce animation that is realistic and accurate. For example, existing solutions can result in a character's mouth not moving at all, or moving too much relative to the expected movement.
Additionally, solutions for live animation are often based on prediction models that predict animation sequences from speech. But such models require training data, which is time-consuming to generate because audio sequences must be hand-mapped to visemes, i.e., the visual mouth shapes that correspond to speech sounds. One minute of speech can take five to seven hours of work to hand-animate.
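To make the hand-mapping task concrete, the sketch below shows one plausible way such training data could be represented: timestamped phoneme segments labeled with viseme classes. The phoneme symbols, viseme names, and the many-to-one mapping are illustrative assumptions, not any particular system's actual data format.

```python
# Illustrative sketch only: a hypothetical representation of speech-to-viseme
# training data. The phoneme set and viseme classes below are assumed
# examples, not a real product's schema.

# Several phonemes typically share one mouth shape, so the mapping is
# many-to-one (hypothetical entries):
PHONEME_TO_VISEME = {
    "AA": "open",       # as in "father"
    "AE": "open",
    "M":  "closed",     # bilabials share a closed-lips shape
    "B":  "closed",
    "P":  "closed",
    "F":  "lip_teeth",  # labiodentals
    "V":  "lip_teeth",
}

def label_segments(segments):
    """Map (start_sec, end_sec, phoneme) segments to viseme labels.

    Unknown phonemes fall back to a neutral "rest" mouth shape.
    """
    return [
        (start, end, PHONEME_TO_VISEME.get(phoneme, "rest"))
        for start, end, phoneme in segments
    ]

# A short utterance ("ma-p") as timestamped phoneme segments:
speech = [(0.00, 0.12, "M"), (0.12, 0.30, "AA"), (0.30, 0.41, "P")]
print(label_segments(speech))
# → [(0.0, 0.12, 'closed'), (0.12, 0.3, 'open'), (0.3, 0.41, 'closed')]
```

Producing such labels by hand requires scrubbing through the audio and judging the correct mouth shape for every segment, which is why one minute of speech can take hours of animator effort.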
Accordingly, improved solutions are needed for live animation and generating training data for prediction models that are used for live animation.