Research has shown that facial tracking and performance capture technology has had a significant impact on a broad range of fields, including computer gaming, animation, entertainment, and human-computer interfaces. For example, some research has shown that users interacting with a digital avatar, such as an animated face, rate the interaction as 30% more trustworthy than the same interaction with text-only scripts.
Existing facial animation systems follow one of two techniques: performance-based facial animation or speech-driven facial animation. Performance-based facial animation is currently the most popular technique for generating realistic character facial animation for games and movies. While effective, such techniques require special equipment such as physical markers on a subject, structured light, and camera arrays. As a result, such techniques are impractical for ordinary users.
Speech-driven facial animation is also a common technique, which functions by mapping raw speech features such as Mel-Frequency Cepstral Coefficients (MFCC) to predefined visual parameters. The speech is first mapped to a phoneme or phoneme-state feature and then to the visual parameters. This technique requires large volumes of corresponding audio and video training data in order to generalize well. While this method is easier to perform, its accuracy depends greatly upon the volume of training data available.
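The two-stage mapping described above (speech features to phoneme, then phoneme to visual parameters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the phoneme inventory, the three visual parameters (jaw opening, lip width, lip rounding) and their values, the nearest-centroid classifier, and the synthetic 13-dimensional vectors that stand in for MFCC frames. It is not the method of any particular system.

```python
import numpy as np

# Hypothetical phoneme inventory and per-phoneme visual parameters
# (jaw opening, lip width, lip rounding). Values are illustrative only.
PHONEMES = ["sil", "aa", "uw", "m"]
VISUAL_PARAMS = np.array([
    [0.0, 0.5, 0.0],  # sil
    [0.9, 0.6, 0.1],  # aa
    [0.3, 0.2, 0.9],  # uw
    [0.0, 0.5, 0.2],  # m
])


def fit_centroids(features, labels, n_classes):
    """Return the mean feature vector of each phoneme class."""
    return np.stack(
        [features[labels == k].mean(axis=0) for k in range(n_classes)]
    )


def speech_to_visual(features, centroids):
    """Map each frame to its nearest phoneme, then to visual parameters."""
    # Squared Euclidean distance from every frame to every centroid.
    diffs = features[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    phoneme_ids = dists.argmin(axis=1)
    return phoneme_ids, VISUAL_PARAMS[phoneme_ids]


# Synthetic stand-in for labeled 13-dimensional MFCC frames (not real audio).
rng = np.random.default_rng(0)
true_centers = rng.normal(scale=5.0, size=(len(PHONEMES), 13))
train_labels = np.repeat(np.arange(len(PHONEMES)), 50)
train_feats = true_centers[train_labels] + rng.normal(scale=0.5, size=(200, 13))

centroids = fit_centroids(train_feats, train_labels, len(PHONEMES))

# Five unseen frames with known phoneme labels.
test_labels = np.array([1, 1, 2, 3, 0])
test_feats = true_centers[test_labels] + rng.normal(scale=0.5, size=(5, 13))
ids, params = speech_to_visual(test_feats, centroids)

print([PHONEMES[i] for i in ids])
print(params)
```

The sketch also hints at why training data volume matters: each centroid is only as reliable as the number of labeled frames averaged into it, and real speech features overlap far more than these well-separated synthetic clusters.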