Image sequences with lip movements synchronized with speech are commonly called “talking heads.” Talking heads are useful in applications of human-machine interaction, e.g., reading emails, news, or eBooks aloud, or acting as an intelligent voice agent or a computer-assisted language teacher. A lively talking head can attract the attention of a user, make the human-machine interface more engaging, or add entertainment to an application.
Generating talking heads that look like real people is challenging. A talking head must not only be photo-realistic in its static appearance but also exhibit convincing plastic deformations of the lips, synchronized with the corresponding speech, because the most eye-catching region of a talking face is the area around the mouth that contains the “articulators” (the lips, teeth, and tongue).