Generating animated sequences of talking heads in multimedia and text-to-speech applications can be quite tedious, especially for capturing images representing various mouth shapes. As mouth shape is affected by co-articulation phenomenon (influence of one sound on another), achieving a good correspondence between audio and an animated head necessitates a large library of animated shapes. Developments in 3D modeling and the availability of faster computers have sparked a growing interest in the development of realistic talking heads based on images taken from real people and advanced modeling techniques. However, even if creating a computer model of a real head based on a set of pictures is becoming possible, it is still difficult to create a library of mouth shapes that is necessary to perform a good synchronization between the audio data and the visual data or video data.
While strides continue to be made in this regard, previous suggested solutions involve building a co-articulation library using a large number of mouth shapes, and this process is very time consuming. Currently, there is no effective way of building a library of mouth shapes that produces a good synchronization between audio and video short of having a particular speaker spend hours recording examples of his or her mouth shapes.
While it would be highly desirable to be able to build a mouth shape library that produces a good synchronization between audio and video from only a small amount of mouth shape data, that technology has not heretofore existed. Therefore, providing a system and method for building such a library of mouth shapes using only a small amount of mouth shape data remains the task of the present invention.