The present invention relates generally to methods and systems for coding of images, and more particularly to a method and system for coding images of facial animation.
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters (FAPs). In this architecture, text input is sent to a Text-To-Speech (TTS) converter at the decoder, and the converter drives the mouth shapes of the face. FAPs are sent from an encoder to the face over the communication channel. Currently, the Verification Model (VM) assumes that synchronization between the text input and the FAP input stream is obtained by means of timing injected at the transmitter side. However, the transmitter does not know the timing of the decoder TTS. Hence, the encoder cannot specify the alignment between synthesized words and the facial animation. Furthermore, timing varies between different TTS systems. Thus, there currently is no method of aligning facial mimics (e.g., smiles and other expressions) with speech.
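The timing problem described above can be illustrated with a minimal sketch. It is not part of MPEG-4 or the Verification Model; the function name `word_at`, the two TTS timing tables, and the 700 ms timestamp are all hypothetical values chosen only to show how a transmitter-side timestamp can land on different words at different decoders.

```python
# Hypothetical illustration only: the names and durations below are assumptions,
# not values taken from MPEG-4 or from any real TTS system.

def word_at(time_ms, word_durations):
    """Return the word being spoken at time_ms, given (word, duration_ms) pairs."""
    elapsed = 0
    for word, duration in word_durations:
        elapsed += duration
        if time_ms < elapsed:
            return word
    return None  # the synthesized speech has already finished

# The same sentence as rendered by two hypothetical decoder TTS systems.
# Each system chooses its own word durations, unknown to the encoder.
tts_a = [("I", 150), ("am", 200), ("so", 250), ("happy", 500)]
tts_b = [("I", 300), ("am", 350), ("so", 400), ("happy", 700)]

# The encoder stamps a smile FAP at 700 ms, intending it to coincide with "happy".
smile_time_ms = 700

print(word_at(smile_time_ms, tts_a))  # word spoken at 700 ms on decoder A
print(word_at(smile_time_ms, tts_b))  # word spoken at 700 ms on decoder B
```

With these assumed durations, the smile coincides with "happy" on the first decoder but with "so" on the second, which is the misalignment that transmitter-injected timing cannot prevent.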
The present invention is therefore directed to the problem of developing a system and method for coding images for facial animation that enables alignment of facial mimics with speech generated at the decoder.