The present invention relates to an apparatus and method for efficiently animating a believable speaking character, preferably but not exclusively a 3-D character, in substantially real time.
The large volume of electronic textual information and other forms of readable communications has begun to spawn methods of delivering messages with more impact and believability. Without a doubt, human facial images combined with speech provide a compelling way of delivering messages. Video messages with or without sound are one example of delivering more believable messages. However, video messages are typically limited to a given subject chosen to be imaged in advance. In many cases, it may not be appropriate or desirable to show a specific, “real” person to deliver a message. Effort has been made in recent years to develop animated figures that can effectively and flexibly deliver believable messages. However, animating figures and images of human faces and integrating them with human voices to flexibly create believable messages is not a trivial task. Some examples of prior art, the contents of which are hereby incorporated by reference, follow.
U.S. Pat. No. 6,097,381 deals with video images and the creation of a database of spoken phonemes associated with images. The database can subsequently be used to synthesize believable animations of humans speaking. Synthesizing speech or facial movements to match selected speech sequences, so as to simulate an animated image of a human speaking, is disclosed.
U.S. Pat. No. 5,657,426 discloses synchronizing facial expressions with synthetic speech. Text input is transformed into a string of phonemes and timing data, which are transmitted to an image generation unit. At the same time, a string of synthetic speech samples is transmitted to an audio server. Synchronization between the image generation unit and the audio server produces facial configurations which are displayed on a video device, along with the audio speech.
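By way of illustration only, the general idea of driving facial configurations from a string of phonemes and timing data may be sketched as follows. The sketch is not taken from the cited patent; the phoneme symbols, the viseme names, the mapping table and the functions build_viseme_track and viseme_at are all hypothetical and are assumed here purely for explanation. The audio playback position is assumed to serve as the common clock for synchronization.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-mouth-shape (viseme) table, assumed for illustration.
PHONEME_TO_VISEME = {
    "HH": "open",
    "AH": "open",
    "L": "tongue_up",
    "OW": "rounded",
    "M": "closed",
}


@dataclass(frozen=True)
class TimedPhoneme:
    phoneme: str      # phoneme symbol produced from the text input
    duration_ms: int  # timing data for that phoneme


@dataclass(frozen=True)
class VisemeEvent:
    start_ms: int
    end_ms: int
    viseme: str


def build_viseme_track(phonemes):
    """Turn a phoneme string with durations into a timeline of mouth shapes."""
    track = []
    clock = 0
    for p in phonemes:
        # Unknown phonemes fall back to a neutral mouth shape.
        viseme = PHONEME_TO_VISEME.get(p.phoneme, "neutral")
        track.append(VisemeEvent(clock, clock + p.duration_ms, viseme))
        clock += p.duration_ms
    return track


def viseme_at(track, audio_ms):
    """Return the mouth shape to display at the current audio position."""
    for event in track:
        if event.start_ms <= audio_ms < event.end_ms:
            return event.viseme
    return "neutral"  # before the first or after the last phoneme


phonemes = [
    TimedPhoneme("HH", 60),
    TimedPhoneme("AH", 90),
    TimedPhoneme("L", 70),
    TimedPhoneme("OW", 120),
]
track = build_viseme_track(phonemes)
```

In such a sketch the image generation side would query viseme_at with the audio position each time a frame is drawn, so that the displayed facial configuration follows the audio rather than an independent timer.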
U.S. Pat. No. 6,052,132 discloses a technique for providing a computer generated face having coordinated eye and head movement by providing a computer generated movable head and at least one computer generated movable eye. The movement of the movable head and the movable eye are coordinated such that the movement of the movable head follows the movement of the movable eye.
The prior art thus addresses some elements of the image-speech interface. However, realism is lacking wherever real time output is required. That is to say, the prior art produces realistic images only when animation of the character is carried out offline, and the subtleties of realism cannot be provided in real time facial-head animation. In much of the prior art, output video images have been concatenated from fixed input video, leaving a less than desirable effect from a standpoint of believability.
Psychologically speaking, the human eye is very attuned to small and subtle nuances in facial and head movement and expression, making believability much harder to achieve in facial-head animation than in animation of any other part of the body. Consequently, it is impossible to ignore subtleties, and numerous variables must be dealt with. The need to deal with numerous variables, inherent in creating a believable facial image having speech characteristics, has been a barrier to providing such facial animation in a real time setting.