In fields such as computer graphics (CG) animation (e.g., games), avatar-based chat, and toys, the shape of the mouth of a character is changed while human voice is reproduced from a speaker so that the displayed character or the toy character appears to speak.
Typically, an animator listens to the target voice and determines the shape of the mouth of the character according to an empirical rule, thereby providing mouth shape setting data that can be synchronized with reproduction of the voice. This method cannot accurately match the shape of the mouth of the character to the voice, but it can relatively easily change the shape of the mouth of the character in synchronization with reproduction of the voice. Therefore, this method has been employed for game production and TV animation production.
However, such a mouth shape control method does not necessarily achieve satisfactory image quality when used for realistic three-dimensional computer graphics (3DCG) (e.g., movies) or for a guide character displayed on a guide device used in a museum or the like. Therefore, a mouth shape control method that can accurately change the shape of the mouth of the character corresponding to sound has been desired.
Such a demand may be satisfied by extracting formant information that characterizes a vowel from the reproduction target voice (i.e., identifying the vowel), and selectively outputting a given animation image in synchronization with the identified vowel, thereby automatically generating an animation in which the shape of the mouth of the character is changed corresponding to the sound (see JP-A-2003-233389, for example).
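As an illustration only, the vowel identification and image selection described above might be sketched as follows. The sketch assumes that the first and second formant frequencies (F1, F2) have already been measured for each analysis frame; the reference formant values, the image file names, and the nearest-neighbor rule are hypothetical choices made for this example and are not taken from the cited publication.

```python
import math

# Rough, illustrative (F1, F2) centers in Hz for five vowels.
# These are assumed values for the sketch, not data from the cited publication.
VOWEL_FORMANTS = {
    "a": (800.0, 1300.0),
    "i": (300.0, 2300.0),
    "u": (350.0, 1300.0),
    "e": (500.0, 1900.0),
    "o": (500.0, 800.0),
}

# Hypothetical animation images prepared in advance, one per vowel.
MOUTH_IMAGES = {
    "a": "mouth_a.png",
    "i": "mouth_i.png",
    "u": "mouth_u.png",
    "e": "mouth_e.png",
    "o": "mouth_o.png",
}


def identify_vowel(f1, f2):
    """Return the vowel whose reference formants are closest to (f1, f2)."""
    return min(
        VOWEL_FORMANTS,
        key=lambda vowel: math.dist((f1, f2), VOWEL_FORMANTS[vowel]),
    )


def select_mouth_images(formant_track):
    """Map a sequence of per-frame (F1, F2) pairs to mouth images."""
    return [MOUTH_IMAGES[identify_vowel(f1, f2)] for f1, f2 in formant_track]


if __name__ == "__main__":
    # Two frames of pre-measured formants (hypothetical input).
    print(select_mouth_images([(480.0, 850.0), (520.0, 1850.0)]))
```

A practical system would also have to handle consonants, silence, and speaker-dependent variation in formant frequencies, all of which this sketch ignores.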
A chat system using an avatar has also been known in which a server analyzes voice received from a terminal by voice recognition to determine the shape of the mouth of the avatar that corresponds to each phoneme, and transmits information including the determined shape of the mouth of the avatar to the terminal so that the shape of the mouth of the avatar is accurately displayed on the terminal corresponding to the voice (see JP-A-2006-65684, for example).
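The exchange between the server and the terminal in such a chat system might, purely as an assumption for illustration, look like the following sketch. The phoneme-to-mouth-shape table, the JSON message layout, and the function names are hypothetical, and the voice recognition step is represented only by its assumed output, a list of (start time, phoneme) pairs.

```python
import json

# Hypothetical mapping from recognized phonemes to avatar mouth shapes.
PHONEME_TO_MOUTH = {
    "a": "open_wide",
    "i": "spread",
    "u": "rounded_small",
    "e": "half_open",
    "o": "rounded",
    "m": "closed",
}


def build_mouth_message(avatar_id, phonemes):
    """Server side: convert recognized phonemes into a message for the terminal.

    `phonemes` is a list of (start_ms, phoneme) pairs, assumed to be the
    output of a voice recognizer. Unknown phonemes fall back to "closed".
    """
    frames = [
        {"t_ms": start_ms, "mouth": PHONEME_TO_MOUTH.get(phoneme, "closed")}
        for start_ms, phoneme in phonemes
    ]
    return json.dumps({"avatar": avatar_id, "frames": frames})


def apply_mouth_message(message):
    """Terminal side: decode the message into a (time, mouth shape) schedule."""
    data = json.loads(message)
    return [(frame["t_ms"], frame["mouth"]) for frame in data["frames"]]


if __name__ == "__main__":
    msg = build_mouth_message("avatar-1", [(0, "a"), (120, "i"), (240, "x")])
    print(apply_mouth_message(msg))
```

In this sketch the terminal only has to display the scheduled mouth shapes, since the analysis is performed entirely on the server.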