1. Field of the Invention
The present invention relates to a computer-generated synthetic talking head model and, in particular, to such a model for use in visual speech synthesis.
2. Description of Related Art
Research in psychology has revealed that humans perceive both acoustic and visual signals during face-to-face communication. These visual cues improve speech recognition. The use of such visual information has led to the development of visual speech synthesis, also known as a "talking head", in which a computer-generated synthesized facial image with speech articulators is animated in synchronization with synthetic acoustic speech. Visual speech synthesis can assist listeners in understanding synthetic acoustic speech, and has a wide range of applications including video conferencing, artificial agents for human-machine interaction, and speech training for the hearing impaired.
Visual information can be divided into two classes: speech-related facial motions that directly influence human bimodal (acoustic and visual) perception of speech, such as movements of the mouth and lips; and facial motions not directly related to the production of speech (referred to as "paralinguistic signals"), such as facial expressions, head movements, and gestures. Paralinguistic signals influence how humans judge the overall quality and realism of visual speech synthesis.
Heretofore, purely deterministic (cyclic) or purely random motion has commonly been used to simulate rotational movement of a synthesized talking head. Purely deterministic motion, however, results in predictable and thus unnatural rotational head movement. Random motion, on the other hand, is not predictable but produces abrupt rotational head movements which also appear unnatural. Thus, the overall quality of visual speech synthesis using such conventional methods is poor.
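By way of illustration only, the two conventional approaches described above can be contrasted in a minimal sketch. The function names, the 20-degree amplitude, the 50-frame period, and the use of a sinusoid and of independent uniform samples are all assumptions chosen for the example; they are not taken from any particular prior-art system. The sketch shows that a cyclic yaw angle repeats exactly (and is therefore predictable), whereas independently drawn angles can jump by a large amount between consecutive frames (and therefore appear abrupt).

```python
import math
import random

def cyclic_yaw(n_frames, amplitude_deg=20.0, period_frames=50):
    """Purely deterministic motion: the yaw angle repeats every period_frames frames.
    (Hypothetical sinusoidal model, assumed for illustration.)"""
    return [amplitude_deg * math.sin(2.0 * math.pi * t / period_frames)
            for t in range(n_frames)]

def random_yaw(n_frames, amplitude_deg=20.0, seed=0):
    """Purely random motion: each frame's yaw angle is drawn independently.
    (Hypothetical uniform model, assumed for illustration.)"""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude_deg, amplitude_deg) for _ in range(n_frames)]

def max_frame_jump(angles):
    """Largest change in angle between consecutive frames, in degrees."""
    return max(abs(b - a) for a, b in zip(angles, angles[1:]))

cyclic = cyclic_yaw(200)
rand = random_yaw(200)

# Cyclic motion is smooth but repeats; random motion does not repeat but is abrupt.
print("cyclic max jump per frame (deg):", round(max_frame_jump(cyclic), 2))
print("random max jump per frame (deg):", round(max_frame_jump(rand), 2))
```

In this sketch the cyclic sequence never changes by more than a few degrees per frame, while the random sequence may swing across most of its range from one frame to the next, which corresponds to the abrupt movement noted above.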
It is therefore desirable to simulate rotational movement of a synthesized talking head in a natural and realistic manner which is spontaneous, in that it is somewhat random, and which nevertheless provides relatively smooth movement. Moreover, the simulation of horizontal motion of the talking head should be adaptable for dynamic modification as a function of the number of listeners that the speaker is addressing. Finally, the rotational movement of the synthesized head should simulate natural spans of sustained attention to particular sections of an audience.