1. Field of the Invention
The present invention relates to controlling animations and more specifically to a system and method of providing reactive behavior to virtual agents when a human/computer interaction is taking place.
2. Discussion of Related Art
Much work has recently been focused on generating visual Text-to-Speech interactions between a human user and a computer device. The natural interaction between a computer and human is increasing as conversational agents or virtual agents improve, but the widespread acceptance and use of virtual agents is hindered by unnatural interactions with the virtual agent. Studies show that a customer's impression of a company's quality is heavily influenced by the customer's experience with the company. Brand management and customer relations management (CRM) drive much of a company's focus on its interaction with the customer. When a virtual agent is not pleasing to interact with, a customer will have a negative impression of the company represented by the virtual agent.
Movements of the head of a virtual agent must be natural or viewers will dislike the virtual agent. If the head movement is random, the agent appears more synthetic. In some cases, the head appears to float over a background, an effect judged by many viewers to be “eerie.”
One can try to interpret the meaning of the text with a natural language understanding tool and then derive some behavior from that. Yet such an approach is usually not feasible, since natural language understanding is very unreliable. A wrong interpretation can do considerable harm to the animation. For example, if the face is smiling while articulating a sad or tragic message, the speaker comes across as cynical or mean-spirited. Most viewers dislike such animations and may become upset.
An alternative approach is to use ‘canned’ animation patterns. This means that a few head motion patterns are stored and repeatedly applied. This can work for a short while, yet the repetitive nature of such animations soon annoys viewers.
Yet another approach is to provide recorded head movements for the virtual agent. While this improves the natural look of the virtual agent, unless those head movements are synchronized to the text being spoken, the movements appear unnatural and random to the viewer.
Movement of the head of a virtual agent is occasionally mentioned in the literature but few details are given. See, e.g., Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds.), “Embodied Conversational Agents”, MIT Press, Cambridge, 2000; Hadar, U., Steiner, T. J., Grant, E. C., Rose, F. C., “The timing of shifts in head postures during conversation”, Human Movement Science, 3, pp. 237-245, 1984; and Parke, F. I., Waters, K., “Computer Facial Animation”, A. K. Peters, Wellesley, Mass., 1997.
Some have studied emotional expressions of faces and also describe non-emotional facial movements that mark syntactic elements of sentences, in particular endings. But the emphasis is on head movements that are semantically driven, such as nods indicating agreement. See, e.g., Ekman, P., Friesen, W. V., “Manual for the Facial Action Coding System”, Consulting Psychologists Press, Palo Alto, 1978.
Conventionally, animations in virtual agents are controlled through interpretation of the text generated from a spoken dialog system that is used by a Text-to-Speech (TTS) module to generate the synthetic voice to carry on a conversation with a user. The system interprets the text and manually adds movements and expressions to the virtual agent.
Yet another attempt at providing virtual agent movement is illustrated by the FaceXpress development product available for virtual agents offered through LifeFX®. FaceXpress is a tool that enables a developer to control the expression of the virtual agent. FIG. 1 illustrates the use of the tool 10. In this web-based version of the virtual agent development tool, the developer of the virtual agent organizes preprogrammed gestures, emotions and moods. Column 12 illustrates the selected dialog 14, gestures 16 and other selectable features such as punctuators 32, actions 34, attitudes 36 and moods 38. Column 18 illustrates the selectable features. Shown is column 18 when the gestures option is selected to disclose the available preprogrammed gestures smile 20, frown 40 and kiss 42. The developer drags the desired gesture from column 18 to column 22. Column 22 shows the waveform of the text 24, a timing ruler 44, the text spoken by the virtual agent 26 and rows for the various features of the agent, such as the smile 28. A moveable amplitude button 46 enables the developer to adjust the parameters of the smile feature. While this process enables the developer to control the features of a virtual agent, it is time-consuming and costly. Further, the process clearly will not enable a real-time conversation with a virtual agent where every facial movement must be generated live. As synthetic speech dialog systems built on advanced dialog management techniques become more capable and remove the necessity for preprogrammed virtual agent sentences, the opportunity to preprogram virtual agent movement will increasingly disappear.
Manually adding movements to the virtual agent is slow and cumbersome. Further, quicker systems do not provide a realistic visual movement that is acceptable to the user. The traditional methods of controlling virtual agent movement preclude the opportunity of engaging in a realistic interaction between a user and a virtual agent.