A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of Invention
This invention relates to the display of animated objects and characters. The invention also relates to the architecture of such a character, its components, and the communications between the modules or components of that architecture. More particularly, the invention relates to an architecture that reduces the amount of message passing needed between the modules and components of a conversational character.
2. Discussion of Background
Synthetic, animated characters can be divided into two broad categories: characters that are directly controlled by a human user's actions, and characters that perform behaviors, independent of the user's actions, either autonomously or through pre-compiled scripts. A taxonomy of synthetic character types is illustrated in FIG. 1.
Characters controlled by a user's actions are often called "avatars" and generally serve as a representation, in a virtual environment, of the user who controls their behavior. These avatar characters are used in graphical chat-rooms and on-line virtual worlds such as Habitat, the Palace, BodyChat [Vilhjalmsson97], Oz Virtual, OnLive! Technologies and Worlds, Inc.
Due to the computational complexity of real-time interactions in graphical virtual environments, avatar representations tend to be graphically simplistic and insufficient for representing spontaneous gestures, facial expressions, and other non-verbal behaviors. Moreover, because the input modalities are severely restricted in online virtual worlds, generally confined to the mouse and keyboard, avatar users are forced to exercise fine-grained, conscious control over each gestural movement.
Vilhjalmsson's BodyChat system attempts to overcome this difficulty by integrating a model of awareness and turn-taking behaviors into its avatars. For example, the avatars are given some autonomy to decide where to look, based on user-defined parameters for conversational engagement.
Another use of directly controlled characters is for automatically generating animations based on the movements of human actors. By correlating points on an actor's body with nodes in the graphical representation of the character, this "performance animation" technique imbues the character with the ability to produce fine-grained gestures and facial expressions, and exhibit realistic gaze behaviors and body movements.
Characters that are not directly controlled by the user can be subdivided into two groups: those whose behaviors are scripted in advance, and those whose behaviors are essentially autonomous, and derived at runtime based on inputs from the user. The range of behaviors of the former type of character must be explicitly defined by the character's creator. One advantage of pre-scripting is that the integration of verbal and non-verbal behaviors need not be calculated at runtime, thereby avoiding complicated on-the-fly planning of motor controls in the animation model.
Scripted characters, on the other hand, are limited in their ability to interact with users and react to multimodal user inputs. Examples of scripted character systems include:
Document Avatars [Bickmore97]. These characters are attached to hypertext documents, and can be scripted to perform specific behaviors when certain parts of the document (e.g. links) are selected. Document avatars can be used to provide guided tours of a document, representing a particular reader's viewpoint. They can be scripted to speak, move around the document, point to objects and activate links.
Microsoft Agent [Microsoft97]. These characters can be scripted to speak a text string, perform specific animation sequences, hide, move and resize. The user interacts with a character by dragging it or selecting commands from a pop-up menu.
Jack Presenter [Badler97]. This system allows an anthropomorphically correct 3D animated figure to be scripted to give a presentation. The character's author provides the narrative text which includes annotations describing where, when and what type of gestures should occur. Users simply observe the character's pre-determined behaviors.
PPP Persona [Andre96]. This project uses a planning system to plan tutorials of specified material given a target time duration for the presentation. Presentations are not scripted by human authors, but are instead created by a planning system. Unlike Document Avatars and Microsoft Agent characters, users cannot interact with the characters during a presentation.
The second group of indirectly controlled characters are the autonomous (or semi-autonomous) characters. Work in this area can be further divided into two categories: entertainment/simulation characters, and task-based characters. The former category includes non-human character representations (e.g. The MIT Media Laboratory's ALIVE system [Maes94], PF Magic's Dogz, Fujitsu Interactive's Fin Fin, and CMU's Oz) as well as systems for authoring anthropomorphic virtual actors (the NYU Media Research Laboratory's Improv system [Perlin96], and Stanford's Virtual Theater Project).
Prior task-based autonomous characters include the following systems:
Microsoft Office Characters. The MS Office suite of applications includes a collection of animated characters that provide user assistance and an interface to the online documentation. These characters accept typed, free-form questions and respond with text balloons containing mouse-clickable menu options.
Microsoft Persona [Microsoft97]. The Persona project allows a user to control a computerized jukebox through an animated character who accepts speech input and produces spoken output with limited spontaneous gestures.
Animated Conversation [Cassell94]. In this system, two animated characters, Gilbert and George, can converse with one another, using context-appropriate speech, gestures and facial expressions, to negotiate banking transactions in a virtual bank.
Ymir [Thorisson96]. Ymir is an architecture for autonomous characters that display turn-taking and other interactional competencies. The user interacts with Gandalf, an animated character developed in the Ymir architecture, using natural speech and gestures to ask questions about the solar system. Of the prior art cited above, only the Gandalf/Ymir system utilizes some natural non-verbal inputs such as gesture and head position.
Systems having user interfaces based on social rules of engagement, rather than physical tools (such as a desktop or other metaphor), are discussed in Prevost et al., entitled "Method and Apparatus for Embodied Conversational Characters with Multimodal I/O in an Interface Device", U.S. patent application Ser. No. 09/223,637, XERXF 1017 MCF/JWC, which is incorporated herein by reference, in its entirety.
Prevost et al. also propose an architecture for conversational characters that, in at least one embodiment, includes various modules or components that communicate via messages. The proposed architecture provides a framework in which a conversational character may perform the processing needed to interact with a human user.
However, in each prior art system for conversational characters, either the architecture itself is insufficient (lacking reactive and deliberative processing, for example), or it requires a large amount of processing and message passing.
The present inventors have realized that a large amount of message passing, even when encapsulated in a cognitively correct architecture (one that reflects the way people actually process information in dialog), can be too cumbersome or too slow to make animations work as seamlessly as a human interacting with the animation desires. The present invention provides a streamlined architecture with reduced message passing that allows all pertinent processing functions (including reactive and deliberative processing), along with I/O functions, to be performed faster than in previous conversational character architectures and systems.
The present invention includes a speech manager that coordinates inputs to a conversational character (including speech recognition and vision data), an Action/Reaction scheduler having rules for expression of interactive behavior, a dialog manager for determining responses, including speech content and any facial expressions and gestures that need to accompany that content, and an animation system that implements the content and reactions determined by the Action/Reaction scheduler and dialog manager modules.
The present invention may be embodied in an apparatus for implementing an autonomous animated character, comprising: an animation system configured to control said animated character based on commands; an action scheduler configured to receive inputs related to at least one of said animated character and a user of said animated character, and to send commands based on said inputs to said animation system to control said animated character; a vision mechanism configured to send a location of said user to said action scheduler as one part of said inputs; a dialogue manager configured to receive speech input records, determine speech, actions, and gesture responses to be performed by said animated character, and provide said speech, actions, and gesture responses to said action scheduler as a second part of said inputs; and a speech manager configured to receive speech inputs from said user, prepare and send a "speech on" message to said action scheduler indicating that speech inputs are being received, convert the received speech to a speech input record, and send the speech input record to said dialogue manager.
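The flow of inputs and commands among the components of this apparatus can be illustrated with a minimal sketch. It is not the implementation of the invention: every class, method, and command string below (for example `on_speech_on`, `gaze_at`, `show_attention`) is a hypothetical name chosen for illustration, and the response policy in the dialogue manager is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpeechInputRecord:
    """Converted speech passed from the speech manager to the dialogue manager."""
    text: str


class AnimationSystem:
    """Controls the character based on the commands it receives."""

    def __init__(self) -> None:
        self.commands: List[str] = []  # stand-in for actually rendering

    def execute(self, command: str) -> None:
        self.commands.append(command)


class ActionScheduler:
    """Receives inputs and sends commands to the animation system."""

    def __init__(self, animation: AnimationSystem) -> None:
        self.animation = animation

    def on_user_location(self, location: Tuple[float, float]) -> None:
        # Hypothetical rule: keep the character's gaze on the user.
        self.animation.execute(f"gaze_at:{location[0]},{location[1]}")

    def on_speech_on(self) -> None:
        # Hypothetical rule: look attentive while the user is speaking.
        self.animation.execute("show_attention")

    def on_responses(self, responses: List[str]) -> None:
        for response in responses:
            self.animation.execute(response)


class DialogueManager:
    """Turns speech input records into speech and gesture responses."""

    def __init__(self, scheduler: ActionScheduler) -> None:
        self.scheduler = scheduler

    def on_speech_input(self, record: SpeechInputRecord) -> None:
        # Placeholder policy (an assumption): echo the input and nod.
        self.scheduler.on_responses(
            [f"speak:I heard '{record.text}'", "gesture:nod"]
        )


class SpeechManager:
    """Announces that speech is arriving, then forwards the converted record."""

    def __init__(self, scheduler: ActionScheduler, dialogue: DialogueManager) -> None:
        self.scheduler = scheduler
        self.dialogue = dialogue

    def on_user_speech(self, utterance: str) -> None:
        self.scheduler.on_speech_on()  # the "speech on" message
        self.dialogue.on_speech_input(SpeechInputRecord(utterance))


class VisionMechanism:
    """Sends the user's location to the action scheduler."""

    def __init__(self, scheduler: ActionScheduler) -> None:
        self.scheduler = scheduler

    def report(self, location: Tuple[float, float]) -> None:
        self.scheduler.on_user_location(location)


def build_character() -> Tuple[AnimationSystem, VisionMechanism, SpeechManager]:
    """Wire the components together in the order the text describes."""
    animation = AnimationSystem()
    scheduler = ActionScheduler(animation)
    dialogue = DialogueManager(scheduler)
    return animation, VisionMechanism(scheduler), SpeechManager(scheduler, dialogue)
```

In this sketch the action scheduler is the only component that issues commands to the animation system, so the vision mechanism, the speech manager, and the dialogue manager each need a single outgoing path for their results.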
The invention includes a method of controlling an animated character, comprising the steps of: identifying occurrence of an input to said animated character; preparing a lightweight record identifying said input occurrence; transferring said lightweight record to an action scheduler; preparing a reactive response for said animated character in response to the input occurrence identified in said lightweight record; transferring said reactive response to an animation system that controls said animated character; and playing said reactive response by said animation system. Alternatively, the invention may be implemented across networked computers, including a method having the steps of: receiving an animated character request at a host computer from a remote computer; uploading an animation system and a speech manager from said host computer to said remote computer; receiving lightweight and content records from said speech manager on said remote computer; preparing fast and detailed responses based on said lightweight and content records; and uploading said fast and detailed responses to said animation system on said remote computer.
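The distinction between a lightweight record, which yields a fast reactive response, and a content record, which yields a detailed response, can be sketched as follows. This is an illustration under stated assumptions only: the record fields, the `REACTIVE_RULES` table, and the command strings are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LightweightRecord:
    """Identifies only that an input occurred, not what it contained."""
    kind: str  # hypothetical label, e.g. "speech_on"


@dataclass(frozen=True)
class ContentRecord:
    """Carries the content of the input for detailed processing."""
    text: str


# Hypothetical reactive rules: input occurrence -> immediate behavior.
REACTIVE_RULES: Dict[str, str] = {
    "speech_on": "turn_toward_user",
    "speech_off": "raise_eyebrows",
}


def prepare_fast_response(record: LightweightRecord) -> str:
    """Reactive path: a table lookup, with no analysis of content."""
    return REACTIVE_RULES.get(record.kind, "idle")


def prepare_detailed_response(record: ContentRecord) -> List[str]:
    """Deliberative path: placeholder policy that echoes the input."""
    return [f"speak:You said {record.text!r}", "gesture:nod"]


def handle_utterance(utterance: str, play: Callable[[str], None]) -> None:
    """Play the reactive response first, then the detailed response."""
    # Identify the input occurrence and prepare a lightweight record.
    lightweight = LightweightRecord("speech_on")
    # Prepare the reactive response and hand it to the animation system.
    play(prepare_fast_response(lightweight))
    # Only afterwards is the content processed into a detailed response.
    content = ContentRecord(utterance)
    for command in prepare_detailed_response(content):
        play(command)
```

Here `play` stands in for the animation system. In the networked variant described above, the speech manager and the animation system would run on the remote computer, and the records and responses would be the items exchanged with the host computer.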
Each of the methods and processes of the invention may be embodied as a set of computer readable instructions that, when loaded into a computer, cause the computer to perform the methods and/or processes of the invention.