A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
This invention relates to human-computer interfaces (HCI). The invention is more particularly related to the interaction of a user and an animated anthropomorphic character as an HCI. The invention is even more particularly related to an architecture for building the anthropomorphic character. And, the invention is still further related to an integration of processing techniques and input/output (i/o) modalities for producing a conversational character interface that is intuitive to a user having only language and interaction skills.
2. Discussion of the Background
Synthetic, animated characters can be divided into two broad categories: characters that are directly controlled by a human user's actions, and characters that perform behaviors, independent of the user's actions, either autonomously or through pre-compiled scripts. A taxonomy of synthetic character types is illustrated in FIG. 1.
Characters controlled by a user""s actions are often called an xe2x80x9cavatarxe2x80x9d and generally serve as a representation in a virtual environment of the user who controls its behavior. These avatar characters are used in graphical chat-rooms and on-line virtual worlds such as Habitat, the Palace, BodyChat [Vilhjalmsson97], Oz Virtual, OnLive! Technologies and Worlds, Inc.
Due to the computational complexity of real-time interactions in graphical virtual environments, avatar representations tend to be graphically simplistic and insufficient for representing spontaneous gestures, facial expressions, and other non-verbal behaviors. Moreover, because the input modalities are severely restricted in online virtual worlds, generally confined to the mouse and keyboard, avatar users are forced to exercise fine-grained, conscious control over each gestural movement.
Vilhjalmsson""s BodyChat system attempts to overcome this difficulty by integrating a model of awareness and turn-taking behaviors into its avatars. For example, giving the avatars some autonomy to make decisions about where to look based on user defined parameters for conversational engagement.
Another use of directly controlled characters is for automatically generating animations based on the movements of human actors. By correlating points on an actor's body with nodes in the graphical representation of the character, this "performance animation" technique imbues the character with the ability to produce fine-grained gestures and facial expressions, and exhibit realistic gaze behaviors and body movements.
Characters that are not directly controlled by the user can be subdivided into two groups: those whose behaviors are scripted in advance, and those whose behaviors are essentially autonomous, and derived at runtime based on inputs from the user. The range of behaviors of the former type of character must be explicitly defined by the character's creator. One advantage of pre-scripting is that the integration of verbal and non-verbal behaviors need not be calculated at runtime, thereby avoiding complicated on-the-fly planning of motor controls in the animation model.
Scripted characters, on the other hand, are limited in their ability to interact with users and react to multimodal user inputs. Examples of scripted character systems include:
Document Avatars [Bickmore97]. These characters are attached to hypertext documents, and can be scripted to perform specific behaviors when certain parts of the document (e.g. links) are selected. Document avatars can be used to provide guided tours of a document, representing a particular reader's viewpoint. They can be scripted to speak, move around the document, point to objects and activate links.
Microsoft Agent [Microsoft97]. These characters can be scripted to speak a text string, perform specific animation sequences, hide, move and resize. The user interacts with a character by dragging it or selecting commands from a pop-up menu.
Jack Presenter [Badler97]. This system allows an anthropomorphically correct 3D animated figure to be scripted to give a presentation. The character's author provides the narrative text which includes annotations describing where, when and what type of gestures should occur. Users simply observe the character's pre-determined behaviors.
PPP Persona [Andre96]. This project uses a planning system to plan tutorials of specified material given a target time duration for the presentation. Presentations are not scripted by human authors, but are instead created by a planning system. Unlike Document Avatars and Microsoft Agent characters, users cannot interact with the characters during a presentation.
The second group of indirectly controlled characters are the autonomous (or semi-autonomous) characters. Work in this area can be further divided into two categories: entertainment/simulation characters, and task-based characters. The former category includes non-human character representations (e.g. The MIT Media Laboratory's ALIVE system [Maes94], PF Magic's Dogz, Fujitsu Interactive's Fin Fin, and CMU's Oz) as well as systems for authoring anthropomorphic virtual actors (the NYU Media Research Laboratory's Improv system [Perlin96], and Stanford's Virtual Theater Project).
Prior task-based autonomous characters include the following systems:
Microsoft Office Characters. The MS Office suite of applications includes a collection of animated characters that provide user assistance and an interface to the online documentation. These characters can accept typed, free-form questions and respond with text balloons containing mouse-clickable menu options.
Microsoft Persona [Microsoft97]. The Persona project allows a user to control a computerized jukebox through an animated character who accepts speech input and produces spoken output with limited spontaneous gestures.
Animated Conversation [Cassell94]. In this system, two animated characters, Gilbert and George, can converse with one another, using context-appropriate speech, gestures and facial expressions, to negotiate banking transactions in a virtual bank.
Ymir [Thorisson96]. Ymir is an architecture for autonomous characters that display turn-taking and other interactional competencies. The user interacts with Gandalf, an animated character developed in the Ymir architecture, using natural speech and gestures to ask questions about the solar system. Of the prior art cited above, only the Gandalf/Ymir system utilizes some natural non-verbal inputs such as gesture and head position.
User interfaces, including the above described synthetic characters, often exploit common metaphors, such as the desktop, in an attempt to make computer systems easier and more intuitive to use. These metaphors, however, are not always well understood by users, particularly novices, who often require extensive training. As computing becomes ubiquitous in the modern world and as the complexity of systems increases, the need for methods of human-computer interaction that require only minimal specialized expertise and training has also increased.
Recent research [ReevesandNass97] has suggested that human interactions with computers, and indeed other forms of media, are intrinsically social in nature, that we (unconsciously) treat computers as social actors, and that the rules that govern our social interactions with other people are imputed by users to their computers. However, many of the common interaction metaphors (e.g. menus, desktops, windows, etc.) focus on the computer as a physical tool, something the user must master to use effectively.
The present inventors have realized that interface metaphors that are based on social rules of engagement, rather than physical tools, will provide great benefit to many users, allowing them to interact socially with computers and other devices, rather than requiring training on specific tools or programs.
Accordingly, it is an object of this invention to provide an interface based on social rules of engagement.
It is another object of this invention to provide an interface based on construction of a user ally for operation of a computer or other device.
It is another object of the present invention to integrate multiple processing techniques and multiple i/o modalities to provide a conversational character operating as a naturalistic, intuitive, and coherent interface.
It is another object of the present invention to provide a modular architecture for a conversational character that provides both raw and processed data for a central module that determines what action the conversational character takes.
It is yet another object of the present invention to provide a walk-up interface operable by a user with only basic language and interaction skills.
It is yet another object of the present invention to provide a conversational character that interacts within a virtual space of the character and a physical space of the user.
These and other objects are accomplished by an interface for a system, including an input device configured to capture user inputs, a processing component that integrates deliberative and reactive processing performed on said user inputs, and an output mechanism for performing actions based on the deliberative and reactive processing.
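The three-part interface described above can be sketched in outline as follows. This is a minimal illustrative sketch only; the class and method names (`Interface`, `step`, and the four callables) are hypothetical and are not taken from the invention itself. The sketch assumes a simple arbitration rule in which a reactive response, when one is produced, takes precedence over the deliberative plan.

```python
class Interface:
    """Couples an input device, a processing component that blends
    deliberative and reactive processing, and an output mechanism."""

    def __init__(self, capture, deliberate, react, act):
        self.capture = capture        # input device: captures user inputs
        self.deliberate = deliberate  # slower, model-building processing
        self.react = react            # fast, stimulus-driven processing
        self.act = act                # output mechanism: performs actions

    def step(self):
        inputs = self.capture()
        plan = self.deliberate(inputs)   # deliberative result
        reflex = self.react(inputs)      # reactive result (or None)
        # Assumed arbitration: a reactive response, when present,
        # preempts the deliberative plan.
        self.act(reflex if reflex is not None else plan)


# Usage: wire up trivial stand-ins for the four components.
outputs = []
iface = Interface(capture=lambda: {"speech": "hello"},
                  deliberate=lambda i: "greet-user",
                  react=lambda i: None,
                  act=outputs.append)
iface.step()
```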
Alternatively, the invention includes a multi-modal interface that captures user inputs, and a synthetic character configured to respond to said inputs. The multi-modal inputs may preferably feed a combination of deliberative and reactive processing that controls the synthetic character.
The present invention also includes a method for operating a device, including the steps of displaying a virtual space; retrieving user inputs from a user in a physical space; and combining both deliberative and reactive processing on said inputs to formulate a response.
The deliberative processing includes the steps of fusing selected portions of the user inputs into a coherent understanding of the physical environment and actions of the user; updating a discourse model reflecting current and past inputs retrieved from the user; and outputting, to the reactive processing, at least one frame describing the user inputs.
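The deliberative steps above can be sketched as follows. The names (`UnderstandingFrame`, `DeliberativeModule`, `fuse`) are illustrative assumptions, and the frame fields shown are only examples of input portions that might be fused; the actual invention is not limited to these modalities.

```python
from dataclasses import dataclass


@dataclass
class UnderstandingFrame:
    """A structured description of the user's current inputs."""
    speech: str
    gesture: str
    gaze: str


class DeliberativeModule:
    """Fuses multimodal inputs and maintains a discourse model."""

    def __init__(self):
        # Discourse model: reflects current and past user inputs.
        self.discourse_model = []

    def fuse(self, speech, gesture, gaze):
        # Fuse selected input portions into a coherent understanding
        # of the user's actions and physical environment.
        frame = UnderstandingFrame(speech=speech, gesture=gesture, gaze=gaze)
        self.discourse_model.append(frame)  # update discourse model
        return frame  # output frame to the reactive processing
```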
The reactive processing includes the steps of receiving asynchronous updates of selected user inputs, and understanding frames concerning the user inputs from said deliberative processing; accessing data from a static knowledge base about a domain and a dynamic knowledge base having inferred information about a current discourse among the user, the physical environment, and the virtual space; and determining a current action for the virtual space based on the asynchronous updates and data.
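The reactive steps can likewise be sketched in outline. All names here (`ReactiveModule`, `on_update`, `decide`) and the dictionary-based knowledge bases are hypothetical simplifications: a static knowledge base holding fixed domain knowledge, a dynamic knowledge base holding inferred discourse state, and a decision step that selects the character's next action from both.

```python
class ReactiveModule:
    """Chooses the character's current action from asynchronous
    updates and two knowledge bases."""

    def __init__(self, static_kb, dynamic_kb):
        self.static_kb = static_kb    # fixed knowledge about the domain
        self.dynamic_kb = dynamic_kb  # inferred state of the discourse

    def on_update(self, frame):
        # Receive an asynchronous understanding frame and record the
        # inferred discourse information in the dynamic knowledge base.
        self.dynamic_kb["last_frame"] = frame

    def decide(self):
        # Determine the character's current action in the virtual
        # space from the latest update and the knowledge bases.
        frame = self.dynamic_kb.get("last_frame")
        if frame is None:
            return "idle"
        if "greet" in self.static_kb and frame.get("speech") == "hello":
            return self.static_kb["greet"]
        return "attend-to-user"
```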