1. Field of the Invention
The present invention relates to devices that simulate personal interaction with a user through various output modalities such as light pulsations, synthetic speech, computer-generated animations, sound, etc., to create the impression of a human presence with attending mood, ability to converse, personality, etc.
2. Background
With increasing sophistication in technology, the variety of possible features and options associated with many appliances can be daunting. This phenomenon is exemplified by satellite and cable TV, where the number of program choices is unwieldy in some cases. Many other examples exist, including cell phones, personal computer applications, e-trading systems, etc. In such environments it is useful for the machines to take some of the routine work out of making choices from among an overwhelming number of options. However, often, the solutions are not much less painful than the problems they are supposed to address. For example, user interfaces that filter a large number of choices using custom templates for each user must be trained as to the user's preferences. A user can enter his/her preferences by actively classifying his/her likes and dislikes (“customization”). This can also be done passively, such as by having a computer process “observe” the selections made by the user over time (“personalization”). Such systems are discussed in a variety of patent applications assigned to Gemstar and Philips Electronics, for example, U.S. Pat. No. 5,515,173 for System And Method For Automatically Recording Television Programs In Television Systems With Tuners External To Video Recorders; U.S. Pat. No. 5,673,089 for Apparatus And Method For Channel Scanning By Theme; and U.S. Pat. No. 5,949,471 for Apparatus And Method For Improved Parental Control Of Television Use. Another example is U.S. Pat. No. 5,223,924.
The user interfaces that permit the specification of preferences, either explicitly or passively, are often sophisticated enough to be fun and intuitive. More and more, such systems have evolved toward seemingly “smart” systems that try to seem like human helpers rather than control panels. For example, help dialogs in complex software applications such as Microsoft® Office® accept natural language sentences and give text responses quasi-synchronously with animated characters. Another example of an interface that accepts natural language questions is AskJeeves®, a search engine for the Internet.
User interfaces are evolving rapidly away from function-oriented systems, where a sequence of steps is performed, the last step of which produces the desired result, to object-oriented interfaces, which allow the user to manipulate the subject matter of concern while providing immediate feedback on the results. The ultimate embodiment of this idea is noncommand interfaces such as virtual reality environments that permit a user to manipulate controls and experience the effects rather than performing a prescribed series of steps to achieve a result. Agents are another type of noncommand interface. The eventual goal of some research is to provide highly intelligent agents that know all about the user's needs and lifestyle and can retrieve exactly the information required by the user at any given time. Practical embodiments of such noncommand interfaces are, however, not quite ready for introduction into practical products. One of the problems is that the software technology required is highly sophisticated and requires a great deal of programming and/or training. For now, computers still need to be told what to do, and the dream of interfaces that are truly easy to use, those giving the impression of being “smart,” is, for most applications, still rather far off.
Computers that look smart without actually being smart, however, have been around for a while. A very different area of computer software development began in the mid-1960s with MIT computer scientist Joseph Weizenbaum, who created a simple program with no utility apart from how smart it appeared to be. It simulated the role of a Rogerian psychotherapist by attempting to draw the user out and encouraging the user to continue talking. This program was called ELIZA. Many users found ELIZA very convincing. Even Weizenbaum's secretary demanded to be left alone with the program. However, ELIZA's conversation is effective only as long as one is willing to accept the very passive conversational style of a Rogerian psychotherapist, in which the “therapist” usually responds either by repeating part of what the “patient” has said or by asking the “patient” to explain or elaborate on what he has just said. What makes ELIZA seem convincing is that it does not do this in a rote manner. If the user mentions his mother, for example, ELIZA might ask the user to tell “her” more about his family.
Below is a sample conversation reproduced from ELIZA:
User: Men are all alike.
ELIZA: In what way?
User: They're always bugging me about something or other.
ELIZA: Can you think of a specific example?
User: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
User: He says I'm depressed much of the time.
ELIZA: I'm sorry to hear you are depressed.
User: It's true. I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
User: I need some help, that much seems certain.
ELIZA: What would it mean to you if you got some help?
User: Perhaps I could get along with my mother.
ELIZA: Tell me more about your family.
User: My mother takes care of me.
ELIZA: Who else in your family takes care of you?
User: My father.
ELIZA: Your father.
User: You are like my father in some ways.
ELIZA: What resemblance do you see?
So far, virtually all these so-called conversation simulators have used a basic method that generates shallow responses based on template matches (rules) without any real understanding. For example, the template “I wish I were <x>” (where <x> represents any series of words) matches the user statement “I wish I were taller.” The template is associated with the machine response “Why do you wish you were <x>?” The machine responds with the statement “Why do you wish you were taller?” What distinguishes one conversation simulator from another is not so much its sophistication and complexity as the size and variability of its database of responses. More sophisticated variants have a larger database of templates and responses, including whimsical responses that can make them more interesting than the passive, flat responses of ELIZA.
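The template-and-response rule described above can be illustrated with a minimal sketch. It assumes the rule is expressed as a regular expression; the names `RULES` and `respond` are hypothetical and do not come from any particular conversation simulator.

```python
import re

# Hypothetical rule table: each entry pairs a pattern with a response format.
# The named group "x" plays the role of the variable in the template.
RULES = [
    (re.compile(r"^i wish i were (?P<x>.+?)[.!?]*$", re.IGNORECASE),
     "Why do you wish you were {x}?"),
]


def respond(statement):
    """Return the response of the first rule that matches, or None."""
    text = statement.strip()
    for pattern, response in RULES:
        match = pattern.match(text)
        if match:
            # Copy the words bound to the variable into the response.
            return response.format(x=match.group("x"))
    return None


print(respond("I wish I were taller."))
```

The sketch involves no understanding of the statement; the words bound to the variable are simply copied into the response.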
Some conversation simulators provide information on specific topics rather than general conversation. Basically, their libraries of responses anticipate questions about some subject and provide “canned” responses. Some conversation simulators have been programmed to appear as if they had a life story to relate. They would talk about their story when they could not come up with a good template match to keep the conversation going.
A typical conversation simulator may be described as having two parts: a user-interface shell and a database. The user interface is a computer program that remains essentially constant irrespective of which personality or information database is used. The database is what gives the conversation simulator its personality, knowledge, etc. It contains the specific answers and information about questions for a topic. The database has pre-defined answers linked together by question templates. The realism of the conversation simulator depends on how well the creator of the database has anticipated the questions people are likely to ask and the patterns that are common to classes of questions with the same answer. The user interface accepts questions from a person, searches through the templates, and returns the most appropriate answer (or a random one of the most appropriate answers) corresponding to it. The technology requires the author to create the typical database; there is no initial knowledge about natural language in the user interface, and the systems cannot learn on their own. The systems are not perfect and give gibberish or simply bail out when good matches cannot be found. But this is tolerable. In principle, a perfect database would work for every conceivable situation, but if 80 percent of questions are handled adequately, this appears to be enough to keep people interested.
Another approach to making conversation-capable machines employs more sophisticated “smart” technology, but, as discussed above, such approaches require too much complexity and/or training to be of use as a basis for a conversation simulator. Attempts such as MegaHAL give the impression of actually being nonsensical. But the smart technology has its uses. An area of research called “computational linguistics,” a branch of artificial intelligence, attempts to develop an algorithmic description or grammar of language. This technology can be used to parse sentences and do things like identify the most important words in a sentence or identify the direct object and verb. In fact, the research goes much further. Computational linguists are very interested in the technology required to make computers really understand what a person is saying: lexical and compositional semantics. This is the determination, from speech (written or spoken), of the meaning of words in isolation and from their use in narrow and broad contexts. However, programming a computer to distinguish an ambiguous meaning of a word falls far short of what is required to make a computer subsequently respond appropriately, at least with a verbal response.
The technology used successfully in conversation simulators typically works by matching the user's input against a database of templates. The simulator chooses a predefined template that “best” matches a user's statement and produces one of the template's associated responses. To describe this mechanism in more detail, it helps to use a specific example. For this purpose we will use Splotch, a program created by Duane Fields at Carnegie Mellon University, whose source code is publicly available from CMU's web site. “Splotch” is a variation of “Spot”, so named because it is sort of pet-like, i.e., an ill-defined spot.
Splotch, like other such programs, works by template matching. The user's input is compared with a database of templates. Among those templates that match, the highest-ranking template is chosen, and then one of the template's associated responses is chosen as output. The templates can be single words, combinations of words, or phrases.
A single template can include alternate words or phrases. For example, the “money” template can also match on the word “cash”. There is one other way that alternatives can be specified: a synonym dictionary. Before the user's input is matched against Splotch's templates, the words and phrases in the input are converted into canonical form. This is done by comparing them to words and phrases in the synonym dictionary and substituting the preferred form for all variants. Many of these variants will be alternative spellings, including misspellings. For example, “kool” is converted to “cool” and “gotta” to “got to”. This enables a single template to match many alternative, but equivalent, words or phrases, without specifying these alternatives for each template.
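The canonicalization step can be sketched as follows. This is an illustration only, not Splotch's actual code: the dictionary layout, the name `canonicalize`, and the choice to lowercase the input are assumptions, and only the two entries given in the text are used.

```python
import re

# Hypothetical synonym dictionary mapping each variant to its preferred form.
SYNONYMS = {
    "kool": "cool",
    "gotta": "got to",
}


def canonicalize(text, synonyms=SYNONYMS):
    """Lowercase the input and replace every known variant with its preferred form."""
    # Try longer variants first so that phrases take precedence over single words.
    variants = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b"
    )
    return pattern.sub(lambda m: synonyms[m.group(0)], text.lower())


print(canonicalize("I gotta say that is kool"))
```

Because the substitution runs once, before any template is consulted, each template needs to list only the preferred form.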
Words or phrases in templates can be marked for necessary inclusion or exclusion. If a word or phrase is marked for exclusion, then there is no match on this particular template when this word or phrase is present. For example, Splotch would not match an input containing “none of your” on the “business” template if that phrase were marked as having to be absent by being preceded by “!”, e.g., “business:!none of your”. On the other hand, when a word or phrase is marked for necessary inclusion, then a match fails if the specified word or phrase is absent. For example, the “gender:sex:andwhat” template will successfully match if the user's input includes either the word “gender” or “sex”, but only if it also includes the word “what”.
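A minimal sketch of inclusion and exclusion marking follows. It assumes a colon-separated template specification in which a leading “!” marks a term that must be absent, as in the text. The use of a leading “&” for a term that must be present is an assumption made for illustration, as are the function names; Splotch's real parser may differ.

```python
import re


def parse_template(spec):
    """Split a colon-separated spec into alternative, required, and forbidden terms."""
    alternatives, required, forbidden = [], [], []
    for term in spec.split(":"):
        term = term.strip().lower()
        if term.startswith("!"):
            forbidden.append(term[1:])   # must be absent
        elif term.startswith("&"):
            required.append(term[1:])    # must be present (assumed marker)
        else:
            alternatives.append(term)    # any one of these suffices
    return alternatives, required, forbidden


def contains(text, phrase):
    """True if the phrase occurs in the text on whole-word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def matches(spec, user_input):
    alternatives, required, forbidden = parse_template(spec)
    text = user_input.lower()
    if any(contains(text, phrase) for phrase in forbidden):
        return False
    if not all(contains(text, phrase) for phrase in required):
        return False
    return any(contains(text, phrase) for phrase in alternatives)


print(matches("business:!none of your", "That is none of your business"))
print(matches("gender:sex:&what", "What is your gender?"))
```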
Furthermore, a template can have a variable. For example, the “Do you like <x>” template has a variable as its fourth term. The variable can be passed on to the response, e.g., “No, I don't like <x>”. In this case all the words after “Do you like” would be bound to the variable. In the template “Men are <x> than women”, the words between “are” and “than” would be bound to the variable.
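Variable binding of this kind can be sketched by translating a template, with the variable written as `<x>`, into a regular expression. The helper names `compile_template`, `bind`, and `fill` are hypothetical, and tolerating trailing punctuation in the input is an assumption of this sketch.

```python
import re

VARIABLE = re.compile(r"<(\w+)>")


def compile_template(template):
    """Convert a template such as 'Men are <x> than women' into a regex."""
    # re.split with one capture group alternates literal text and variable names.
    parts = VARIABLE.split(template)
    regex = "^"
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)          # literal words
        else:
            regex += "(?P<%s>.+?)" % part     # words bound to the variable
    return re.compile(regex + r"[.!?]*$", re.IGNORECASE)


def bind(template, statement):
    """Return a dict of variable bindings, or None if the template does not match."""
    match = compile_template(template).match(statement.strip())
    return match.groupdict() if match else None


def fill(response, bindings):
    """Substitute bound words for the variables in a response."""
    return VARIABLE.sub(lambda m: bindings[m.group(1)], response)


bindings = bind("Do you like <x>", "Do you like jazz music?")
print(fill("No, I don't like <x>", bindings))
print(bind("Men are <x> than women", "Men are taller than women."))
```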
Each template has an implementer-assigned rating. After Splotch has tried matching the user's input to all its templates, it chooses the matching template with the highest rating and then responds with one of the responses listed with the template. The next time this same template is chosen, it will choose a different response until it has cycled through all listed responses.
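Rating-based selection with response cycling might look like the sketch below. The `Template` class, single-keyword matching, and stepping through responses in listed order are assumptions chosen to keep the example short; they are not taken from Splotch's source.

```python
import re
from itertools import cycle


class Template:
    """A keyword template with an implementer-assigned rating (hypothetical structure)."""

    def __init__(self, keyword, rating, responses):
        self.keyword = keyword
        self.rating = rating
        self._responses = cycle(responses)  # starts over after the last response

    def matches(self, user_input):
        return self.keyword in re.findall(r"[a-z']+", user_input.lower())

    def next_response(self):
        return next(self._responses)


def choose_response(templates, user_input):
    """Pick the highest-rated matching template and return its next response."""
    candidates = [t for t in templates if t.matches(user_input)]
    if not candidates:
        return None
    best = max(candidates, key=lambda t: t.rating)
    return best.next_response()


templates = [
    Template("money", 5, ["Money isn't everything.", "Who needs money?"]),
    Template("need", 1, ["Why do you need that?"]),
]
print(choose_response(templates, "I need money"))
print(choose_response(templates, "I need money"))
```

Both templates match the sample input, but only the higher-rated one responds, and it gives a different response on the second call.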
Besides variables passed from the template, responses can have another type of “variable”. These are placeholders that point to alternative words or phrases. For example, the response “My favorite color is @color.w” indicates that the color is to be chosen randomly from a file, color.w, containing a list of color words. This allows a response associated with a template to be, in effect, many alternative responses. The phrases in the “@” files can themselves contain pointers to other “@” files.
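The “@” placeholder expansion, including placeholders nested inside other word files, can be sketched as below. To keep the example self-contained, an in-memory dictionary stands in for the files; its contents, the file name `shade.w`, and the depth limit are all hypothetical.

```python
import random
import re

# In-memory stand-in for the "@" files (hypothetical contents).
WORD_FILES = {
    "color.w": ["red", "blue", "@shade.w green"],
    "shade.w": ["dark", "pale"],
}

# A placeholder is "@" followed by a file name that ends in a word character,
# so sentence-final punctuation is not swallowed.
PLACEHOLDER = re.compile(r"@([\w.]+\w)")


def expand(response, files=WORD_FILES, depth=0):
    """Replace each placeholder with a random entry, expanding nested placeholders."""
    if depth > 10:
        raise ValueError("word files appear to refer to each other in a loop")

    def substitute(match):
        choice = random.choice(files[match.group(1)])
        return expand(choice, files, depth + 1)

    return PLACEHOLDER.sub(substitute, response)


print(expand("My favorite color is @color.w."))
```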
Prior art conversation simulators tend to be repetitive unless they contain a very large number of installed template files. The large number of template files can be unwieldy. In addition, even with a large number of alternative templates, a conversation simulator remains static. For example, real people know that the USSR has been dissolved and no longer holds the romantic intrigue it once did in spy movies. A conversation simulator programmed much before 1989 would contain many templates that would produce responses that sounded odd if they came from a person.
Most prior art conversation simulators perform poorly in simulating a personality, if they do so at all. Hutchens' HeX, for example, was successful because it had a sarcastic, insulting personality. Certainly, prior art conversation simulators lack the appearance of a personality with any depth. A conversation simulator cannot simulate sharing in the way that people do in trusting relationships because it has no history and no experience to share; in addition to lacking the appearance of a personality, conversation simulators generally lack the appearance of an identity as well.
Conversation simulators are often designed to encourage users to talk. Certainly that was the idea behind ELIZA, the progenitor of this class of program. But the tricks used to get users to talk can quickly become tiresome and predictable. One device for making conversation simulators interesting is to design the conversation simulator so that it provides factual or entertaining information. Since conversation simulators can't understand the semantics of users' queries, any attempt to respond to factual questions or declarations will often lead to inappropriate replies. Furthermore, a conversationalist that simply cites facts is soon perceived as a know-it-all and a bore. The most convincing conversation simulators encourage the user to talk and to respond more on an emotional than a factual level, expressing opinions and reacting to (e.g., supporting) the opinions and values of the user. This is not to say that the conversation simulator cannot be content-free while being convincing. Hutchens did a fairly adequate job in providing HeX with the sorts of information usually found in so-called small talk.
Another problem with conversation simulators is that they are easily thrown off the current subject by brief replies from the user. They do not have a sense of context, and it is difficult to create a simulation of a sense of context. One solution is to provide some persistence mechanism by bringing up an old topic raised by the user using a template that requests a response from the user on that subject, for example, a question about topic <x>. But some conversation simulators that are claimed to be context sensitive will stick with a subject even if the user wants to change the subject.
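One way such a persistence mechanism could be arranged is sketched below. The `TopicMemory` class, its fixed list of recognizable topics, and the wording of the prompt are hypothetical; the sketch shows only the idea of storing topics the user has raised and reviving one later through a question template.

```python
import re
from collections import deque


class TopicMemory:
    """Remembers topics the user has raised so one can be brought up again later."""

    def __init__(self, known_topics, prompt="What were you saying about your {x}?"):
        self.known_topics = set(known_topics)
        self.prompt = prompt            # question template with variable x
        self._pending = deque()         # topics not yet revived, oldest first
        self._seen = set()

    def note(self, user_input):
        """Record any known topic words appearing in the user's input."""
        for word in re.findall(r"[a-z']+", user_input.lower()):
            if word in self.known_topics and word not in self._seen:
                self._seen.add(word)
                self._pending.append(word)

    def revive(self):
        """Return a question about the oldest stored topic, or None if none remain."""
        if not self._pending:
            return None
        return self.prompt.format(x=self._pending.popleft())


memory = TopicMemory({"mother", "job"})
memory.note("My mother called about my job")
# Later, after a brief reply such as "ok" that matches no template:
print(memory.revive())
```

A simulator using such a memory would still need a rule for when to revive a topic, so that it does not cling to a subject the user wants to leave.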
Machine-learning schemes, in which new conversational content is learned from past or sample conversations, are unlikely to be successful. Such approaches generally produce novel responses, but these responses are usually nonsensical. The problem emanates in part from the fact that these techniques attempt to employ a large number of inputs to select from among a large number of outputs with a concomitant need for tremendous training and tolerance of unpredictability in the results.
Even conversation simulators that are highly convincing are, in the long run, essentially entertainment: a dissipative activity. Upon learning what they do, many people ask why someone would bother to spend time with a conversation simulator. Many who are initially intrigued end up bored, so even the entertainment value of conversation simulators is limited. Except for the use of information gathered in a chat for filling in the blanks of response templates or, when computational linguistic approaches are used, perhaps for new phrase structures or ideas, all the data delivered by a user to a conversation simulator ends up going down the drain. Thus, all that data simply leads to more chat, but no new knowledge accrues and none is put to use. This adds to the basic view of conversation simulators as being interesting experiments with very little practical justification.
Another problem with conversation simulators is that using them is not a very spontaneous and natural act. Currently there are no conversation simulators whose actions evidence a great deal of common sense, for example, that will know when to invite a user to engage in a session or when to stop, pause, or change the subject. Even if a conversation simulator had something particularly useful to say, there are no known strategies, proposals, or even the recognition of a need for providing a conversation simulator with such abilities.
An area of research that has generated technology that may be employed in computer programs generally is so-called “affective computing.” This is the use of computers to be responsive to human emotions and personality to create better user interfaces. For example, U.S. Pat. No. 5,987,415 describes a system in which a network model of a user's emotional state and personality is inferred and the inference used to select from among various alternative paraphrases that may be generated by an application. The approach is inspired by trouble-shooting systems in which a user attempts to obtain information about a problem, such as a computer glitch, using a machine-based system that asks questions to help the user diagnose and solve the problem himself. The approach can be summarized as follows. First, the system determines a mood of a user based on a network model that links alternative paraphrases of an expected expression. The mood and personality are correlated with a desired mood and personality of the engine that generates the feedback to the user. Mood descriptors are used to infer the mood of the user, and the correlation process results in mood descriptors being generated and used to select from among alternative paraphrases of the appropriate substantive response. So, if there are two possible paraphrases of the substantive response by the computer (say, “Give it up!” or “Sorry, I cannot help you!”), the application will select the one that best corresponds to the mood and personality the programmer has determined to be desirable for the computer to project given the user's mood/personality. In summary, there is a stochastic model used to determine the mood and personality projected by the user's response; then a model is used to link the user's mood and personality to a desired mood and personality to be projected by the computer. Finally, the paraphrase of the response that best matches the desired mood and personality is selected and used to generate the response using the same stochastic model in reverse.
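The final selection step can be illustrated with a deliberately simplified sketch. It does not reproduce the stochastic network model of the cited patent; a nearest-descriptor lookup is substituted for it. The two-number mood descriptors (friendliness, dominance), their values, and the policy table are all hypothetical.

```python
# Hypothetical mood descriptors for each paraphrase: (friendliness, dominance),
# each in the range [-1, 1].
PARAPHRASES = {
    "Give it up!": (-0.8, 0.9),
    "Sorry, I cannot help you!": (0.6, -0.4),
}

# Hypothetical policy: the mood the programmer wants the computer to project
# for a given inferred user mood.
DESIRED_FOR_USER_MOOD = {
    "frustrated": (0.8, -0.2),
    "playful": (-0.5, 0.7),
}


def select_paraphrase(user_mood, paraphrases=PARAPHRASES, policy=DESIRED_FOR_USER_MOOD):
    """Return the paraphrase whose descriptors lie closest to the desired mood."""
    target = policy[user_mood]

    def squared_distance(item):
        _, descriptor = item
        return sum((a - b) ** 2 for a, b in zip(descriptor, target))

    text, _ = min(paraphrases.items(), key=squared_distance)
    return text


print(select_paraphrase("frustrated"))
print(select_paraphrase("playful"))
```

The substantive content of the two paraphrases is the same; only the mood they project differs.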
The above user interface separates mood and personality from content. Also, stochastic models are notoriously difficult to train. Conversation simulators in the past have enjoyed great power and success in using rule-based systems.
Another technical approach for communicating the user's attitude to a computer is a manually settable user interface. The user may explicitly indicate his/her attitude, for example, by moving a cursor over a graphical image of a face to change a sad face into a happy face. This approach for creating a user interface is described in U.S. Pat. No. 5,977,968. The range of feelings that may be conveyed using such an interface, however, is limited, and it is difficult and unnatural to convey one's feelings in this way.
Another application area in which the user's emotional state may be determined by a computer is medical diagnosis. For example, U.S. Pat. No. 5,617,855 describes a system that classifies characteristics of the face and voice along with electroencephalogram and other diagnostic data to help make diagnoses. The device is aimed at the fields of psychiatry and neurology.
In still another application area, machines automatically detect a user's presence or specific features of the user for purposes of machine-authorization and authentication or convenience. To that end, some prior art systems employ biometric sensing, proximity detectors, radio frequency identification tags, or other devices.
Another system that inputs the user's emotional state is described in JP10214024 where a device generates scenes based on a video input. Information relating to the emotional state of the user is input from the user by a recognition system and used to control the development of a story.
An interaction simulator is like a conversation simulator, but with a broader range of possible inputs and outputs. It is possible for people and machines to express themselves in ways other than by speaking. For example, a person can use gestures, remote controls, eye movement, sound (clapping), etc. Machines can flash lights, create computer-generated animations, animate mechanical devices, etc. “Interaction simulator” is a more general term that encompasses the entire range of inputs and outputs that could be used to create expressive interaction between a user and a machine. Briefly, the invention is an interaction simulator that provides greater ease of use than prior art conversation simulators, enhances the quality of the interaction between the user and the simulator, and increases the utility derived from interaction with the simulator. The invention also provides these advantages to the field of user interfaces for data storage and retrieval. To this end, the present invention is built around an interaction simulator that is responsive to the uniqueness of each individual's personality by automatically adapting itself to a particular user. In addition, a system and method employed by the interaction simulator provide a mechanism whereby simulator-initiated interaction is responsive to the user's situation; for example, a conversation simulator embodiment may cease talking to avoid interrupting the user's monologue and stop talking if the user falls asleep. Further, the utility of the interaction simulator is extended by passively funneling useful information gleaned from conversations with a user into systems that can take advantage of the information. For example, an electronic program guide preference database can be augmented by extracting likes and dislikes from dialogues and applying them to the database. Such data may be elicited from the user responsively to the needs of the database.
Still further, the interaction simulator model is extended to a range of input and output modalities. For example, a television with audio output and input capability may generate artificial speech with synchronized light or color in the cabinet of the television, or a synchronized animation on the screen, to attend the chat and provide the impression of a television that talks. The user's expression can be input to the interaction simulator by means of gestures, sound, body position, manual controls, etc. Still further, the substantive content of the interaction simulator's output is enhanced by providing an ability to obtain information from regularly updated data sources or live data feeds. The extraction of such information may be guided by data gleaned by the simulator from conversations and/or other interaction.