Conventional voice interactive apparatuses are mainly intended to perform a single task, such as “booking a flight by telephone”.
The conventional voice interactive apparatus converses with a user by faithfully following a single scenario to perform the task. If the user inputs speech that is not anticipated in the scenario, the apparatus outputs a response message such as “I don't understand” or “Please answer yes or no”, and then waits on standby until the next speech input.
When an object such as a robot, or an image object in a virtual world (such as a character in a computer game), talks with a user, the object needs to converse with the user in a natural and human-like manner, for example by chatting with the user for no particular purpose, in addition to holding the conversations required for simple tasks such as the one mentioned above. Such chat may include replying with a joke.
Even if the conventional voice interactive apparatus is mounted on such an object, the object is unable to converse with the user in a natural and human-like manner.
Adapting the conventional voice interactive apparatus to converse in a natural and human-like manner is not an effective approach, for the following three reasons.
The first reason is that “the conventional voice interactive apparatus is built around a single scenario described by a state transition model, and it is difficult to equip that model with processing for natural and human-like conversation”.
The state transition model is equivalent to a so-called automaton, described by rules such as “if B is input in state A, output C and shift from state A to state D”.
To allow the state transition model to carry on natural and human-like conversation, such as chat and jokes, in addition to purpose-oriented conversation, a designer must predict every event that could occur in every state and describe in the model a rule for responding to each of those events. Completing this work is extremely difficult and, in practice, impossible.
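As a rough illustration of such a model, the following minimal sketch encodes rules of the form “if B is input in state A, output C and shift to state D” as a lookup table. The class name, the table layout, and the fallback behavior (reply with a fixed message and stay in the same state, as the conventional apparatus does for unanticipated input) are assumptions made for illustration only, not the apparatus's actual implementation.

```python
from typing import Dict, Tuple

# Hypothetical transition table: (current state, input) -> (output, next state).
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


class StateTransitionModel:
    """Minimal automaton: each rule maps (state, input) to (output, next state)."""

    def __init__(self, transitions: Transitions, initial: str,
                 fallback: str = "I don't understand it") -> None:
        self.transitions = transitions
        self.state = initial
        self.fallback = fallback

    def step(self, user_input: str) -> str:
        key = (self.state, user_input)
        if key not in self.transitions:
            # No rule for this input in the current state: reply with a fixed
            # message and remain in the same state, waiting for the next input.
            return self.fallback
        output, next_state = self.transitions[key]
        self.state = next_state
        return output


# The single rule from the text: input B in state A outputs C and shifts to D.
rules: Transitions = {("A", "B"): ("C", "D")}
model = StateTransitionModel(rules, initial="A")
print(model.step("B"), model.state)  # C D
```

Any input that has no matching rule in the current state falls through to the fixed fallback reply, which is exactly where chat or joke input ends up.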
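The scale of this burden can be seen by enumerating every (state, event) pair and checking which ones lack a rule. The sketch below does this. The state names, event names, and the `missing_rules` helper are all hypothetical. The point is that the number of required rules is the number of states multiplied by the number of events, and chat-type events such as a joke request must be handled in every state.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def missing_rules(states: Set[str], events: Set[str],
                  rules: Dict[Tuple[str, str], Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every (state, event) pair for which no response rule is defined."""
    return sorted(pair for pair in product(states, events) if pair not in rules)


# Hypothetical booking scenario: only the purpose-oriented rules are written.
states = {"greet", "ask_date", "confirm"}
events = {"yes", "no", "tell_joke", "weather_chat"}
rules = {
    ("confirm", "yes"): ("Booked.", "greet"),
    ("confirm", "no"): ("Cancelled.", "greet"),
}
gaps = missing_rules(states, events, rules)
print(len(states) * len(events), len(gaps))  # 12 10
```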
The second reason is that “natural and human-like conversation requires outputting response sentences of a plurality of types or features, the algorithms that generate these sentences differ depending on the type and feature of the sentence to be generated, and it is difficult to organize these algorithms in a single scenario”.
For example, a simple response sentence generating algorithm such as “Eliza” (“Language Engineering”, pages 226-228, authored by Makoto NAGAO and published by SHOKODO) is an algorithm in which rules defining the response sentence to be output for a given input character string are predetermined, and the response sentence is generated by matching the input sentence against those rules.
In joke sentence generating algorithms such as “B CLASS MACHINE” (“AERA”, 2001.2.12 issue, page 80, published by Asahi Shinbun Sha) or “BOKE” (article “BOKE: A Japanese punning riddle generator”, authored by Kim Binsted and Osamu Takizawa), a predetermined character string processing algorithm is applied based on a keyword, and a joke is generated by modifying the input sentence.
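A minimal sketch of this kind of pattern-matching responder is shown below. The rule set, templates, and default reply are hypothetical and far smaller than a real Eliza script. The original Eliza also performs pronoun swapping and keyword ranking, which are omitted here. Each rule pairs a regular expression with a response template, and the first rule that matches the input sentence produces the reply.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (pattern matched against the input, response template).
RULES: List[Tuple["re.Pattern[str]", str]] = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi like (.+)", re.IGNORECASE), "What do you like about {0}?"),
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE),
     "Hello. What would you like to talk about?"),
]
DEFAULT = "Please tell me more."


def respond(sentence: str) -> str:
    """Return the response of the first matching rule, or a default reply."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Insert the captured fragment of the input into the template.
            return template.format(*match.groups())
    return DEFAULT


print(respond("I am tired today."))  # Why do you say you are tired today?
```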
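The keyword-driven modification step can be illustrated with the toy sketch below. It is not the B CLASS MACHINE or BOKE algorithm. It is a hypothetical stand-in that assumes a small table mapping keywords to similar-sounding words and swaps the first keyword found in the input sentence. It shows why such a generator shares little structure with the rule matching of Eliza.

```python
from typing import Dict, Optional

# Hypothetical pun table: keyword -> similar-sounding substitute.
PUNS: Dict[str, str] = {
    "flour": "flower",
    "knight": "night",
    "sole": "soul",
}


def make_joke(sentence: str) -> Optional[str]:
    """Swap the first keyword in the sentence for its sound-alike, else None."""
    words = sentence.split()
    for i, word in enumerate(words):
        core = word.strip(".,!?").lower()
        if core in PUNS:
            # Replace only the word itself, keeping any trailing punctuation.
            modified = word.lower().replace(core, PUNS[core])
            return " ".join(words[:i] + [modified] + words[i + 1:])
    return None  # no keyword found, so no joke is generated


print(make_joke("I bought some flour today."))  # I bought some flower today.
```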
It is difficult to organize completely different sentence generating algorithms in a single scenario.
The third reason is that “even if a process for natural and human-like conversation could be implemented in a single scenario, such an implementation would be inefficient as a software development method because it lacks system flexibility”.