1. Field of the Invention
The invention relates to an interactive information input/output system and, in particular, to a multi modal interactive system (also referred to simply as a multi modal system), a method, and a computer readable recording medium for use in the multi modal system.
2. Description of the Related Art
Conventionally, multi modal systems have been proposed that obtain the necessary information by monitoring the voice signals and ordinary gestures produced by people. Such a system serves as an interactive interface between an information processing apparatus and its many users, and makes the apparatus much easier for many people to use. Further, some information processing apparatus operate as personified agents that interface with users, so that users can handle the apparatus easily.
Also, a pseudo emotion device is disclosed in Japanese Laid Open Publication No. H06-12401 (namely, 12401/1994). The pseudo emotion device makes an agent behave in a manner similar to a human being by using a pseudo emotion model, and thereby achieves smooth information transmission.
Further, an integrated recognition interactive device is disclosed in Japanese Laid Open Publication No. H08-234789 (namely, 234789/1996). The integrated recognition interactive device carries on a more natural conversation by gathering various kinds of information, including time information and information selected from a plurality of channels, based on information from a multi modal interactive database.
Each of the above prior multi modal interactive devices has the problem that it cannot follow changes in the length of a pause (timing) in a conversation. As a result, the prior devices undesirably give their users unnatural conversations.
The length of the pause may differ from user to user. On the other hand, the recognition time of the prior devices is constant, so a given user's pause may be either longer or shorter than that recognition time.
Also, the length of the pause generally varies with the user's age, sex, personality, and so on.
Further, the length of the pause may change dynamically with the user's situation or with the way the user's conversation develops.
It is therefore an object of the invention to provide a multi modal interactive device that realizes a natural conversation with users by using a pause length (timing) appropriate to each type of user.
According to a first aspect of the invention, there is provided a multi modal interactive device which comprises an input unit which inputs information related to a user; a recognition unit which recognizes the information obtained by the input unit; an integration processing unit which determines an intention of the user from the recognition result of the recognition unit; a reaction generating unit which generates a reaction corresponding to the intention of the user; a storing device which stores a timing for each user state; a conversation managing unit which determines a timing on the basis of the user state by referring to the storing device; and an output unit which outputs the reaction to the user based on the determined timing.
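The units of the first aspect form a simple pipeline, with the timing chosen separately from the reaction and applied only at output. The Python sketch below is one possible way to wire them together. It assumes that a "user state" is a plain label (the labels "elderly" and "child" are hypothetical) and that a timing is a pause length in seconds. All class names, the toy recognition and intention rules, and the timing values are hypothetical; the source does not specify how recognition or intention determination are performed.

```python
# Hypothetical sketch of the first aspect's data flow; names, rules, and
# timing values are illustrative assumptions, not taken from the source.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Observation:
    # Information related to the user, as captured by the input unit (assumed fields).
    speech_text: str
    gesture: str
    user_state: str  # hypothetical label such as "elderly" or "child"


class RecognitionUnit:
    def recognize(self, obs: Observation) -> Dict[str, str]:
        # Placeholder recognition: normalize the speech, pass the gesture through.
        return {"speech": obs.speech_text.strip().lower(), "gesture": obs.gesture}


class IntegrationProcessingUnit:
    def determine_intention(self, result: Dict[str, str]) -> str:
        # Toy rule fusing both modalities into one intention label.
        if "weather" in result["speech"]:
            return "ask_weather"
        if result["gesture"] == "wave":
            return "greet"
        return "unknown"


class ReactionGeneratingUnit:
    REACTIONS = {
        "ask_weather": "It will be sunny today.",
        "greet": "Hello!",
        "unknown": "Could you say that again?",
    }

    def generate(self, intention: str) -> str:
        return self.REACTIONS.get(intention, self.REACTIONS["unknown"])


class TimingStore:
    """Storing device: one pause length (seconds) per user state."""

    def __init__(self, timings: Dict[str, float], default: float = 1.0) -> None:
        self._timings = dict(timings)
        self._default = default  # used for user states with no stored timing

    def lookup(self, user_state: str) -> float:
        return self._timings.get(user_state, self._default)


class ConversationManagingUnit:
    def __init__(self, store: TimingStore) -> None:
        self._store = store

    def determine_timing(self, user_state: str) -> float:
        # Determine the timing by referring to the storing device.
        return self._store.lookup(user_state)


class OutputUnit:
    def __init__(self, sleep: Callable[[float], None]) -> None:
        self._sleep = sleep  # injected so the delay can be simulated
        self.log: List[Tuple[float, str]] = []

    def output(self, reaction: str, delay: float) -> None:
        self._sleep(delay)  # wait for the determined pause, then respond
        self.log.append((delay, reaction))


class MultiModalInteractiveDevice:
    def __init__(self, store: TimingStore, sleep: Callable[[float], None]) -> None:
        self.recognizer = RecognitionUnit()
        self.integrator = IntegrationProcessingUnit()
        self.reactor = ReactionGeneratingUnit()
        self.manager = ConversationManagingUnit(store)
        self.out = OutputUnit(sleep)

    def handle(self, obs: Observation) -> None:
        result = self.recognizer.recognize(obs)
        intention = self.integrator.determine_intention(result)
        reaction = self.reactor.generate(intention)
        delay = self.manager.determine_timing(obs.user_state)
        self.out.output(reaction, delay)


if __name__ == "__main__":
    import time

    store = TimingStore({"elderly": 2.0, "child": 0.8})
    device = MultiModalInteractiveDevice(store, sleep=time.sleep)
    device.handle(Observation("What's the weather?", "none", "elderly"))
    device.handle(Observation("", "wave", "child"))
    print(device.out.log)  # [(2.0, 'It will be sunny today.'), (0.8, 'Hello!')]
```

Injecting the sleep function keeps the timing decision separate from how the pause is actually realized, so the same pipeline can be driven by a real clock or by a simulated one.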
According to a second aspect of the invention, there is provided a method of providing a multi modal conversation. The method comprises the steps of receiving information related to a user, recognizing the information, determining an intention of the user from the recognition result, generating a reaction corresponding to the intention of the user, preparing a timing for each user state, determining a timing on the basis of the user state, and supplying the user with the reaction based on the determined timing.
According to a third aspect of the invention, there is provided a recording medium readable by a computer, tangibly embodying a program of instructions executable by the computer to perform a method of providing a multi modal conversation. The method comprises the steps of receiving information related to a user, recognizing the information, determining an intention of the user from the recognition result, generating a reaction corresponding to the intention of the user, preparing a timing for each user state, determining a timing on the basis of the user state, and supplying the user with the reaction based on the determined timing.