Hitherto, a variety of ideas have been proposed and developed on a walking robot device or a multi-articulated robot, which operates in response to commands from users and/or ambient environments, or on animation characters, etc. operated based on computer graphics (CG). Those robot devices or animation characters (hereinafter referred to also as a “robot device or the like” collectively) perform a series of actions in response to commands issued from users.
For example, a robot device having a shape and structure designed in imitation of a four-footed walking animal such as a dog, i.e., the so-called “pet robot”, lies sprawled in response to a command (e.g., voice input) “Lie down!” from a user, or raises its paw to “shake hands” when the user puts a hand in front of the robot's mouth.
Such a robot device or the like designed in imitation of actual animals, including a dog and a human being, is desirably capable of behaving in a manner as close as possible to the actions and emotional expressions of genuine animals. It is also desired that the robot device or the like is able not only to perform predetermined actions in response to commands from users and/or external environments, but also to behave autonomously like genuine animals. This is because users tire easily if the robot device or the like repeatedly performs the same actions regardless of the actual situation, in which case the ultimate purpose of development of the robot device or the like, i.e., cohabitation with human beings under the same living environments, cannot be achieved.
Recent intelligent robot devices or the like incorporate functions such as voice input/output, voice recognition, and voice synthesis, and are able to converse with users by voice. In conversation or utterance as well, it is similarly desired that the robot device or the like is able not only to perform predetermined actions in response to commands from users and/or external environments, but also to behave autonomously like genuine animals.
Prior conversation utterance systems are proposed in, e.g., Japanese Unexamined Patent Application Publication Nos. 10-247194, 8-339446 and 9-16800.
Japanese Unexamined Patent Application Publication No. 10-247194 discloses an automatic interpreting device in which translation is performed and voices are synthesized while maintaining consistency across the sentence as a whole by searching for an appropriate example regarding a difference in the sentence, e.g., an erroneous utterance intent of a translated part. Also, Japanese Unexamined Patent Application Publication No. 8-339446 discloses a dialogue device in which a variety of emotions of a user are detected and information corresponding to the detected emotion is issued from the computer side so that the user can feel friendlier toward the device. Further, Japanese Unexamined Patent Application Publication No. 9-16800 discloses a voice dialogue system with a face image, which is easily adaptable to changes in speech theme and is capable of providing practical and natural dialogues with users.
However, those prior conversation utterance systems are basically intended to recognize voices of speakers or users, to extract emotions from facial expressions, and to create sentences matching the emotions of the speakers, only along the topics presented by the speakers.
Also, the voice dialogue system with a face image, disclosed in Japanese Unexamined Patent Application Publication No. 9-16800, is a system in which the contents of replies corresponding to utterances are defined beforehand in the form of a table. The replies and the corresponding emotions are therefore simply decided beforehand, even though the contents of the replies include emotional information.
An intelligent robot device or the like has internal statuses, including emotions, and is able to realize communication with users at a deeper level by outputting those internal statuses to the outside.
In conventional robot devices or the like, however, the means for expressing internal statuses are restricted to, for example, actions of the four legs, and the expressions cannot be easily recognized by everyone at a glance.