1. Field of the Invention
The present invention relates to a human image dialogue device, and to a recording medium that records a human image dialogue program, for use in a system in which a character such as a human image (hereinbelow referred to as a "human image") appears on a computer and carries out a dialogue with the user of the computer. The device and the program automatically generate the output of the movements, voice, and words of the human image according to the text and dialogue flow output from a module that controls the dialogue.
2. Description of the Related Art
Conventionally known technologies include those disclosed in Japanese Patent Application, unexamined First Publication, No. Hei 9-274666, "Human Image Synthesizing Device" (hereinbelow referred to as Citation 1); Japanese Patent Application, unexamined First Publication, No. Hei 9-16800, "Voice Dialogue System with Facial Image" (hereinbelow referred to as Citation 2); Japanese Patent Application, unexamined First Publication, No. Hei 7-334507, "Human Movement and Voice Generation System from Text" (hereinbelow referred to as Citation 3); and Japanese Patent Application, unexamined First Publication, No. Hei 9-153145, "Agent Display Device" (hereinbelow referred to as Citation 4).
First, in Citation 1, a system is proposed wherein a human mouth shape is generated from the frequency component of voice data, and a nodding movement is generated from the silent intervals in the voice data, and thereby an image of a human talking is displayed.
In addition, Citation 2 discloses a voice recognition dictionary in which spoken keywords have an expression code, and proposes a system wherein a response with a face image exhibiting feelings is returned as a result of the voice input of the user.
In addition, in Citation 3, a system is proposed wherein a spoken text written in a natural language is analyzed, the verbs and adverbs are extracted, the body movement pattern corresponding to the verb is determined, and the degree of motion of the movements is determined using the modifiers.
Furthermore, in Citation 4, an agent display device is proposed wherein the rules of movement of a human-shaped agent are described by If-Then rules, so that, when activated, the agent appears, gives a greeting, etc.
The first problem of the above-described conventional technology is that the description of the movements of the displayed human image is complex, and as a result great labor must be expended during construction of the dialogue system. The reason for this is that, in Citation 4, for example, the movements of the agent must be described by If-Then rules. For each dialogue system, the state of the system, which forms the conditions, and the movements of the agent must both be described in detail, and this is complex.
The second problem is that the generated expressions and movements of the character do not match the situation of the dialogue, and the same movements and expressions are always repeated. The reason for this is that in systems wherein expression and movement are synthesized from voice information and spoken text, as in Citation 1, Citation 2, and Citation 3, the expressions and movements are generated automatically from natural language alone. The same movements and expressions are therefore generated for the same words no matter what the state of the dialogue, so that they do not match the state of the dialogue and fixed movements are repeated.
In consideration of the above-described problems in the conventional technology, it is an object of the present invention to provide a human image dialogue device and a recording medium recording a human image dialogue program wherein generalized generation of gestures, expressions, etc., can be carried out in order to generate a human image on a computer that can carry out a dialogue similar to that between humans, without expending a large amount of labor during the construction of the dialogue system.
The human image dialogue device of the present invention comprises a dialogue control unit (2 in FIG. 1) that prompts the responses between the user and system by using a dialogue flow that describes a flow that associates the words for the system response (hereinbelow referred to as the "spoken text") and the state of the dialogue between the user and the system in this dialogue text, and a human image generation unit (5 in FIG. 1) that generates the motions, expression, conversation balloons of the words in the spoken text, and voice of the human image automatically from the spoken text written in this dialogue flow and the state of the dialogue.
More specifically, the spoken text responding to the input of the user and the state of the dialogue are recorded in the dialogue flow memory (3 in FIG. 1), and the dialogue flow is analyzed in the dialogue flow analysis unit (4 in FIG. 1).
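As an illustration only, the recording and analysis of the dialogue flow can be sketched as follows. The specification does not define a data format, so the names DialogueStep, DialogueFlowMemory, and analyze, the state label "greeting", and the word-splitting rule are all assumptions made for this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogueStep:
    """One entry of a dialogue flow: a dialogue state plus the spoken text."""
    state: str        # hypothetical state label, e.g. "greeting"
    spoken_text: str  # words of the system response


@dataclass
class DialogueFlowMemory:
    """Hypothetical stand-in for the dialogue flow memory (3 in FIG. 1)."""
    steps: Dict[str, DialogueStep] = field(default_factory=dict)

    def record(self, user_input: str, step: DialogueStep) -> None:
        # Associate a user input with the response step that answers it.
        self.steps[user_input] = step

    def lookup(self, user_input: str) -> Optional[DialogueStep]:
        return self.steps.get(user_input)


def analyze(step: DialogueStep) -> Dict[str, object]:
    """Hypothetical stand-in for the dialogue flow analysis unit (4 in FIG. 1).

    Returns the dialogue state and the lowercase words of the spoken text.
    """
    words: List[str] = re.findall(r"[a-z']+", step.spoken_text.lower())
    return {"state": step.state, "words": words}
```

In this sketch the analysis result carries both the state and the words, since the later units are described as using both.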
Next, in the movement-expression generation unit (51 in FIG. 1), based on the results of the analysis of the dialogue flow in the dialogue flow analysis unit, the movements of the human image are generated referring to one or both of the text-movement associating memory unit (52 in FIG. 1), which associates keywords and movement patterns of this human image (FIG. 5), and the movement data memory unit (52 in FIG. 1), which describes the movement patterns and the content of the movements associated with each movement pattern (FIG. 4). In generating the movement, the movement-expression generation unit selects a predetermined movement pattern according to the state of the dialogue written in the dialogue flow, and determines the movement to be generated from the keywords included in the dialogue text.
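The two-stage lookup described above, in which the dialogue state narrows the candidate movement patterns and the keywords then determine the movement, might be sketched as follows. The table contents, the pattern names, the "idle" fallback, and the function generate_movements are hypothetical and serve only to illustrate the idea; they are not taken from FIG. 4 or FIG. 5.

```python
from typing import Dict, List

# Hypothetical text-movement association:
# dialogue state -> keyword -> movement pattern name.
TEXT_MOVEMENT_TABLE: Dict[str, Dict[str, str]] = {
    "greeting": {"hello": "bow", "welcome": "open_arms"},
    "apology": {"sorry": "deep_bow"},
}

# Hypothetical movement data:
# movement pattern name -> content of the movement.
MOVEMENT_DATA: Dict[str, List[str]] = {
    "bow": ["tilt_head_forward", "return_upright"],
    "open_arms": ["spread_arms"],
    "deep_bow": ["bend_at_waist", "hold", "return_upright"],
    "idle": ["blink"],
}


def generate_movements(state: str, words: List[str]) -> List[str]:
    """Select patterns by dialogue state, then pick movements by keyword."""
    patterns_for_state = TEXT_MOVEMENT_TABLE.get(state, {})
    movements: List[str] = []
    for word in words:
        pattern = patterns_for_state.get(word)
        if pattern is not None:
            movements.extend(MOVEMENT_DATA[pattern])
    # Assumed fallback: an idle movement when no keyword matches.
    return movements or list(MOVEMENT_DATA["idle"])
```

Because the keyword table is keyed by state first, the same word can produce a different movement, or none, in a different dialogue state.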
In addition, the text output control unit (54 in FIG. 1) switches the display format depending on the state of the dialogue in the dialogue flow, and displays the words included in the spoken text. For example, it displays a "conversation balloon", whose display starts when the human image on the screen starts speaking and closes when the conversation ends, or a "message board", whose display starts at the same time the human image starts to speak but does not close even after the conversation has finished.
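The difference between the two display formats can be sketched as follows. The class names, the mapping from dialogue state to format, and the default of a conversation balloon are assumptions for illustration; the specification states only that the format is switched according to the state of the dialogue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class DisplayFormat(Enum):
    BALLOON = "conversation balloon"
    MESSAGE_BOARD = "message board"


# Hypothetical mapping from dialogue state to display format.
FORMAT_BY_STATE: Dict[str, DisplayFormat] = {
    "greeting": DisplayFormat.BALLOON,
    "explanation": DisplayFormat.MESSAGE_BOARD,
}


@dataclass
class TextDisplay:
    fmt: DisplayFormat
    text: str = ""
    visible: bool = False

    def on_speech_start(self, text: str) -> None:
        # Both formats open when the human image starts speaking.
        self.text = text
        self.visible = True

    def on_speech_end(self) -> None:
        # Only the balloon closes; the message board stays on screen.
        if self.fmt is DisplayFormat.BALLOON:
            self.visible = False


def display_for_state(state: str) -> TextDisplay:
    """Choose the display format for a dialogue state (balloon by default)."""
    return TextDisplay(FORMAT_BY_STATE.get(state, DisplayFormat.BALLOON))
```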
Furthermore, the invention can be constructed so that the voice synthesis unit (55 in FIG. 1) outputs the spoken text by voice synthesis, and the synchronization unit (56 in FIG. 1) synchronizes the outputs of the movement-expression generation unit, the text output control unit, and the voice synthesis unit.
Thus, because the movements of the human image are generated according to the dialogue flow, it is possible to generate motions and expressions which match the state of the dialogue without describing the behavior of the human image in detail, and the first problem of expending great labor during the construction of the system is solved. In addition, because different movements are selected, and movements are modified, depending on the state of the dialogue written in the dialogue flow and the number of repetitions of the dialogue flow, the second problem of generating expressions and movements of the character that do not match the state of the dialogue, and of always repeating the same movements and expressions, is solved.
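One way to vary the movement with the number of repetitions is sketched below. The specification does not say how the repetition count is used, so cycling through a list of variants is an assumption, as are the class RepetitionAwareSelector, the variant names, and the state label "prompt".

```python
from collections import Counter
from typing import Dict, List

# Hypothetical movement variants available for each dialogue state.
VARIANTS: Dict[str, List[str]] = {
    "prompt": ["nod", "tilt_head", "lean_forward"],
}


class RepetitionAwareSelector:
    """Pick a different movement each time the same dialogue state recurs."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def select(self, state: str) -> str:
        variants = VARIANTS.get(state, ["idle"])
        # Assumed policy: cycle through the variants by repetition count.
        movement = variants[self._counts[state] % len(variants)]
        self._counts[state] += 1
        return movement
```

With this policy, a user who passes through the same dialogue state several times sees a different movement on each pass until the variants are exhausted.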