1. Field of the Invention
The present invention relates to a toy used to enjoy conversation or an embodied voice responsive toy designed to facilitate mind communication through voice.
2. Prior Art
In recent years, toys that move their arms, legs, or heads in response to voice have become popular. One example is the "Interactive talking toy" disclosed in U.S. Pat. No. 4,923,428. Such toys execute a specific motion pattern or a combination of a plurality of motions in accordance with voice, and do not produce a motion pattern as a communication motion (a motion for facilitating communication with a person or enhancing intimacy). Nevertheless, they make a favorable impression on young people, especially women, who live alone in a city apartment building where keeping a pet is not permitted, and many such toys are currently sold.
Similarly, as a toy using voice, there is a message device that records and reproduces a voice. This toy reproduces a previously recorded talker's voice, accompanied by a motion of a robot, to facilitate mind communication. The voice thereby bridges a temporal distance. Such use of voice is also seen, though not as a toy, in message means in which a cassette tape carrying a recorded voice is exchanged. Compared with communication by words alone, since the actual voice of the sender is transmitted, smoother or more intimate communication than by letter can be realized. The voice thereby bridges a spatial distance.
The toy responding to voice has significance as a tranquilizer for a person living alone, and so the response of the toy is important. However, since such a conventional toy merely repeats a motion in proportion to the amplitude of the voice, treating the voice as a simple input, it has been difficult for the user to empathize with it. Mind communication using voice is excellent in that parties separated in distance or time are made not to feel that separation, and smooth or intimate communication is realized. However, with such mind communication means, a talker or listener must talk toward a robot thrashing its arms and legs, and it has been difficult to put his or her whole heart into the voice. Investigation was therefore made into means for facilitating empathy with a toy using voice, such as a toy used to enjoy conversation or a toy designed to facilitate mind communication through voice.
As a result of the investigation, there has been developed an embodied voice responsive toy constructed of a voice input-output portion, a voice responsive pseudo-person, and a pseudo-person control portion, in which the voice input-output portion serves to input voice from the outside or output voice to the outside, and the pseudo-person control portion determines an action of the voice responsive pseudo-person from the voice passing through the voice input-output portion and actuates the voice responsive pseudo-person. This embodied voice responsive toy may be constructed by adding a data input-output portion and a data conversion portion to the voice input-output portion, in which the data input-output portion serves to input data other than voice from the outside or output data other than voice to the outside, and the data conversion portion performs mutual conversion between the data other than voice and the voice and transfers the voice to the voice input-output portion. The data input-output portion inputs and outputs data, other than voice, from which voice can be synthesized. The pseudo-person control portion determines the action of a robot from the voice; accordingly, as long as the data can be converted to a voice-based signal (sound), it is not necessarily required that the meaning of the data be recognized. The data conversion portion serves to perform the mutual conversion between such data and voice or sound. The voice or sound synthesized from the data is sent through the voice input-output portion to the pseudo-person control portion.
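By way of illustration only, the arrangement of portions described above might be sketched in Python as follows. All class and method names, the representation of voice as a list of amplitude levels, the `on_level` value, and the text-to-envelope conversion are assumptions made for this sketch and are not taken from the specification. The decision rule is left as a pluggable function so that no particular rule is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical decision rule: maps the ON/OFF history to an action name or None.
Decide = Callable[[List[int]], Optional[str]]


@dataclass
class PseudoPerson:
    """Stand-in for the robot or displayed character; it only logs actions."""
    log: List[str] = field(default_factory=list)

    def act(self, action: str) -> None:
        self.log.append(action)


class PseudoPersonControlPortion:
    """Determines an action from the voice and actuates the pseudo-person."""

    def __init__(self, person: PseudoPerson, decide: Decide, on_level: float = 0.5):
        self.person = person
        self.decide = decide
        self.on_level = on_level      # assumed level separating ON from OFF
        self.history: List[int] = []

    def receive(self, envelope: List[float]) -> None:
        for level in envelope:
            self.history.append(1 if level >= self.on_level else 0)
            action = self.decide(self.history)
            if action is not None:
                self.person.act(action)


class VoiceInputOutputPortion:
    """Passes voice (here, an amplitude envelope) on to the control portion."""

    def __init__(self, control: PseudoPersonControlPortion):
        self.control = control

    def input_voice(self, envelope: List[float]) -> None:
        self.control.receive(envelope)


class DataConversionPortion:
    """Assumed toy conversion: text becomes a sound envelope, meaning is ignored."""

    def to_envelope(self, text: str) -> List[float]:
        return [0.0 if ch.isspace() else 1.0 for ch in text]
```

In this sketch the data conversion portion never interprets the text; it only produces a signal whose ON/OFF pattern the control portion can use, which mirrors the point that recognition of meaning is not required.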
Although it is preferable that the voice responsive pseudo-person has a form imitating a human being, a personified animal or plant, an inorganic object, or an imaginary creature or object may also be used. As described later, the present invention produces an action that causes a human talker or listener to share the rhythm of conversation, that is, a communication motion in accordance with ON/OFF of voice; as long as such an action is performed, the pseudo-listener or pseudo-talker may be an inherently inorganic vehicle or building, or an imaginary creature or object. Rather, a stylized (caricatured) object, building, or the like is preferable since it strengthens the character of the toy as an intimate companion. The listener control portion or talker control portion is constructed by a computer. In the case of a robot, a driving circuit is connected to the computer (or a dedicated processing chip, etc.), which controls and drives the robot. The computer implements the voice input-output portion, the data input-output portion, and the data conversion portion in hardware or software, so the control specification is also easy to change.
Specifically, (1) the voice responsive pseudo-person is a listener robot, the pseudo-person control portion is a listener control portion, the listener robot makes an action of nodding of a head, opening and closing of a mouth, blinking of an eye, or gesturing of a body in response to the voice, and the listener control portion determines the action of the listener robot on the basis of the voice passing through the voice input-output portion and activates the listener robot.
Besides, (2) the voice responsive pseudo-person is a talker robot, the pseudo-person control portion is a talker control portion, the talker robot makes head motion, opening and closing of a mouth, blinking of an eye, or gesturing of a body in response to the voice, and the talker control portion determines the action of the talker robot on the basis of the voice passing through the voice input-output portion and activates the talker robot.
Further, (3) the voice responsive pseudo-person is a shared robot of a listener and a talker, the pseudo-person control portion is listener and talker control portions, the shared robot makes an action of nodding of a head, head motion, opening and closing of a mouth, blinking of an eye, or gesturing of a body in response to the voice, the listener control portion determines the action of the shared robot as a listener on the basis of the voice passing through the voice input-output portion and activates the shared robot, and the talker control portion determines the action of the shared robot as a talker on the basis of the voice passing through the voice input-output portion and activates the shared robot.
Even if a pseudo-listener or a pseudo-talker is displayed on a display portion by an animation or the like instead of being embodied as a robot, the basic operation and effect of the present invention are not changed. As the pseudo-listener or pseudo-talker displayed on the display portion, a responsive synthesized picture made from a real picture, newly generated CG (computer graphics), or an animation can be used. In the case where a computer is used for the listener control portion or the talker control portion, the computer synthesizes the synthesized picture, CG, or animation, and displays the moving picture on the display portion of the computer.
In the case where the foregoing display portion is used, specifically, (4) the voice responsive pseudo-person is a listener display portion displaying a listener, the pseudo-person control portion is a listener control portion, the listener display portion displays a pseudo-listener, which makes an action of nodding of a head, opening and closing of a mouth, blinking of an eye, or gesturing of a body in response to the voice, on the listener display portion, and the listener control portion determines the action of the pseudo-listener on the basis of the voice passing through the voice input-output portion and moves the pseudo-listener displayed on the listener display portion.
Alternatively, (5) the voice responsive pseudo-person is a talker display portion displaying a talker, the pseudo-person control portion is a talker control portion, the talker display portion displays a pseudo-talker, which makes head motion, opening and closing of a mouth, blinking of an eye, or gesturing of a body in response to a voice signal, on the talker display portion, and the talker control portion determines the action of the pseudo-talker on the basis of the voice passing through the voice input-output portion and moves the pseudo-talker displayed on the talker display portion.
Alternatively, (6) the voice responsive pseudo-person is a shared display portion displaying a listener and a talker, the pseudo-person control portion is listener and talker control portions, the shared display portion displays a pseudo-talker and a pseudo-listener individually, which make an action of nodding of a head, head motion, opening and closing of a mouth, blinking of an eye, or gesturing of a body in response to a voice signal, in the same space, the listener control portion determines the action of the pseudo-listener on the basis of the voice passing through the voice input-output portion and moves the pseudo-listener displayed on the shared display portion, and the talker control portion determines the action of the pseudo-talker on the basis of the voice passing through the voice input-output portion and moves the pseudo-talker displayed on the shared display portion.
In the case where the present invention is utilized as a toy used to enjoy conversation, voices are directly exchanged through a microphone or speaker of the voice input-output portion. In the case where it is used as a toy designed to facilitate mind communication, a voice is recorded on a recording medium by a separately provided voice recording or reproducing portion, sent to the other party, and reproduced. In the case where data is used as the basis, the data can be recorded in or reproduced from a data recording or reproducing portion. Although the recording medium may be constructed integrally with the voice input-output portion or data input-output portion, longer voice or data can be processed when an external storage device is additionally used as the recording medium. As the external storage device, various magnetic tapes (including a cassette tape), magnetic disks, magneto-optical disks, or various media using memories can be used. Most such external storage devices can erase recorded contents and be used again; however, where it suffices for the mind communication to be performed only once, a CD-ROM, CD-R, DVD-ROM, or record can also be used.
Important actions of the voice responsive pseudo-person are different according to whether the voice responsive pseudo-person is a talker or a listener. (a) The action (communication motion) of the voice responsive pseudo-person as the listener is made of a selective combination of nodding of a head, blinking of an eye, and gesturing of a body. The nodding is executed at a nodding timing when the prediction value of nodding presumed from ON/OFF of the voice exceeds a nodding threshold, the blinking is executed at a blinking timing which is exponentially distributed with the passage of time from the nodding timing as a starting point, and the gesturing of the body is executed at a gesturing timing when the prediction value of nodding presumed from ON/OFF of the voice exceeds a gesturing threshold.
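The listener timings described above can be illustrated, purely as a sketch, in Python. The function names, the use of integer milliseconds, the mean blink delay, and the choice to trigger only when the prediction value newly rises above a threshold are assumptions for this example; the specification states only that the action is executed when the threshold is exceeded and that blinking is exponentially distributed in time from the nodding timing. One blink is drawn per nod here for simplicity.

```python
import random
from typing import Dict, List, Sequence


def rising_edges(values: Sequence[float], threshold: float) -> List[int]:
    """Frame indices at which the value newly rises above the threshold."""
    edges: List[int] = []
    above = False
    for t, v in enumerate(values):
        now = v > threshold
        if now and not above:
            edges.append(t)
        above = now
    return edges


def listener_schedule(
    nod_pred: Sequence[float],   # prediction value of nodding, one per frame
    nod_threshold: float,
    gesture_threshold: float,    # assumed lower than nod_threshold
    frame_ms: int,               # assumed frame length in milliseconds
    mean_blink_delay_ms: float,  # assumed mean of the exponential delay
    rng: random.Random,
) -> Dict[str, List[float]]:
    """Return nod, gesture, and blink times in milliseconds."""
    nods = [t * frame_ms for t in rising_edges(nod_pred, nod_threshold)]
    gestures = [t * frame_ms for t in rising_edges(nod_pred, gesture_threshold)]
    # Each blink follows its nod after an exponentially distributed delay.
    blinks = [n + rng.expovariate(1.0 / mean_blink_delay_ms) for n in nods]
    return {"nod": nods, "gesture": gestures, "blink": blinks}
```

Because the gesturing threshold is lower than the nodding threshold, every nod in this sketch falls within a stretch in which the gesturing threshold is also exceeded, while gestures can additionally occur without a nod.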
Besides, (b) the action (communication motion) of the voice responsive pseudo-person as a talker is made of a selective combination of head motion, opening and closing of a mouth, blinking of an eye, and gesturing of a body. The head motion is executed at a head motion timing when the prediction value of head motion presumed from ON/OFF of the voice exceeds the threshold of head motion, the blinking is executed at a blinking timing when the prediction value of blinking presumed from ON/OFF of the voice exceeds a blinking threshold, and the gesturing of the body is executed at a gesturing timing when the prediction value of head motion or the prediction value of gesturing presumed from ON/OFF of the voice exceeds a gesturing threshold.
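As a sketch only, the talker-side decision for a single frame might look like the following Python function. The dictionary keys, the function name, and the rule that the mouth opens whenever the voice is ON are assumptions for illustration. The specification gives the threshold comparisons and states that gesturing is triggered by either the head-motion or the gesturing prediction value exceeding the gesturing threshold.

```python
from typing import Dict, Set


def talker_actions(
    pred: Dict[str, float],       # prediction values: "head", "blink", "gesture"
    threshold: Dict[str, float],  # thresholds under the same three keys
    voice_on: bool,               # assumed: True while the voice signal is ON
) -> Set[str]:
    """Select the talker's actions for one frame."""
    actions: Set[str] = set()
    if pred["head"] > threshold["head"]:
        actions.add("head_motion")
    if pred["blink"] > threshold["blink"]:
        actions.add("blink")
    # Gesturing fires on either prediction value exceeding the gesturing threshold.
    if pred["head"] > threshold["gesture"] or pred["gesture"] > threshold["gesture"]:
        actions.add("gesture")
    if voice_on:
        actions.add("mouth")  # assumed rule for opening and closing of the mouth
    return actions
```

With a gesturing threshold below the head-motion threshold, a moderately high head-motion prediction value yields a gesture without a head motion, so that larger body movement accompanies but does not always coincide with head motion.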
The action (communication motion) determined in this manner produces the rhythm of conversation between the pseudo-listener and the talker (or pseudo-talker and listener), and causes embodied entrainment (also called merely entrainment). This entrainment produces an atmosphere where a person can talk or listen with ease, and causes empathy with the pseudo-listener or pseudo-talker played by the robot, the animation on the display portion, or the like.
The combination of the actions is free. For example, the pseudo-talker uses the head motion instead of the nodding, and the pseudo-listener basically does not use the opening and closing of the mouth. With respect to the gesturing of the body, the gesturing timing is obtained within the algorithm for obtaining the nodding timing by using a gesturing threshold lower than the nodding threshold. In the gesturing, movable portions are moved in accordance with the change of the voice, the movable portions of the body are selected in response to the voice, or a predetermined motion pattern (a combination of the movable portions and the motion amounts of the respective portions) is selected. The selection of the movable portions or motion patterns in the gesturing makes the coordination of the nodding and the gesturing natural. In this way, in the present invention, except for the opening and closing of the mouth and the motions of the respective portions of the body that are based on the amplitude of the voice, the communication motion is realized mainly through the nodding timing in the pseudo-listener and mainly through the head motion in the pseudo-talker.
The important nodding timing is determined by an algorithm that compares the prediction value of nodding, which is obtained from a prediction model relating the voice to the nodding linearly or nonlinearly, for example, an MA (Moving-Average) model or a neural network model, with the predetermined nodding threshold. In the present invention, in the case of the pseudo-listener, the prediction model relating the voice to the nodding is used, and in the case of the pseudo-talker, the prediction model relating the voice to the head motion is used. In this algorithm, the voice is grasped as ON/OFF of an electric signal with the passage of time, the prediction value of nodding (in the case of the talker, the prediction value of head motion) obtained from the ON/OFF of the electric signal with the passage of time is compared with the nodding threshold (in the case of the talker, the threshold of head motion) or the gesturing threshold, and the nodding timing or the gesturing timing is derived. Since the simple ON/OFF of the electric signal is used as the basis, the amount of calculation is small, and responsiveness is not lost even if a low-performance CPU is used to determine actions in real time. The present invention is characterized in that the entrainment is caused from ON/OFF when the voice is regarded as an electric signal. Further, in addition to the ON/OFF, the cadence or intonation indicating the change of the electric signal with the passage of time may also be taken into consideration.
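A minimal Python sketch of a linear moving-average prediction over the ON/OFF signal is given below for illustration. The function names, the `on_level` used to binarize the amplitude, the weights, and the inclusion of the current frame in the sum are all assumptions; the specification does not give the model coefficients or the form of the ON/OFF input.

```python
from typing import List, Sequence


def binarize(amplitudes: Sequence[float], on_level: float) -> List[int]:
    """Reduce a sampled voice envelope to ON (1) or OFF (0) per frame."""
    return [1 if a >= on_level else 0 for a in amplitudes]


def ma_prediction(on_off: Sequence[int], weights: Sequence[float]) -> List[float]:
    """Moving-average prediction: y[t] = sum over j of weights[j] * on_off[t - j]."""
    preds: List[float] = []
    for t in range(len(on_off)):
        y = 0.0
        for j, w in enumerate(weights):
            if t - j >= 0:  # frames before the start of the signal count as OFF
                y += w * on_off[t - j]
        preds.append(y)
    return preds


def exceeds(preds: Sequence[float], threshold: float) -> List[bool]:
    """Per-frame comparison of the prediction value with a threshold."""
    return [y > threshold for y in preds]
```

Each prediction value costs only one multiply-add per weight on a binary input, which is consistent with the remark that the amount of calculation is small. The same comparison can be repeated with the lower gesturing threshold to derive the gesturing timing.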