1. Field of the Invention
The present invention relates to a game apparatus which can be operated by voice, an input device for inputting an image of a mouth or lips and/or voice, and a voice response apparatus.
2. Description of the Related Art
FIG. 34 shows an example of a conventional game apparatus. In this game apparatus, an operator holds a remote controller including a radio transmitter in order to operate an airship 7 including a radio receiver. As shown in FIG. 34, such a conventional game apparatus generally employs joy sticks 161 incorporated in the remote controller, with which the operator operates the desired object (airship) 7. When the operator moves the joy sticks 161, their respective angles are detected by angle detection sections 162 and 163 and converted into electric signals, which are input to a control section 164. The control section 164 outputs a radio control signal for controlling the movements of the airship 7 in accordance with the angles of the joy sticks 161.
However, the joy sticks 161 required by conventional game apparatuses do not allow natural operation by humans (operators). As a result, it takes time for an operator to become proficient in the operation, and a quick reaction cannot always be achieved when required. In another type of game apparatus, in which an operator operates a balloon equipped with a driving apparatus rather than an airship, the movements of the balloon are controlled in the same manner, so that they become inanimate and mechanical and spoil the "human" feel inherent to balloons.
On the other hand, there has been proposed an apparatus for recognizing the voice of an operator by inputting an image of the mouth or lips of the operator. However, such an apparatus requires a sophisticated optical lens system, which increases both the size and the cost of the entire apparatus.
The game apparatus of this invention includes: voice input means for inputting at least one voice set including voice uttered by an operator, for converting the voice set into a first electric signal, and for outputting the first electric signal; voice recognition means for recognizing the voice set on the basis of the first electric signal output from the voice input means; image input means for optically detecting a movement of lips of the operator, for converting the detected movement of lips into a second electric signal, and for outputting the second electric signal; speech period detection means for receiving the second electric signal, and for obtaining a period in which the voice is uttered by the operator on the basis of the received second electric signal; overall judgment means for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition means and the period obtained by the speech period detection means; and control means for controlling an object on the basis of the voice extracted by the overall judgment means.
In one embodiment of the invention, the speech period detection means includes: differentiation means for detecting a degree of change in the second electric signal output from the image input means; and means for determining, when the degree of change detected by the differentiation means exceeds a predetermined value, that corresponding voice is uttered by the operator.
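As a minimal sketch of this embodiment, assuming the second electric signal has been sampled into a list of numbers, the differentiation means can be approximated by the absolute difference between consecutive samples. The function name `detect_speech_periods` and its `threshold` parameter are hypothetical; the source specifies only that speech is declared when the degree of change exceeds a predetermined value.

```python
from typing import List, Sequence, Tuple


def detect_speech_periods(lip_signal: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) sample ranges in which the lips move quickly.

    The absolute first difference between consecutive samples stands in for
    differentiation; a sample counts as "speaking" when it exceeds threshold.
    """
    periods: List[Tuple[int, int]] = []
    start = None
    for i in range(1, len(lip_signal)):
        change = abs(lip_signal[i] - lip_signal[i - 1])  # degree of change
        if change > threshold:
            if start is None:
                start = i  # a speech period begins
        elif start is not None:
            periods.append((start, i))  # the speech period ends
            start = None
    if start is not None:
        periods.append((start, len(lip_signal)))  # still moving at end of input
    return periods
```

A practical detector would probably also merge periods separated by very short pauses; that refinement is omitted here.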
In another embodiment of the invention, the overall judgment means includes: means for producing an evaluation period by adding a period having a predetermined length to the period obtained by the speech period detection means; means for detecting a recognition result output time at which the voice set recognized by the voice recognition means is output from the voice recognition means; and means for comparing the recognition result output time with the evaluation period, and for determining that the voice in the voice set whose recognition result output time falls within the evaluation period is the voice uttered by the operator.
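The overall judgment step can be sketched as a filter over recognition results, assuming they arrive as (word, output time) pairs in seconds. Appending the predetermined-length period after the end of each detected speech period, to absorb the recognizer's processing delay, is an assumption, as are the function and parameter names.

```python
from typing import List, Sequence, Tuple


def select_operator_words(
    recognized: Sequence[Tuple[str, float]],
    speech_periods: Sequence[Tuple[float, float]],
    margin: float,
) -> List[str]:
    """Keep only words whose recognition output time lies in an evaluation period.

    recognized:     (word, recognition_result_output_time) pairs.
    speech_periods: (start, end) pairs from the lip-movement detector.
    margin:         hypothetical predetermined length added after each period.
    """
    accepted: List[str] = []
    for word, t_out in recognized:
        for start, end in speech_periods:
            if start <= t_out <= end + margin:  # inside the evaluation period
                accepted.append(word)
                break
    return accepted
```

Words recognized while the operator's lips were still, such as nearby speech or noise, fall outside every evaluation period and are discarded.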
Alternatively, the game apparatus of this invention includes: image input means for optically inputting a movement of lips of an operator, for converting the input movement of lips into an electric signal, and for outputting the electric signal; lip reading means for obtaining the movement of lips on the basis of the electric signal, for recognizing a word corresponding to the obtained movement of lips, and for outputting the recognition result; and control means for controlling an object in accordance with a control signal based on the recognition result.
In one embodiment of the invention, the lip reading means includes: storage means for storing a predetermined number of words; and matching means for selecting one word from the predetermined number of words on the basis of the obtained movement of lips, and for determining that the selected word is the word corresponding to the movement of lips.
In another embodiment of the invention, the storage means stores movements of lips corresponding to the predetermined number of words as reference patterns, and the matching means calculates the distance between the obtained movement of lips and each of the reference patterns, and selects the word whose reference pattern gives the minimum distance.
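The minimum-distance matching can be sketched as below. The source does not name a distance measure; Euclidean distance between equal-length feature sequences (using `math.dist`, available from Python 3.8) is assumed for illustration, and a dynamic-programming (DTW-style) distance would be a natural substitute when patterns differ in length. The vocabulary and pattern values are hypothetical.

```python
import math
from typing import Dict, Sequence


def match_lip_pattern(observed: Sequence[float], references: Dict[str, Sequence[float]]) -> str:
    """Return the word whose stored reference pattern is nearest to the observation.

    math.dist raises ValueError if a reference differs in length from observed.
    """
    return min(references, key=lambda word: math.dist(observed, references[word]))
```

For example, with hypothetical references `{"go": [0, 1, 2, 1, 0], "stop": [0, 2, 0, 2, 0]}`, the observation `[0, 1, 2, 1, 1]` lies at distance 1 from "go" and about 2.65 from "stop", so "go" is selected.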
In still another embodiment of the invention, the game apparatus further includes: voice input means for inputting voice, for converting the voice into a further electric signal, and for outputting the further electric signal; voice recognition means for recognizing the voice on the basis of the further electric signal output from the voice input means; and overall judgment means for outputting the control signal to be applied to the control means based on both the recognition result by the voice recognition means and the recognition result by the lip reading means.
In still another embodiment of the invention, the game apparatus further includes: means for obtaining a degree of voice recognition reliability for the recognition result by the voice recognition means; and means for obtaining a degree of lip reading reliability for the recognition result by the lip reading means, wherein the overall judgment means selects one of the recognition result by the voice recognition means and the recognition result by the lip reading means, based on the degree of voice recognition reliability and the degree of lip reading reliability, and outputs the selected recognition result as the control signal.
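A minimal sketch of reliability-based selection in the overall judgment means, assuming both reliabilities are expressed on a common scale (for example 0 to 1). How each reliability is computed is not specified here, and breaking ties in favour of voice recognition is an arbitrary assumption.

```python
from typing import Tuple


def overall_judgment(voice: Tuple[str, float], lip: Tuple[str, float]) -> str:
    """Select the recognition result with the higher reliability.

    voice: (word, reliability) from the voice recognition means.
    lip:   (word, reliability) from the lip reading means.
    Ties go to the voice result (an assumption).
    """
    voice_word, voice_rel = voice
    lip_word, lip_rel = lip
    return voice_word if voice_rel >= lip_rel else lip_word
```

In a noisy room the voice reliability would be expected to drop, so the lip reading result would tend to win; in poor lighting the reverse would hold.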
In still another embodiment of the invention, the image input means includes light emitting means for emitting light, and photodetective means for receiving the light reflected from the lips of the operator and for converting the received light into the second electric signal.
In still another embodiment of the invention, the light is radiated to the lips from the side of the lips.
In still another embodiment of the invention, the light is radiated to the lips from the front of the lips.
In still another embodiment of the invention, the voice input means includes at least one microphone.
In still another embodiment of the invention, the voice input means includes at least one microphone, and the microphone and the light emitting means and the photodetective means of the image input means are provided on a single stage.
The input device of this invention includes: a head set of a headphone type; a supporting bar having one end joined to the head set; and a stage joined to the other end of the supporting bar, the stage being provided with at least one light emitting element for generating light with which the lips of an operator are irradiated, and at least one photodetective element for receiving the light reflected from the lips.
In one embodiment of the invention, the stage is further provided with voice input means for inputting voice.
The voice selection apparatus of this invention includes: first memory means for storing a plurality of tables, each of the tables including a plurality of words which can be output for one input; second memory means for storing one of the plurality of tables; selection means for selecting one word from the plurality of words included in the one table stored in the second memory means in accordance with an externally supplied input, and for outputting the selected word as voice; and change means for changing the one table stored in the second memory means to another table of the plurality of tables stored in the first memory means, the other table being determined depending on the selected word.
In one embodiment of the invention, the voice selection apparatus further includes means for generating a random number, wherein the selection means selects the one word from the plurality of words by using the random number.
Alternatively, the voice selection apparatus includes: memory means for storing a table, the table including a plurality of words which can be output in response to one input; selection means for receiving an input which is externally input, for selecting one word from the plurality of words included in the table which is stored in the memory means by using a random number, and for outputting the selected one word as voice; and means for generating the random number.
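The table-switching behaviour of the voice selection apparatus can be sketched as a small state machine. In this hypothetical model each table maps an input (for example a recognized word) to its candidate output words, `next_table` maps an output word to the name of the table to load next, and the returned string stands in for synthesized voice. With an empty `next_table` the sketch reduces to the single-table, random-selection alternative.

```python
import random
from typing import Dict, List, Optional

Table = Dict[str, List[str]]  # input -> candidate output words (assumed layout)


class VoiceSelector:
    """Hypothetical sketch of table-based response selection."""

    def __init__(self, tables: Dict[str, Table], next_table: Dict[str, str],
                 initial: str, rng: Optional[random.Random] = None):
        self.tables = tables                # first memory means: all tables
        self.next_table = next_table        # output word -> table to load next
        self.current = tables[initial]      # second memory means: active table
        self.rng = rng or random.Random()   # random number generating means

    def respond(self, user_input: str) -> str:
        """Pick one candidate at random, then switch tables if the word says so."""
        word = self.rng.choice(self.current[user_input])
        name = self.next_table.get(word)
        if name is not None:                # change means
            self.current = self.tables[name]
        return word
```

Because the next table depends on what was just said, the same input can draw a different set of replies later in a conversation, which is what lets the apparatus's behaviour change in response to voice input.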
The voice response apparatus of this invention includes: the voice selection apparatus mentioned above; and a voice recognition apparatus for receiving voice, for recognizing the voice, and for outputting the recognition result to the voice selection apparatus.
Alternatively, the game apparatus of this invention includes the voice response apparatus mentioned above.
Alternatively, the game apparatus of this invention includes a plurality of the voice response apparatuses mentioned above, whereby the plurality of voice response apparatuses can converse with each other.
Alternatively, the game apparatus of this invention includes: a plurality of voice input sections for converting input voice into an electric signal, the plurality of voice input sections respectively corresponding to different directions; and direction detection means for obtaining the energy of the electric signal for each of the plurality of voice input sections, for determining the voice input section having the maximum energy, and for determining the direction corresponding to that voice input section as the direction from which the voice originates.
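A minimal sketch of the direction detection means, assuming each voice input section delivers a block of samples and that energy is the sum of squared samples over that block. The direction labels are hypothetical.

```python
from typing import Dict, Sequence


def detect_direction(channels: Dict[str, Sequence[float]]) -> str:
    """Return the direction whose voice input section has the maximum energy."""
    def energy(samples: Sequence[float]) -> float:
        return sum(s * s for s in samples)

    return max(channels, key=lambda direction: energy(channels[direction]))
```

Usage: `detect_direction({"front": [0.1, -0.1], "left": [0.5, -0.4], "right": [0.2, 0.2]})` returns `"left"`, since that section's energy (0.41) is the largest.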
In one embodiment of the invention, the game apparatus further includes: operation means for operating an object; and control means for controlling the operation means in order to change a direction in which the object is to be operated to the determined direction.
In another embodiment of the invention, the game apparatus further includes: direction selection means including: measurement means for measuring a present direction of the operation of the object; and means for inputting the determined direction, for obtaining a target direction based on the present direction and the determined direction, and for storing the target direction; and operation means for operating the object, wherein the direction selection means controls the operation means, by using the difference between the target direction and the present direction, so that the present direction of the operation of the object substantially agrees with the target direction.
Alternatively, the game apparatus of this invention includes direction selection means, the direction selection means including: input means for inputting a relative direction by using voice; measurement means for measuring a present direction of an object; and means for obtaining a target direction based on the present direction and the input relative direction, and for storing the target direction, wherein the direction selection means controls the object, by using the difference between the target direction and the present direction, so that the present direction of the object substantially agrees with the target direction.
In one embodiment of the invention, the input means includes an input section through which the voice is input, and a recognition section for recognizing the relative direction based on the input voice.
Alternatively, the game apparatus of this invention includes direction selection means, the direction selection means including: input means for inputting an absolute direction by using voice; means for determining a target direction based on the absolute direction, and for storing the target direction; and measurement means for measuring a present direction of an object, wherein the direction selection means controls the object, by using the difference between the target direction and the present direction, so that the present direction of the object substantially agrees with the target direction.
In one embodiment of the invention, the input means includes an input section through which the voice is input, and recognition means for recognizing the absolute direction based on the input voice.
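The direction selection means of the preceding embodiments can be sketched as a proportional heading controller. Assumptions: headings are in degrees measured clockwise, the absolute-direction vocabulary `ABSOLUTE_HEADINGS` is hypothetical, and each control step issues a turn command proportional to the wrapped difference between the target and present directions, with hypothetical `gain` and `tolerance` values.

```python
from typing import Dict

# Hypothetical mapping from recognized absolute-direction words to headings.
ABSOLUTE_HEADINGS: Dict[str, float] = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}


def wrap_angle(deg: float) -> float:
    """Map an angle difference into [-180, 180) so the object turns the short way."""
    return (deg + 180.0) % 360.0 - 180.0


def target_from_relative(present: float, relative: float) -> float:
    """Target direction = present direction + recognized relative turn."""
    return (present + relative) % 360.0


def target_from_absolute(word: str) -> float:
    """Target direction taken directly from a recognized absolute direction."""
    return ABSOLUTE_HEADINGS[word]


def steer(present: float, target: float, gain: float = 0.5, tolerance: float = 1.0) -> float:
    """One control step: a turn command proportional to the heading difference."""
    error = wrap_angle(target - present)
    if abs(error) < tolerance:
        return 0.0  # present direction substantially agrees with the target
    return gain * error
```

Usage: starting at a present heading of 350 degrees, a relative command of +90 gives a target of 80 degrees, and `wrap_angle(80 - 350)` is +90, so the object turns right through north rather than left. Repeatedly adding `steer(present, target)` to `present` halves the remaining error on each step until it falls within `tolerance`.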
The voice recognition apparatus of this invention includes: first detection means for receiving an electric signal corresponding to voice, and for detecting a voice termination point representing a time at which the input of the voice is terminated, based on the electric signal; second detection means for determining a speech period, the speech period being a period in which the voice is uttered within a whole period in which the voice is input, based on the electric signal; feature amount extracting means for producing a feature amount vector, on the basis of a part of the electric signal corresponding to the speech period; memory means for storing feature amount vectors for a plurality of voice candidates which are previously generated; and means for recognizing the input voice, by comparing the feature amount vector from the feature amount extracting means with each of the feature amount vectors of the plurality of voice candidates stored in the memory means.
In one embodiment of the invention, the first detection means includes: means for dividing the electric signal into a plurality of frames each having a predetermined length; calculation means for obtaining an energy of the electric signal for each of the plurality of frames; and determination means for determining the voice termination point based on a variance of the energy.
In another embodiment of the invention, the determination means determines the voice termination point by comparing the variance of the energy with a predetermined threshold value, the voice termination point corresponding to the time at which the variance of the energy, while decreasing from a value larger than the threshold value to a value smaller than the threshold value, reaches the threshold value.
In still another embodiment of the invention, the determination means calculates the variance from the energies of a predetermined number of frames among the plurality of frames.
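The voice termination point detection can be sketched as follows, assuming non-overlapping frames, frame energy as the sum of squared samples, and the population variance over a sliding window of `window` frames (the predetermined number of frames). The termination point is reported as the index of the last frame in the window at which the variance first falls from above the threshold to at or below it. All names and parameter values are illustrative.

```python
from statistics import pvariance
from typing import List, Optional, Sequence


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Split the signal into non-overlapping frames and return each frame's energy.

    A trailing partial frame is dropped.
    """
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def find_termination_frame(energies: Sequence[float], window: int,
                           threshold: float) -> Optional[int]:
    """Return the frame at which the windowed energy variance crosses the
    threshold downward, or None if no such crossing occurs."""
    prev_var = None
    for end in range(window, len(energies) + 1):
        var = pvariance(energies[end - window:end])
        if prev_var is not None and prev_var > threshold >= var:
            return end - 1  # last frame of the window where the crossing occurred
        prev_var = var
    return None
```

The variance is large while energy fluctuates during speech and small once only steady background remains, so the downward crossing marks the end of the utterance.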
In still another embodiment of the invention, the second detection means includes: means for smoothing the energy of the electric signal; first circulation memory means for sequentially storing the energy of the electric signal for each frame before smoothing; second circulation memory means for sequentially storing the smoothed energy for each frame; threshold value calculation means for calculating a speech period detecting threshold value by using both the energy before smoothing stored in the first circulation memory means at the time at which the voice termination point is detected and the smoothed energy stored in the second circulation memory means at the time at which the voice termination point is detected; and speech period determination means for determining the speech period by comparing the energy before smoothing with the speech period detecting threshold value.
In still another embodiment of the invention, the threshold value calculation means calculates the speech period detecting threshold value, by using a maximum value of the energy before smoothing stored in the first circulation memory means at a time at which the voice termination point is detected, and a minimum value of the smoothed energy stored in the second circulation memory means at a time at which the voice termination point has not been detected.
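The two circulation memories map naturally onto fixed-length ring buffers (`collections.deque` with `maxlen`). The source states only that the threshold is computed from the maximum unsmoothed energy and the minimum smoothed energy; the linear interpolation `floor + alpha * (peak - floor)`, the trailing moving-average smoother, and the value of `alpha` are assumptions. For simplicity this sketch reads both buffers at the moment termination is signalled and returns the speech period as indices (oldest frame = 0) of the first and last buffered frames whose unsmoothed energy exceeds the threshold.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SpeechPeriodDetector:
    """Hypothetical sketch: speech period threshold from two ring buffers."""

    def __init__(self, size: int, smooth_width: int = 3, alpha: float = 0.1):
        self.raw: Deque[float] = deque(maxlen=size)       # first circulation memory
        self.smoothed: Deque[float] = deque(maxlen=size)  # second circulation memory
        self._recent: Deque[float] = deque(maxlen=smooth_width)  # moving-average window
        self.alpha = alpha  # assumed interpolation weight

    def push(self, energy: float) -> None:
        """Store one frame energy and its trailing moving average."""
        self.raw.append(energy)
        self._recent.append(energy)
        self.smoothed.append(sum(self._recent) / len(self._recent))

    def on_termination(self) -> Tuple[float, Optional[Tuple[int, int]]]:
        """Compute the threshold and the speech period once termination is detected."""
        peak = max(self.raw)         # maximum energy before smoothing
        floor = min(self.smoothed)   # minimum smoothed energy (background level)
        threshold = floor + self.alpha * (peak - floor)
        above = [i for i, e in enumerate(self.raw) if e > threshold]
        if not above:
            return threshold, None
        return threshold, (above[0], above[-1])
```

Usage: pushing frame energies `[1, 1, 20, 30, 25, 2, 1, 1]` into a detector of size 8 gives a peak of 30 and a smoothed floor of 1, hence a threshold of about 3.9 and a speech period spanning buffered frames 2 to 4. Tying the threshold to the observed peak and background lets it adapt to both loud and quiet speakers.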
In still another embodiment of the invention, the feature amount extracting means calculates, from the speech period of the electric signal, the zero crossing number of each frame of the electric signal, the zero crossing number of each frame of a signal obtained by differentiating the electric signal, and the energy of the electric signal, and these values are used as elements of the feature amount vector.
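A sketch of the feature amount extraction, assuming non-overlapping frames, a first difference as the discrete counterpart of differentiation, and a sign change between consecutive samples (with zero counted as non-negative) as a zero crossing. Concatenating the per-frame triples into one flat feature amount vector is an assumed layout; the function names are hypothetical.

```python
from typing import List, Sequence


def zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def feature_vector(speech: Sequence[float], frame_len: int) -> List[float]:
    """Concatenate (zero crossings, differentiated zero crossings, energy) per frame."""
    # diff[k] = speech[k + 1] - speech[k]: first difference approximating the derivative
    diff = [speech[i] - speech[i - 1] for i in range(1, len(speech))]
    features: List[float] = []
    for start in range(0, len(speech) - frame_len + 1, frame_len):
        frame = speech[start:start + frame_len]
        dframe = diff[start:start + frame_len - 1]  # differences within this frame
        features += [zero_crossings(frame), zero_crossings(dframe),
                     sum(x * x for x in frame)]
    return features
```

Usage: for the speech-period samples `[1, -1, 1, -1, 1, 1, 1, 1]` with a frame length of 4, the first frame yields (3, 2, 4) and the second (0, 0, 4), giving the vector `[3, 2, 4, 0, 0, 4]`. The zero crossing counts roughly separate fricative from voiced sounds at very low computational cost, which suits an inexpensive toy.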
Alternatively, the voice response apparatus of this invention includes: at least one voice recognition apparatus mentioned above; and at least one control apparatus for controlling an object based on a recognition result of the at least one voice recognition apparatus.
In one embodiment of the invention, the voice response apparatus further includes: transmission means, connected to the at least one voice recognition apparatus, for transmitting the recognition result by the at least one voice recognition apparatus; and receiving means, connected to the at least one control apparatus, for receiving the transmitted recognition result, and for applying the recognition result to the at least one control apparatus, wherein the at least one control apparatus and the receiving means are attached to the object, whereby the object can be controlled by remote control.
Thus, the invention described herein makes possible the advantages of: (1) providing a low-cost game apparatus of simple configuration which can be operated by human voice, does not require proficiency in operation, is usable in a noisy environment or in circumstances in which a speaker cannot easily speak aloud, and can be used by people with speech impediments; (2) providing a voice recognition apparatus which allows a game apparatus or a toy to be operated naturally; and (3) providing a voice response apparatus whose operation can be changed in response to voice input to it.
These and other advantages of the present invention will become apparent to those skilled in the art upon reading and understanding the following detailed description with reference to the accompanying figures.