1. Field of the Invention
The present invention relates to an interactive robot capable of speech recognition, and to a speech recognition method and a speech recognition program for the interactive robot.
2. Description of the Related Art
Robots that communicate vocally with human beings are conventionally known. Such an interactive robot requires a speech recognition function that can accurately recognize human speech.
To improve speech recognition, technologies that raise the signal-to-noise ratio have been developed. In one known approach, a small number of microphones (usually two) is used together with two beamformers, a main beamformer and a sub beamformer. In a method of this type (refer to, for example, Japanese Patent Application Laid-Open No. 2001-100800), noise components arriving from directions other than the target direction are estimated with the sub beamformer and subtracted from the output of the main beamformer, whose principal component is the voice arriving from the target direction. In this way, noise from directions other than the target direction can be actively suppressed.
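The main/sub beamformer arrangement described above can be illustrated with a minimal two-microphone sketch resembling a generalized sidelobe canceller. It rests on several assumptions that are not taken from the cited publication: the target lies directly in front of the array (so it reaches both microphones at the same instant), the main beamformer is a simple delay-and-sum average, the sub beamformer is the difference of the two microphones (which cancels the target), and the subtraction step uses a least-mean-squares (LMS) adaptive filter. The names `gsc_two_mic`, `taps`, and `mu`, and the synthetic test tones, are hypothetical.

```python
import math


def gsc_two_mic(x1, x2, taps=4, mu=0.02):
    """Hypothetical two-microphone sketch of main/sub beamforming.

    Assumes the target lies broadside (zero inter-microphone delay), so the
    target component is identical in x1 and x2.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent sub-beamformer samples, newest first
    output = []
    for a, b in zip(x1, x2):
        main = 0.5 * (a + b)  # main beamformer: delay-and-sum toward the target
        sub = a - b           # sub beamformer: target cancels, off-axis noise remains
        history = [sub] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        enhanced = main - noise_estimate  # subtract estimated noise from main output
        # LMS update: adapt the weights to minimize the power of the enhanced output
        weights = [w + mu * enhanced * h for w, h in zip(weights, history)]
        output.append(enhanced)
    return output


# Hypothetical test signals: a broadside target tone and an off-axis
# interfering tone that reaches microphone 2 one sample later.
n = 4000
target = [math.sin(2 * math.pi * 0.05 * t) for t in range(n)]
noise = [0.8 * math.sin(2 * math.pi * 0.13 * t) for t in range(n)]
delayed_noise = [0.0] + noise[:-1]
x1 = [s + v for s, v in zip(target, noise)]
x2 = [s + v for s, v in zip(target, delayed_noise)]

out = gsc_two_mic(x1, x2)
steady = range(2000, n)  # skip the adaptation transient
residual_power = sum((out[t] - target[t]) ** 2 for t in steady) / len(steady)
main_noise_power = sum((0.5 * (noise[t] + delayed_noise[t])) ** 2 for t in steady) / len(steady)
print(f"noise reduction: {10 * math.log10(main_noise_power / residual_power):.1f} dB")
```

Because the sub beamformer output contains no target component, minimizing the output power can only remove interference; this is what allows noise from other directions to be suppressed without attenuating the target.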
Such a beamforming method separates the target voice from noise by signal processing that exploits the difference in arrival direction between the target voice and the noise. Consequently, when the target sound source and a noise source lie in almost the same direction, it is basically impossible to separate the target voice from the noise. Unless the target sound source is sufficiently closer to the robot than the noise source, the noise continues to have a large influence.
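Why a noise source in the target direction cannot be separated can be seen from the directional response of a beamformer. The following sketch assumes a hypothetical two-microphone delay-and-sum beamformer with 0.1 m spacing, a speed of sound of 343 m/s, and plane-wave arrival; the function name and parameters are illustrative only. The gain depends only on the difference between the arrival direction and the steering direction, so any source arriving from the steered direction, target or noise, receives exactly the same unity gain.

```python
import math


def delay_and_sum_gain(theta_deg, steer_deg, freq_hz, spacing_m=0.1, c=343.0):
    """Magnitude response of a hypothetical two-microphone delay-and-sum beamformer.

    theta_deg: arrival angle of a plane wave, measured from broadside.
    steer_deg: direction the beamformer is steered toward (the target direction).
    """
    tau = spacing_m * math.sin(math.radians(theta_deg)) / c        # inter-mic delay of the source
    tau_steer = spacing_m * math.sin(math.radians(steer_deg)) / c  # delay compensated by steering
    omega = 2 * math.pi * freq_hz
    # |(1 + exp(-j*omega*(tau - tau_steer))) / 2| = |cos(omega*(tau - tau_steer) / 2)|
    return abs(math.cos(omega * (tau - tau_steer) / 2))


target_dir = 0.0
for noise_dir in (0.0, 30.0, 60.0):
    gain = delay_and_sum_gain(noise_dir, target_dir, freq_hz=2000.0)
    print(f"noise at {noise_dir:4.1f} deg -> gain {gain:.3f}")
```

At 2 kHz, noise from 60 degrees off axis is strongly attenuated, whereas noise from the target direction passes unchanged, just like the target itself.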
The distance-dependent influence of noise is closely related to a physical phenomenon: when sound sources emit voices (compression waves spreading outward) of equal strength, the strength of each voice on reaching a microphone varies inversely with the square of its propagation distance. Thus, the closer the target sound source is relative to the noise source, the higher the signal-to-noise ratio. For example, the strength of a voice produced 30 centimeters away and that of a voice produced one meter away differ by a factor of roughly 10 (more precisely, (1/0.3)^2 ≈ 11.1).
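Under the idealized assumption of point sources radiating spherically in a free field, each with the same acoustic power P, the received intensity at distance r and the resulting signal-to-noise ratio for a target at distance r_s and a noise source at distance r_n can be written as follows (a simplified model that ignores reflections and absorption):

```latex
I(r) = \frac{P}{4\pi r^{2}}, \qquad
\mathrm{SNR} = \frac{I(r_s)}{I(r_n)} = \left(\frac{r_n}{r_s}\right)^{2}, \qquad
\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\mathrm{SNR} = 20\log_{10}\frac{r_n}{r_s}
```

With r_s = 0.3 m and r_n = 1 m, this gives (1/0.3)^2 ≈ 11.1, or about 10.5 dB.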
By contrast, the strengths for sources two meters and three meters away differ by a factor of only 2.25, that is, (3/2)^2, even though the distances differ by a full meter. In other words, the shorter the distance between the target sound source and the microphone, the more accurately speech recognition can be performed. One method exploiting this fact places the microphone close to the speaker's mouth; in many speech recognition systems, such as those installed on personal computers, the speaker wears a headset microphone. In this way, the signal-to-noise ratio has been improved by placing a microphone very close to the sound source.
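The arithmetic in this passage can be checked with a short sketch. It assumes the same idealized inverse-square model (equal-power point sources in a free field); the function names are illustrative.

```python
import math


def intensity_ratio(near_m, far_m):
    """Ratio of received strengths for equal-power sources at two distances (inverse-square law)."""
    return (far_m / near_m) ** 2


def ratio_db(near_m, far_m):
    """The same ratio expressed in decibels."""
    return 10 * math.log10(intensity_ratio(near_m, far_m))


# The two cases discussed in the text.
print(intensity_ratio(0.3, 1.0))  # about 11.1
print(intensity_ratio(2.0, 3.0))  # 2.25

# A fixed one-meter gap matters less and less as both sources move farther away.
for near in (0.5, 1.0, 2.0, 4.0):
    print(f"{near} m vs {near + 1} m: {ratio_db(near, near + 1):.2f} dB")
```

The loop shows why a headset microphone helps so much: the advantage comes from the ratio of distances, not their difference, so bringing the microphone very close to the mouth yields a large gain.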
However, the headset approach requires the speaker to wear a microphone at all times. In an interactive robot, the microphone is built into the robot. If the approach of shortening the distance between the microphone and the speaker were adopted for an interactive robot, the robot would therefore have to stay in the vicinity of the speaker. This makes the approach unsuitable for an interactive robot that moves about to perform various actions at the speaker's instruction. Alternatively, the speaker could approach the robot each time he or she speaks, but this is inconvenient, particularly for a person with a disability.