1. Field of the Invention
The invention relates to a processing unit, a speech recognition apparatus, a speech recognition system, a speech recognition method, and a storage medium storing a speech recognition program.
2. Description of the Related Art
Japanese Patent Application Publications No. 2006-72127 (JP-A-2006-72127) and No. 2006-03617 (JP-A-2006-03617) each describe a speech recognition apparatus that captures speech uttered by a user and then performs speech recognition on the captured speech signals. In such a speech recognition apparatus, for example, the speech of the speaker is captured by a microphone (e.g., a hands-free microphone), and speech recognition is then performed on the speech signals captured by the microphone.
In a case where the speech of a speaker is captured by a microphone such as a hands-free microphone, the speech signals may be influenced by the shape of the room, the shapes of objects around the microphone, and so on. For example, the microphone captures both the sounds (voices) uttered by the speaker and the sounds reflected by walls, etc. The reflected sounds are therefore unavoidably captured as reverberations, and they reduce the speech recognition rate. Because such reverberations originate from the voice of the speaker, it is quite difficult to remove their influence. Japanese Patent Application Publication No. 2007-65204 (JP-A-2007-65204) describes a method for removing waves reflected by walls and the like. This method, however, requires a plurality of microphones.
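The capture of direct and reflected sounds together can be illustrated with a minimal sketch in which the microphone signal is modeled as the convolution of the uttered speech with a room impulse response. The impulse response below is a toy model (a direct path followed by an exponentially decaying noise tail) and is an assumption made only for illustration; the function names, the sampling rate, and the reverberation time `rt60` are hypothetical and do not appear in the cited publications.

```python
import numpy as np


def make_impulse_response(fs=16000, rt60=0.5, length_s=0.6, seed=0):
    """Toy room impulse response (assumed model, not measured data)."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude envelope that falls by 60 dB over rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    h = 0.3 * rng.standard_normal(n) * decay  # reflected sounds
    h[0] = 1.0                                # direct sound
    return h


def reverberate(dry, h):
    """Microphone signal: direct sound plus all reflections."""
    return np.convolve(dry, h)


fs = 16000
dry = np.zeros(fs // 10)
dry[0] = 1.0                      # a single click as the "utterance"
h = make_impulse_response(fs)
wet = reverberate(dry, h)         # the click is followed by a reverberant tail
```

Because the reverberant tail is a filtered copy of the utterance itself, it overlaps the following speech in both time and frequency, which is consistent with the difficulty of removal noted above.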
Japanese Patent Application Publications No. 2006-72052 (JP-A-2006-72052) and No. 2006-234888 (JP-A-2006-234888) describe technologies for removing the influences of reverberations to improve the speech recognition rate. According to the reverberation removal methods described in these publications, an inverse filter for removing the influences of reverberation components is estimated. Further, in the reverberation removal method described in JP-A-2006-72052, the captured signals are classified into direct sounds, initial reflection components, and later reverberation components. According to this publication, the initial reflection components are correlated with the direct sounds, whereas the later reverberation components are correlated with neither the direct sounds nor the initial reflection components.
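The three-way classification can be pictured with the following sketch, which splits a room impulse response into direct, initial reflection, and later reverberation parts by arrival time. This time-based split is a simplification substituted here for illustration; it is not the correlation-based classification of JP-A-2006-72052. The function name and the boundaries `direct_ms` and `early_ms` are hypothetical values chosen for the example.

```python
import numpy as np


def split_impulse_response(h, fs, direct_ms=2.0, early_ms=50.0):
    """Split h into (direct, early, late) parts that sum back to h.

    The boundaries are assumed values, not taken from the cited publication.
    """
    h = np.asarray(h, dtype=float)
    n_direct = int(round(fs * direct_ms / 1000.0))
    n_early = int(round(fs * early_ms / 1000.0))

    direct = np.zeros_like(h)
    early = np.zeros_like(h)
    late = np.zeros_like(h)

    direct[:n_direct] = h[:n_direct]               # direct sound
    early[n_direct:n_early] = h[n_direct:n_early]  # initial reflections
    late[n_early:] = h[n_early:]                   # later reverberation
    return direct, early, late
```

Convolving an utterance with each part separately would give the corresponding component of the captured signal, since convolution is linear and the three parts sum to the full impulse response.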
According to the reverberation removal methods described above, an inverse filter is estimated based on the input acoustic signals, and the inverse filter is applied to the acoustic signals in the frequency domain. The frequency-domain output signals obtained through the inverse filtering are then transformed into time-domain signals.
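The sequence of frequency-domain inverse filtering followed by transformation back to the time domain can be sketched as below. One simplification should be noted: the cited publications estimate the inverse filter from the input acoustic signals, whereas this sketch assumes the room impulse response `h` is already known and builds a regularized inverse from it. The function names and the regularization constant `eps` are hypothetical.

```python
import numpy as np


def inverse_filter(h, n_fft, eps=1e-3):
    """Regularized frequency-domain inverse of impulse response h (assumed known)."""
    H = np.fft.rfft(h, n_fft)
    # conj(H) / (|H|^2 + eps) approximates 1 / H while avoiding division by ~0.
    return np.conj(H) / (np.abs(H) ** 2 + eps)


def dereverberate(wet, h, eps=1e-3):
    """Apply the inverse filter in the frequency domain, then return to time domain."""
    n_fft = len(wet)
    X = np.fft.rfft(wet, n_fft)          # time domain -> frequency domain
    G = inverse_filter(h, n_fft, eps)
    return np.fft.irfft(G * X, n_fft)    # frequency domain -> time domain
```

For example, with `wet = np.convolve(dry, h)`, the first `len(dry)` samples of `dereverberate(wet, h, eps=1e-9)` closely approximate `dry` as long as the frequency response of `h` has no zeros. When the filter must instead be estimated from the observed signal, the estimation step is added to these transforms, which adds to the processing load.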
According to the reverberation removal methods described in JP-A-2006-72052 and JP-A-2006-234888, because the inverse filter is estimated from the input acoustic signals, the above-described processes need to be executed in real time. However, such real-time execution is difficult because of the enormous amount of data to be processed, and it is therefore difficult to achieve a high speech recognition rate with these reverberation removal methods.