1. Field of the Invention
The present invention relates to a speech processing device, a speech processing method, and a speech processing program.
2. Description of Related Art
Sound emitted in a room is repeatedly reflected by walls and installed objects, which produces reverberation. When reverberation is added, the frequency characteristics of the speech differ from those of the original speech, so the speech recognition rate may decrease. In addition, because previously uttered speech overlaps with currently uttered speech, the articulation rate may decrease. Therefore, reverberation reduction techniques that remove reverberation components from speech recorded in reverberant environments have been developed.
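The overlap effect can be made concrete with the common linear model in which reverberant speech is the source signal convolved with a room impulse response. The Python sketch below assumes that model and uses a hypothetical synthetic impulse response (exponentially decaying noise whose energy falls by 60 dB after `rt60` seconds). `synthetic_rir` and `add_reverberation` are illustrative names, not part of the invention or of any cited document.

```python
import numpy as np


def synthetic_rir(fs, rt60, length_s, seed=0):
    """Hypothetical room impulse response: exponentially decaying white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * fs)) / fs
    # Amplitude envelope whose energy (squared) drops by 60 dB at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    h = rng.standard_normal(t.size) * envelope
    h[0] = 1.0  # treat the first tap as the direct path
    return h


def add_reverberation(s, h):
    """Linear model of reverberant speech: source convolved with the RIR."""
    return np.convolve(s, h)


if __name__ == "__main__":
    fs = 16000
    h = synthetic_rir(fs, rt60=0.5, length_s=0.5)
    s = np.zeros(fs)
    # A 0.1 s tone burst followed by silence.
    s[:1600] = np.sin(2 * np.pi * 440 * np.arange(1600) / fs)
    y = add_reverberation(s, h)
    # Energy that spills into the interval where the source is silent.
    print("tail energy after burst:", np.sum(y[1600:3200] ** 2))
```

Although the source is silent after 0.1 s, the convolved signal still carries energy in that interval; this is the overlap of earlier speech onto later speech that degrades recognition.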
For example, Japanese Patent Publication No. 4396449 (Patent Document 1) describes a dereverberation method that obtains the transfer function of a reverberant space from the impulse response of a feedback path adaptively identified by an inverse filter processing unit, and reconstructs the sound source signal by dividing the reverberant speech signal by the magnitude of that transfer function. In this method, the impulse response representing the reverberation characteristics must be estimated; because the reverberation time is relatively long, ranging from 0.2 to 2.0 seconds, the computational load becomes excessive and the processing delay becomes significant. Accordingly, the method has not been widely applied to speech recognition.
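The division step attributed to Patent Document 1 can be illustrated roughly as follows. This is a simplified sketch under stated assumptions. The impulse response `h` is taken as already known, and the adaptive identification of the feedback path by the inverse filter processing unit is not reproduced. The division is done once over the whole signal in the frequency domain. The floor `eps` and the function name `divide_by_transfer_magnitude` are hypothetical additions.

```python
import numpy as np


def divide_by_transfer_magnitude(y, h, eps=1e-3):
    """Divide the spectrum of reverberant signal y by |H|, keeping y's phase.

    Assumption: h is a known impulse response; its adaptive estimation
    (as described in Patent Document 1) is outside this sketch.
    """
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    # FFT length long enough that the linear convolution does not wrap around.
    n = len(y) + len(h) - 1
    Y = np.fft.rfft(y, n)
    H = np.fft.rfft(h, n)
    # Floor the magnitude so near-zero bins do not blow up the result.
    magnitude = np.maximum(np.abs(H), eps)
    S = Y / magnitude
    # Return only as many samples as the input signal.
    return np.fft.irfft(S, n)[: len(y)]
```

Because only the magnitude is divided, the phase distortion introduced by reverberation is left untouched. The impulse response to be estimated also grows with the reverberation time: at a 16 kHz sampling rate, a 2.0-second response spans 32,000 taps.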
H.-G. Hirsch and H. Finster, "A New Approach for the Adaptation of HMMs to Reverberation and Background Noise," Speech Communication, Elsevier, 2008, pp. 244-263 (Non-patent Document 1) describes a method in which acoustic models trained under reverberant environments with different reverberation times are prepared in advance, and the acoustic model with the highest likelihood for the environment in which the speech is recorded is searched for. Here, the reverberation time is the time it takes for the reverberation intensity to decay from its maximum value to a predetermined intensity relative to that maximum. Speech recognition is then performed using the acoustic model found by the search.
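The reverberation-time definition above can be evaluated on an impulse response. The sketch below assumes a known impulse response and uses Schroeder backward integration to obtain the energy decay curve. It then reports the first time the curve falls `drop_db` below its maximum. The default `drop_db=60` corresponds to the conventional RT60 and stands in for the "predetermined intensity", which the text does not specify. The function name is hypothetical.

```python
import numpy as np


def reverberation_time(h, fs, drop_db=60.0):
    """Time (s) for the energy decay curve of h to fall drop_db below its maximum.

    Returns None if the decay never reaches the threshold within h.
    """
    energy = np.asarray(h, dtype=float) ** 2
    # Schroeder backward integration: remaining energy from each sample onward.
    edc = np.cumsum(energy[::-1])[::-1]
    if edc[0] == 0:
        return None  # silent response, no decay to measure
    # Normalize to the maximum (the first value) and convert to dB.
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-300)
    below = np.nonzero(edc_db <= -drop_db)[0]
    if below.size == 0:
        return None
    return below[0] / fs
```

In practice, measurements often fit a line to part of the decay (for example, from -5 dB to -35 dB) and extrapolate to 60 dB, because background noise rarely allows a full 60 dB drop to be observed directly.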
However, the method described in Non-patent Document 1 does not consider the positional relationship between the sound source and the sound collection unit. The reverberation time in a given reverberant space is nearly constant, but the ratio of the intensity of the reverberation component to the intensity of the direct sound varies with the distance from the sound source to the sound collection unit. Accordingly, an acoustic model selected according to the reverberation time alone is not necessarily appropriate, and speech recognition accuracy may decrease.
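The distance dependence described above can be illustrated with the direct-to-reverberant ratio (DRR). The sketch below makes two assumptions that are not in the text. First, the direct sound is taken to be the strongest tap of a known impulse response plus a short window after it (`direct_window_ms`, a hypothetical parameter set to 2.5 ms). Second, for the distance model, the reverberant field is treated as ideally diffuse, so direct energy falls off as 1/r² while reverberant energy stays roughly constant. Under that idealization, the DRR is 0 dB at the critical distance and drops by about 6 dB per doubling of distance, even though the room's reverberation time is unchanged. Both function names are hypothetical.

```python
import numpy as np


def direct_to_reverberant_ratio_db(h, fs, direct_window_ms=2.5):
    """DRR in dB from an impulse response.

    Assumption: the direct path is the largest-magnitude tap, and everything
    up to direct_window_ms after it counts as direct sound; the rest is
    treated as reverberation.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    end = peak + int(round(direct_window_ms * 1e-3 * fs)) + 1
    direct_energy = np.sum(h[:end] ** 2)
    reverb_energy = np.sum(h[end:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)


def diffuse_field_drr_db(distance_m, critical_distance_m):
    """Idealized DRR: direct energy ~ 1/r**2, reverberant energy constant."""
    return 20.0 * np.log10(critical_distance_m / distance_m)


if __name__ == "__main__":
    # Same room (same critical distance), different source-microphone distances.
    for r in (0.5, 1.0, 2.0, 4.0):
        print(f"r = {r} m: DRR = {diffuse_field_drr_db(r, 1.0):.1f} dB")
```

The demo keeps the room fixed and varies only the distance, which is the situation in which a model chosen by reverberation time alone can mismatch the recorded speech.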