1. Field of the Invention
The present invention generally relates to a method and an apparatus for speech dereverberation. More specifically, the present invention relates to a method and an apparatus for speech dereverberation based on probabilistic models of source and room acoustics.
2. Description of the Related Art
All patents, patent applications, patent publications, scientific articles, and the like, which will hereinafter be cited or identified in the present application, will hereby be incorporated by reference in their entirety in order to describe more fully the state of the art to which the present invention pertains.
Speech signals captured by a distant microphone in an ordinary room inevitably contain reverberation, which has detrimental effects on the perceived quality and intelligibility of the speech signals and degrades the performance of automatic speech recognition (ASR) systems. The recognition performance cannot be improved when the reverberation time is longer than 0.5 sec, even when using acoustic models that have been trained under a matched reverberant condition. This is disclosed by B. Kingsbury and N. Morgan, “Recognizing reverberant speech with RASTA-PLP,” Proc. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97), vol. 2, pp. 1259-1262, 1997. Dereverberation of the speech signal is therefore essential, whether it is for high quality recording and playback or for ASR.
Although blind dereverberation of a speech signal is still a challenging problem, several techniques have recently been proposed. Techniques have been proposed that de-correlate the observed signal while preserving the correlation within a short time segment of the signal. This is disclosed by B. W. Gillespie and L. E. Atlas, “Strategies for improving audible quality and speech recognition accuracy of reverberant speech,” Proc. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2003), vol. 1, pp. 676-679, 2003. This is also disclosed by H. Buchner, R. Aichner, and W. Kellermann, “TRINICON: a versatile framework for multichannel blind signal processing,” Proc. of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2004), vol. III, pp. 889-892, May 2004.
Methods have been proposed for estimating and equalizing the poles in the acoustic response of the room. This is disclosed by T. Hikichi and M. Miyoshi, “Blind algorithm for calculating common poles based on linear prediction,” Proc. of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2004), vol. IV, pp. 89-92, May 2004. This is also disclosed by J. R. Hopgood and P. J. W. Rayner, “Blind single channel deconvolution using nonstationary signal processing,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 467-488, September 2003.
Also, two approaches have been proposed based on essential features of speech signals, namely Harmonicity Based Dereverberation, hereinafter referred to as HERB, and Sparseness Based Dereverberation, hereinafter referred to as SBD. HERB is disclosed by T. Nakatani and M. Miyoshi, “Blind dereverberation of single channel speech signal based on harmonic structure,” Proc. ICASSP-2003, vol. 1, pp. 92-95, April 2003. Japanese Unexamined Patent Application, First Publication No. 2004-274234 discloses one example of the conventional technique for HERB. SBD is disclosed by K. Kinoshita, T. Nakatani, and M. Miyoshi, “Efficient blind dereverberation framework for automatic speech recognition,” Proc. Interspeech-2005, September 2005.
These methods make extensive use of the respective speech features in their initial estimate of the source signal. The initial source signal estimate and the observed reverberant signal are then used together for estimating the inverse filter for dereverberation, which allows further refinement of the source signal estimate. To obtain the initial source signal estimate, HERB utilizes an adaptive harmonic filter, and SBD utilizes spectral subtraction based on minimum statistics. It has been shown experimentally that these methods greatly improve the ASR performance for the observed reverberant signals if the signals are sufficiently long.
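The second stage of the scheme described above, in which an inverse filter is estimated from the observed reverberant signal together with an initial source signal estimate, can be illustrated with a minimal sketch. The sketch is not the HERB or SBD procedure as disclosed in the cited references. It assumes a per-frequency least-squares fit over short-time spectra, and the function names `stft_frames` and `estimate_inverse_filter`, the frame length, and the hop size are all hypothetical choices made for illustration. The initial source estimate is taken as given; how it is obtained (adaptive harmonic filtering or spectral subtraction) is outside the sketch.

```python
import numpy as np


def stft_frames(x, frame_len=512, hop=256):
    """Return one-sided spectra of Hann-windowed frames, shape (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def estimate_inverse_filter(observed, initial_estimate,
                            frame_len=512, hop=256, eps=1e-12):
    """Per-bin least-squares inverse filter W(k) (an illustrative assumption).

    W(k) minimizes sum_t |S_hat(t, k) - W(k) X(t, k)|^2, where X is the
    spectrum of the observed signal and S_hat that of the initial estimate.
    """
    X = stft_frames(observed, frame_len, hop)
    S_hat = stft_frames(initial_estimate, frame_len, hop)
    numerator = np.sum(S_hat * np.conj(X), axis=0)
    denominator = np.sum(np.abs(X) ** 2, axis=0) + eps  # eps avoids division by zero
    return numerator / denominator
```

Multiplying each frame spectrum of the observed signal by the returned filter gives a refined source estimate, which could in principle be fed back as a new initial estimate. Because the fit averages over frames, it is reliable only when many frames are available, which is consistent with the statement above that these methods require sufficiently long signals.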
In view of the above, it will be apparent to those skilled in the art from this disclosure that there exists a need for an improved apparatus and/or method for speech dereverberation. This invention addresses this need in the art as well as other needs, which will become apparent to those skilled in the art from this disclosure.