Background Art
Efficient and accurate sound capturing is required for real-time communication scenarios (such as messenger programs, VoIP telephony, and groupware) and speech recognition (such as voice commands and dictation). However one problem with capturing “clean” sound is that together with the speech signal, the microphone also acquires ambient noises and reverberations. Humans have great ability to remove these distracting influences when present in the same room. The brain uses the information from both ears and adapts to different room response functions. However, if sound is recorded with a mono microphone in one room and the signal is transferred to another room, the brain cannot remove the reverberation. This reduces the intelligibility of the playback and leads to a poor listening experience.
Studies also show that the presence of reverberation in a room seriously reduces the effectiveness of automatic speech recognition (ASR) engines. The need to improve the speech recognition results by presenting clean sound input has fostered huge amounts of research into the areas of noise suppression, microphone array processing, acoustic echo cancellation and methods for reducing the effects of acoustic reverberation.
Reducing reverberation through deconvolution (inverse filtering) is one of the most common approaches. The main problem is that the channel must be known or very well estimated for successful deconvolution. The estimation is done in the cepstral domain or on envelope levels. Multi-channel variants use the redundancy of the channel signals and frequently work in the cepstral domain.
Blind dereverberation methods seek to estimate the input(s) to the system without explicitly computing a deconvolution or inverse filter. Most of them employ probabilistic and statistically based models.
Dereverberation via suppression and enhancement is similar to noise suppression. These algorithms either try to suppress the reverberation, enhance the direct-path speech, or both. There is no channel estimation and there is no signal estimation, either. Usual techniques are long-term cepstral mean subtraction, pitch enhancement, and LPC analysis, in single or multi-channel implementation.
Unfortunately, the foregoing methods have problems. The most common issues are slow reaction when reverberation changes, poor robustness to noise, and excessive computational requirements.