In hands-free voice communication systems, sound captured by a microphone may include background noise and reverberation, which degrade the quality of speech. This distortion is detrimental both to perceived speech quality and to the performance of speech processing algorithms, such as automatic speech recognition systems, that are primarily designed for clean speech signals. Typical background noise is additive in nature and may be reduced by exploiting the fact that it is uncorrelated with speech. Reverberation, however, results from convolution of the room impulse response with the speech signal, and is thus highly correlated with the speech signal. Moreover, the dereverberation problem is blind, because only limited knowledge is available about both the speech source and the room impulse response. Dereverberation methods therefore typically require prior knowledge of, or at least a model for, either the speech or the room impulse response.
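The contrast between additive noise and convolutive reverberation can be illustrated with a short numerical sketch. It assumes a microphone signal of the form x = h * s + v, where s is the source, h is the room impulse response, v is the noise, and * denotes convolution. Everything in the sketch is an illustrative assumption rather than part of any described method: white noise stands in for speech, the room response is a synthetic exponentially decaying random sequence, and the helper name `peak_normalized_xcorr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Stand-in for a clean speech signal (white noise, purely for illustration).
s = rng.standard_normal(n)

# Additive background noise, drawn independently of the source.
v = 0.5 * rng.standard_normal(n)

# Synthetic room impulse response (assumed shape): a unit direct path
# followed by an exponentially decaying random tail.
taps = 200
h = rng.standard_normal(taps) * np.exp(-np.arange(taps) / 40.0)
h[0] = 1.0

# Reverberant component: the source convolved with the tail of the
# response only, so the direct path itself is excluded.
h_tail = h.copy()
h_tail[0] = 0.0
r = np.convolve(s, h_tail)[:n]


def peak_normalized_xcorr(a, b, max_lag):
    """Largest |normalized cross-correlation| of a with b over lags 0..max_lag."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return max(
        abs(np.dot(a[: len(a) - k], b[k:])) / denom for k in range(max_lag + 1)
    )


noise_corr = peak_normalized_xcorr(s, v, taps)
reverb_corr = peak_normalized_xcorr(s, r, taps)
print(f"peak correlation, source vs. noise:  {noise_corr:.3f}")
print(f"peak correlation, source vs. reverb: {reverb_corr:.3f}")
```

In this toy setting the noise shows only the small residual correlation expected from a finite sample, while the reverberant tail is clearly correlated with the source at the lags where the room response has energy. This is why methods that rely on uncorrelatedness do not carry over directly to reverberation.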
Existing methods often address dereverberation as an inverse filtering problem by assuming that the room impulse response is known a priori. Unfortunately, inverse filtering methods cannot achieve perfect reconstruction of the signal, for two reasons. First, the room impulse response is seldom known a priori, and in practice the reverberation in a room depends on many factors, including the number of people present, humidity and temperature, all of which are time-varying. Second, the room response is rarely minimum-phase and consequently cannot be inverted exactly by a stable, causal linear filter.
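The minimum-phase limitation can be made concrete with a minimal sketch. A finite impulse response is minimum-phase when all zeros of its transfer function lie strictly inside the unit circle, and only then does its causal inverse decay. The two-tap responses below are toy assumptions chosen for clarity, not measured room responses, and the helper names `is_minimum_phase` and `causal_inverse` are hypothetical.

```python
import numpy as np


def is_minimum_phase(h, tol=1e-9):
    """True if every zero of H(z) = sum_k h[k] z^-k lies strictly inside the unit circle."""
    zeros = np.roots(h)  # roots of h[0] z^N + h[1] z^(N-1) + ... + h[N]
    return bool(np.all(np.abs(zeros) < 1.0 - tol))


def causal_inverse(h, length):
    """First `length` taps of the causal filter g satisfying (h * g)[n] = delta[n]."""
    g = np.zeros(length)
    g[0] = 1.0 / h[0]
    for n in range(1, length):
        acc = 0.0
        for k in range(1, min(n, len(h) - 1) + 1):
            acc += h[k] * g[n - k]
        g[n] = -acc / h[0]
    return g


# Direct path stronger than the single echo: zero at z = -0.5.
h_min = np.array([1.0, 0.5])
# Echo stronger than the direct path: zero at z = -2.
h_echo = np.array([0.5, 1.0])

g_min = causal_inverse(h_min, 30)    # taps follow (-0.5)**n and decay
g_echo = causal_inverse(h_echo, 30)  # taps follow 2 * (-2)**n and grow

print(is_minimum_phase(h_min), abs(g_min[-1]))
print(is_minimum_phase(h_echo), abs(g_echo[-1]))
```

For the second response the causal inverse grows without bound, so any practical inverse filter must either be truncated, delayed, or approximated, and the reconstruction is no longer exact.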
Another approach to dereverberation exploits prior knowledge of the structure of speech, for example through linear prediction and harmonic structure analysis. However, these techniques typically require a large amount of reverberant speech for training and may suffer degraded performance for unvoiced speech and non-speech sources.
Dereverberation methods that assume only general knowledge of the nature of the room impulse response and only general knowledge of the structure of the speech signal are desired.