In speech communication scenarios such as conference calls or VoIP on a smart TV, the talker is far from the microphone and the call takes place in a relatively enclosed space, so the signal received by the microphone is easily corrupted by reverberation. For example, in a room, speech is reflected many times by the surfaces of the walls, floor and furniture, so the signal received at the microphone is a mixture of the direct sound and the reflected sound. The reflected part is referred to as the reverberation signal. Heavy reverberation makes speech unclear and thus degrades call quality. Furthermore, reverberation degrades the performance of the acoustic receiving system and significantly degrades the performance of speech recognition systems.
Earlier dereverberation methods usually employ deconvolution. Such methods require the accurate impulse response or transfer function of the reverberant environment (a room, an office, etc.) to be known in advance. The impulse response may be measured beforehand with a specific method or device, or estimated separately by other means. With the impulse response known, an inverse filter is estimated and applied to the reverberant signal to deconvolve it, which achieves dereverberation. The problem with such methods is that the impulse response of the reverberant environment is often difficult to obtain in advance, and the process of deriving the inverse filter may itself introduce new sources of instability.
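The deconvolution approach can be illustrated with a minimal sketch, assuming the room impulse response is already known exactly. The function name `inverse_filter_dereverb` and the regularization constant `reg` are hypothetical choices made for illustration; the regularized frequency-domain inverse used here is one common way to build an inverse filter, not necessarily the one used by the earlier methods described above.

```python
import numpy as np

def inverse_filter_dereverb(reverberant, rir, reg=1e-6):
    """Deconvolve a reverberant signal using a known room impulse response.

    reverberant: 1-D array, the microphone signal (dry speech convolved with rir).
    rir:         1-D array, the room impulse response, assumed known in advance.
    reg:         small constant (an assumption) that keeps the inverse bounded
                 where the room transfer function is close to zero.
    """
    n = len(reverberant) + len(rir) - 1
    n_fft = 1 << (n - 1).bit_length()          # next power of two >= n
    H = np.fft.rfft(rir, n_fft)                # room transfer function
    Y = np.fft.rfft(reverberant, n_fft)        # reverberant spectrum
    G = np.conj(H) / (np.abs(H) ** 2 + reg)    # regularized inverse filter
    estimate = np.fft.irfft(Y * G, n_fft)
    return estimate[: len(reverberant)]
```

Without the `reg` term the inverse filter is simply `1 / H`, which becomes very large wherever the room transfer function has a spectral null. This is one way the inverse filter itself can introduce instability.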
Another class of dereverberation methods is called blind dereverberation, because it does not require the impulse response of the reverberant environment to be estimated and therefore requires neither the calculation of an inverse filter nor the execution of inverse filtering. Such methods usually rely on assumptions about a speech model. For example, reverberation alters the received voiced excitation pulses so that their periodicity becomes less pronounced, which reduces the clarity of speech. Such methods are usually based on a linear predictive coding (LPC) model, in which speech production is assumed to be an all-pole model and reverberation or other additive noise is assumed to introduce new zeros into the overall system: the voiced excitation pulses are disturbed, but the all-pole filter is not affected. Specifically, the LPC residual of the signal is estimated, and a clean pulse excitation sequence is then estimated according to a pitch-synchronous clustering criterion or a kurtosis maximization criterion, so as to achieve dereverberation. The problem with such methods is that the computation is usually highly complex, and the assumption that reverberation influences only the all-zero part of the system, leaving the all-pole filter unchanged, is sometimes inconsistent with experimental analysis.
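The two quantities these LPC-based methods work with, the LPC residual and its kurtosis, can be sketched as follows. This is a minimal illustration, assuming the autocorrelation method for the LPC fit; the function names, the default order of 10 and the small diagonal loading are all hypothetical, and the sketch covers only the analysis step, not the pitch-synchronous clustering or kurtosis maximization itself.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate predictor coefficients a[k] so that x[n] ~ sum_k a[k] * x[n-1-k]."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz normal equations R a = r[1:], with tiny diagonal loading (assumption).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1 : order + 1])

def lpc_residual(frame, order=10):
    """Filter the frame with the inverse (all-zero) LPC filter to get the residual."""
    a = lpc_coefficients(frame, order)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[: len(frame)]

def kurtosis(x):
    """Excess kurtosis: large for sparse, peaky signals such as clean excitation pulses."""
    centered = x - np.mean(x)
    return np.mean(centered ** 4) / (np.mean(centered ** 2) ** 2) - 3.0
```

For clean voiced speech the residual is close to a sparse pulse train and its kurtosis is high. Reverberation smears the pulses and tends to lower the kurtosis, which is why kurtosis maximization can serve as a criterion.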
Dereverberation by spectral subtraction is a preferred solution. A speech signal consists of a direct sound, early reflections and late reflections, so subtracting the power spectrum of the late reflections from the power spectrum of the whole signal can improve speech quality. The key point, however, is the estimation of the late-reflection spectrum, i.e., how to obtain a sufficiently accurate power spectrum of the late reflections so that the late-reflection component is removed effectively without distorting the speech. In single-channel speech dereverberation, only one microphone channel is available, so estimating the transfer function of the reverberant environment or the reverberation time (RT60) is quite difficult.
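The subtraction step can be sketched as a per-bin gain applied to the short-time spectrum. The sketch below assumes the power spectrogram is already available as an array of shape (frames, bins). The late-reflection estimator shown, a delayed and scaled copy of the observed power spectrum, is only a simple placeholder for illustration and is not the estimator discussed in this document; the names `delay_frames`, `scale` and `floor` are hypothetical parameters.

```python
import numpy as np

def delayed_late_estimate(power, delay_frames=4, scale=0.3):
    """Placeholder late-reflection power estimate (an assumption, for illustration):
    a scaled copy of the observed power delayed by delay_frames (must be >= 1)."""
    late = np.zeros_like(power)
    late[delay_frames:] = scale * power[:-delay_frames]
    return late

def spectral_subtraction_gain(power, late_power, floor=0.1):
    """Amplitude gain per frame and bin from power spectral subtraction.

    power:      observed power spectrogram, shape (frames, bins).
    late_power: estimated late-reflection power, same shape.
    floor:      minimum amplitude gain, which limits speech distortion when the
                late-reflection estimate exceeds the observed power.
    """
    eps = 1e-12
    gain_sq = 1.0 - late_power / np.maximum(power, eps)
    return np.sqrt(np.maximum(gain_sq, floor ** 2))
```

Multiplying the complex short-time spectrum by this gain and resynthesizing gives the enhanced signal. The quality of the result depends almost entirely on how accurate `late_power` is, which is the estimation problem described above.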