The quality of speech captured by personal computers can be degraded by environmental noise and/or by reverberation (e.g., caused by the sound waves reflecting off walls and other surfaces, especially in a large room). Quasi-stationary noise produced by computer fans and air conditioning can be significantly reduced by spectral subtraction or similar techniques. In contrast, removing non-stationary noise and/or reducing the distortion caused by reverberation can be more difficult. De-reverberation is a difficult blind deconvolution problem due to the broadband nature of speech and the high order of the equivalent impulse response from the speaker's mouth to the microphone.
Signal enhancement can be employed, for example, in the domains of improved human perceptual listening (especially for the hearing impaired), improved human visualization of corrupted images or videos, robust speech recognition, natural user interfaces, and communications. The difficulty of the signal enhancement task depends strongly on environmental conditions. Take an example of speech signal enhancement, when a speaker is close to a microphone and the noise level is low and when reverberation effects are fairly small, standard signal processing techniques often yield satisfactory performance. However, as the distance from the microphone increases, the distortion of the speech signal, resulting from large amounts of noise and significant reverberation, becomes gradually more severe.
Conventional signal enhancement systems have employed signal processing methods, such as spectral subtraction, noise cancellation, and array processing. These methods have had many well known successes; however, they have also fallen far short of offering a satisfactory, robust solution to the general signal enhancement problem. For example, one shortcoming of these conventional methods is that they typically exploit just second order statistics (egg., functions of spectra) of the sensor signals and ignore higher order statistics. In other words, they implicitly make a Gaussian assumption on speech signals that are highly non-Gaussian. A related issue is that these methods typically disregard information on the statistical structure of speech signals. In addition, some of these methods suffer from the lack of a principled framework. This has resulted in ad hoc solutions, for example, spectral subtraction algorithms that recover the speech spectrum of a given frame by essentially subtracting the estimated noise spectrum from the sensor signal spectrum, requiring a special treatment when the result is negative due in part to incorrect estimation of the noise spectrum when it changes rapidly over time. Another example is the difficulty of combining algorithms that remove noise with algorithms that handle reverberation into a single system in a systematic manner.