Voice communication systems are susceptible to interfering signals, commonly referred to as noise. These interfering signals may degrade the performance of any speech communication system. The extent of the degradation depends on the specific system being used, on the nature of the noise, on the way the noise interacts with the clean speech signal, and on the intensity of the noise relative to that of the signal.
A speech communication system may simply be a recording made in a noisy environment, a standard digital or analog communication system, or a speech recognition system for human/machine communication. Noise may be present at the input of the communication system, in the channel, or at the receiving end. The noise may be correlated or uncorrelated with the signal, and it may combine with the clean signal additively, multiplicatively, or in some more general manner. Examples of noise sources include competing speech, background sounds such as music, a fan, machinery, a door slamming, wind or traffic, room reverberation, and Gaussian channel noise.
The ultimate goal of speech enhancement is to minimize the effect of the noise on the performance of speech communication systems. Consider, for example, a cellular radio/telephone communication system. In this system, the transmitted signal is composed of the original speech and the background noise in the car, which is generated by the engine, a fan, traffic, wind, etc. The transmitted signal is also corrupted by radio channel noise. As a result, such systems may deliver noisy speech of low quality and reduced intelligibility.
Background noise may have additional devastating effects on the performance of a system. Specifically, if the system encodes the signal prior to transmission, the performance of the speech coder may deteriorate significantly in the presence of noise. The reason is that speech coders rely on a statistical model of the clean signal, and this model becomes invalid when the signal is noisy. For a similar reason, if a cellular radio system is equipped with a speech recognizer for automatic dialing, the error rate of the recognizer will be elevated in the presence of background noise. The goals of speech enhancement in this example are therefore to improve the perceptual quality of the transmitted noisy speech and to reduce the speech recognizer error rate.
The problem of speech enhancement has been a challenge for many years, and various solutions with differing degrees of success have been proposed. One known prior art speech enhancement solution is the spectral subtraction approach described in "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," by S.F. Boll, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, April 1979. This approach estimates the clean signal from the short-term spectrum of the noisy signal. Estimation is performed on a frame-by-frame basis, where each frame consists of 20-40 ms of speech samples. In a spectral subtraction approach, each frame of the signal is Fourier transformed, and spectral components whose magnitudes are smaller than those of the noise are nulled. The surviving spectral components are modified by an appropriately chosen gain function. The signal estimate is obtained by inverse Fourier transforming the modified spectral components. Major drawbacks of the spectral subtraction approach, however, are that the noise must be explicitly estimated and that the residual noise has annoying tonal characteristics referred to as "musical noise".
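The frame-by-frame procedure described above can be illustrated with a minimal sketch of magnitude spectral subtraction in Python with NumPy. The function name `spectral_subtraction` is hypothetical, and several details are assumptions made for illustration rather than details taken from Boll's paper: a frame length of 256 samples (32 ms at an assumed 8 kHz sampling rate), a Hann analysis window with 50% overlap-add, a gain of 1 - |N|/|X| (equivalent to subtracting the noise magnitude), and a noise estimate taken as the average magnitude spectrum of the first few frames, which are assumed to contain no speech.

```python
import numpy as np


def spectral_subtraction(noisy, frame_len=256, noise_frames=8):
    """Hypothetical sketch of magnitude spectral subtraction.

    ASSUMPTION: the first `noise_frames` frames are speech-free, and their
    average magnitude spectrum serves as the explicit noise estimate.
    """
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1,
    # so plain overlap-add reconstructs the signal away from the edges.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)

    # Zero-pad so the signal splits into a whole number of overlapping frames.
    n_frames = int(np.ceil(max(len(noisy) - frame_len, 0) / hop)) + 1
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(noisy)] = noisy

    # Short-term spectrum of every windowed frame.
    frames = np.stack([padded[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mags = np.abs(spectra)

    # Explicit noise magnitude estimate from the assumed noise-only frames.
    noise_mag = mags[:noise_frames].mean(axis=0)

    # Null components at or below the noise level; scale the survivors by
    # the gain 1 - |N|/|X|. The noisy phase is kept unchanged.
    gain = np.where(mags > noise_mag,
                    1.0 - noise_mag / np.maximum(mags, 1e-12), 0.0)
    cleaned = gain * spectra

    # Inverse Fourier transform each modified frame and overlap-add.
    out = np.zeros_like(padded)
    for i, frame in enumerate(np.fft.irfft(cleaned, n=frame_len, axis=1)):
        out[i * hop:i * hop + frame_len] += frame
    return out[:len(noisy)]
```

For example, a 440 Hz tone that starts after 0.5 s, buried in white noise, keeps most of its energy while the noise-only stretch is strongly attenuated. The isolated spectral components that happen to exceed the noise estimate in each frame survive as short-lived tones, which is the "musical noise" drawback noted above. Because only an analysis window is applied, the first and last half-frames of the output are tapered.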
The known prior art fails to disclose a simple and accurate method for enhancing the quality of speech transmitted from a noisy environment.