Acoustic noise suppression in a speech communication system generally serves to improve the overall quality of the desired audio or speech signal by filtering environmental background noise from the desired speech signal. This speech enhancement process is particularly necessary in environments having abnormally high levels of background noise.
Reference is now made to FIG. 1, which illustrates one noise suppressor which uses spectral subtraction (or spectral gain modification). The noise suppressor includes frequency and time domain converters 10 and 12, respectively, and a noise attenuator 14.
The frequency domain converter 10 includes a bank of bandpass filters which divide the audio input signal into individual spectral bands. The noise attenuator 14 attenuates particular spectral bands according to their noise energy content. To do so, the attenuator 14 includes an estimator 16 and a channel gain determiner 18. Estimator 16 estimates the background noise and signal power spectral densities (PSDs) to generate a signal to noise ratio (SNR) of the speech in each channel. The channel gain determiner 18 uses the SNR to compute a gain factor for each individual channel and to attenuate each spectral band. The attenuation is performed by multiplying, via a multiplier 20, the signal of each channel by its gain factor. The channels are recombined and converted back to the time domain by converter 12, thereby producing a noise suppressed signal.
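The signal flow of FIG. 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the description above: FFT bins stand in for the bank of bandpass filters, the noise PSD is estimated from the first few frames (presumed to contain no speech), and the gain rule is a simple placeholder rather than any particular published method. The function name `suppress_noise` and its parameters are hypothetical.

```python
import numpy as np

def suppress_noise(x, frame_len=256, noise_frames=8, min_gain=0.1):
    """Illustrative analyze / attenuate / resynthesize chain (assumed design)."""
    hop = frame_len // 2
    # Square-root periodic Hann window: applied at analysis and synthesis,
    # the overlapping windows sum to one at 50% overlap.
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    n_frames = 1 + (len(x) - frame_len) // hop

    # Frequency domain converter (10): each FFT bin is treated as one channel.
    frames = np.stack([x[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)

    # Estimator (16): noise PSD from the leading frames, then per-channel SNR.
    power = np.abs(spectra) ** 2
    noise_psd = power[:noise_frames].mean(axis=0) + 1e-12
    snr = power / noise_psd

    # Channel gain determiner (18): placeholder rule, lower SNR gives lower gain.
    gain = np.sqrt(np.maximum(1.0 - 1.0 / np.maximum(snr, 1e-12), 0.0))
    gain = np.maximum(gain, min_gain)

    # Multiplier (20) and time domain converter (12): scale, invert, overlap-add.
    cleaned = np.fft.irfft(spectra * gain, n=frame_len, axis=1) * window
    y = np.zeros(len(x))
    for k in range(n_frames):
        y[k * hop:k * hop + frame_len] += cleaned[k]
    return y
```

Calling `suppress_noise(x)` on a one-dimensional float array returns an array of the same length. Samples beyond the last complete frame are left at zero in this sketch.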
For example, the method of linear spectral subtraction is discussed in the article by M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of Speech Corrupted by Acoustic Noise", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 208-211, April 1979, which is incorporated herein by reference. In this method, the channel gain $\gamma_{ch}(i)$ is determined by subtracting the noise power spectrum from the noisy signal power spectrum. In addition, a spectral floor $\beta$ is used to prevent the gain from descending below a lower bound, $\beta|E_n(i)|$.
The gain is determined as follows: ##EQU1##
where: ##EQU2##
$|E_{ch}(i)|$ is the smoothed estimate of the magnitude of the corrupted speech in the ith channel and $|E_n(i)|$ is the smoothed estimate of the magnitude of the noise in the ith channel.
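Because the equations themselves are not reproduced here, the following is only a sketch of one common form of spectral subtraction with a spectral floor, not necessarily the exact expression intended above. It assumes the subtraction is carried out on the smoothed magnitudes, that the subtracted value is bounded below by the floor times the noise magnitude, and that the gain is the ratio of the result to the corrupted-speech magnitude. The names `channel_gain`, `e_ch`, `e_n` and `beta` are hypothetical.

```python
import numpy as np

def channel_gain(e_ch, e_n, beta=0.1):
    """Per-channel gain from magnitude subtraction with a spectral floor.

    e_ch: smoothed magnitude of the corrupted speech in each channel
    e_n:  smoothed magnitude of the noise in each channel
    beta: spectral floor (assumed value; not specified in the text)
    """
    e_ch = np.asarray(e_ch, dtype=float)
    e_n = np.asarray(e_n, dtype=float)
    # Subtract the noise estimate from the noisy-signal estimate.
    subtracted = e_ch - e_n
    # Keep the result from falling below the lower bound beta * e_n.
    floored = np.maximum(subtracted, beta * e_n)
    # Express the result as a gain applied to the corrupted channel.
    return floored / np.maximum(e_ch, 1e-12)
```

Under these assumptions, a channel with `e_ch = 10` and `e_n = 1` receives a gain of 0.9, while a channel with `e_ch = 1` and `e_n = 1` is held at the floor and receives a gain of 0.1.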
FIG. 2 illustrates the channel gain function $\gamma_{ch}(i)$ as a function of the channel SNR and indicates that the channel gain has a short floor 21, after which the channel gain increases monotonically.
Unfortunately, for a very low SNR input signal, this noise suppression can leave residual "musical" noise, which is produced when isolated spectral peaks exceed the noise estimate.
FIGS. 3A and 3B, to which reference is now made, illustrate the typical channel energy in an input signal and the corresponding linear spectral subtraction gain signal over time. The energy signal of FIG. 3A shows high energy speech peaks 22 between which are sections of noise 23. The gain function of FIG. 3B has accentuated areas 24, corresponding to the peaks 22, and significant fluctuations 25 between them, corresponding to the sections of noise in the original energy signal. The gains in the accentuated areas 24 cause the high energy speech of the peaks 22 to be heard clearly. However, the gains in the fluctuations 25, which are of the same general strength as the gains in the accentuated areas 24, cause the musical noise to be heard as well.
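The origin of these fluctuations can be illustrated with a small simulation of a single noise-only channel. This is a sketch under stated assumptions, not a reproduction of FIGS. 3A and 3B: the noise is modeled as complex Gaussian, the channel magnitude is left unsmoothed, the noise estimate is taken as the long-term mean magnitude, and the spectral floor value is chosen arbitrarily. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 2000
beta = 0.1  # assumed spectral floor

# One noise-only channel over time: complex Gaussian samples,
# so the frame-by-frame magnitude varies randomly.
spectrum = rng.normal(size=n_frames) + 1j * rng.normal(size=n_frames)
e_ch = np.abs(spectrum)   # instantaneous channel magnitude
e_n = e_ch.mean()         # idealized noise estimate (long-term mean)

# Magnitude subtraction with a floor, applied frame by frame.
residual = np.maximum(e_ch - e_n, beta * e_n)

# Frames where the instantaneous magnitude exceeds the noise estimate by
# more than the floor survive as isolated peaks instead of being floored.
is_peak = e_ch - e_n > beta * e_n
fraction_above = is_peak.mean()
print(f"{fraction_above:.0%} of noise-only frames rise above the floor")
```

Even though the channel contains no speech, a substantial share of frames escape the floor, and these randomly placed peaks are what a listener perceives as musical noise.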
The following articles and patents discuss other noise suppression algorithms and systems:
G. Whipple, "Low Residual Noise Speech Enhancement Utilizing Time-Frequency Filtering", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, pp. 5-8, 1994; and
U.S. Pat. Nos. 5,012,519 and 5,706,395.