This invention relates to communication system noise cancellation techniques, and more particularly relates to gain adjustment calculations used in such techniques.
The need for speech quality enhancement in single-channel speech communication systems has grown in importance, especially because of the tremendous growth in cellular telephony. Cellular telephones are often operated in the presence of high levels of environmental background noise, such as in moving vehicles. Such high levels of noise cause significant degradation of the speech quality at the far end receiver. In such circumstances, speech enhancement techniques may be employed to improve the quality of the received speech so as to increase customer satisfaction and encourage longer talk times.
Most noise suppression systems utilize some variation of spectral subtraction. FIG. 1A shows an example of a typical prior noise suppression system that uses spectral subtraction. A spectral decomposition of the input noisy speech-containing signal is first performed using the Filter Bank. The Filter Bank may be a bank of bandpass filters (such as in reference [1], which is identified at the end of the description of the preferred embodiments). The Filter Bank decomposes the signal into separate frequency bands. For each band, power measurements are performed and continuously updated over time in the Noisy Signal Power & Noise Power Estimation block. These power measures are used to determine the signal-to-noise ratio (SNR) in each band. The Voice Activity Detector is used to distinguish periods of speech activity from periods of silence. The noise power in each band is updated primarily during silence while the noisy signal power is tracked at all times. For each frequency band, a gain (attenuation) factor is computed based on the SNR of the band and is used to attenuate the signal in the band. Thus, each frequency band of the noisy input speech signal is attenuated based on its SNR.
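The per-band processing described above can be sketched as follows. This is a minimal illustration only, not the method of any cited reference: the exponential smoothing constant `alpha`, the gain floor `floor`, the power-subtraction gain rule, and all function names are assumptions introduced here for the example.

```python
import math


def update_power(prev, sample_power, alpha=0.9):
    """Exponentially smoothed power estimate (alpha is an assumed constant)."""
    return alpha * prev + (1.0 - alpha) * sample_power


def band_gain(noisy_power, noise_power, floor=0.1):
    """Assumed power-subtraction gain for one band, limited below by a floor."""
    if noisy_power <= 0.0:
        return floor
    ratio = max(0.0, 1.0 - noise_power / noisy_power)
    return max(floor, math.sqrt(ratio))


def process_frame(band_samples, noisy_power, noise_power, speech_active):
    """Update the per-band power estimates in place and attenuate each band."""
    out = []
    for k, x in enumerate(band_samples):
        p = x * x
        # The noisy signal power is tracked at all times.
        noisy_power[k] = update_power(noisy_power[k], p)
        # The noise power is updated only while the detector reports silence.
        if not speech_active:
            noise_power[k] = update_power(noise_power[k], p)
        out.append(band_gain(noisy_power[k], noise_power[k]) * x)
    return out
```

Here `band_samples` holds one filter bank output sample per band, and `speech_active` stands in for the decision of the Voice Activity Detector. A band whose noisy power barely exceeds its noise power receives the floor gain, while a band with high SNR passes nearly unattenuated.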
FIG. 1B illustrates another more sophisticated prior approach using an overall SNR level in addition to the individual SNR values to compute the gain factors for each band. (See also reference [2].) The overall SNR is estimated in the Overall SNR Estimation block. The gain factor computations for each band are performed in the Gain Computation block. The attenuation of the signals in different bands is accomplished by multiplying the signal in each band by the corresponding gain factor in the Gain Multiplication block. Low SNR bands are attenuated more than the high SNR bands. The amount of attenuation is also greater if the overall SNR is low. After the attenuation process, the signals in the different bands are recombined into a single, clean output signal. The resulting output signal will have an improved overall perceived quality.
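One way the dependence on both the band SNR and the overall SNR might look is sketched below. The gain rule is hypothetical and is not taken from reference [2]: the linear mapping, the parameters `max_atten_db` and `snr_ceiling_db`, and the function names are assumptions chosen only to reproduce the qualitative behavior described above.

```python
import math


def snr_db(signal_power, noise_power, eps=1e-12):
    """Signal-to-noise ratio in decibels; eps guards against division by zero."""
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))


def gain_with_overall_snr(band_snr_db, overall_snr_db,
                          max_atten_db=20.0, snr_ceiling_db=30.0):
    """Assumed rule: attenuation shrinks as either SNR approaches the ceiling."""
    def shortfall(value_db):
        clipped = min(max(value_db, 0.0), snr_ceiling_db)
        return 1.0 - clipped / snr_ceiling_db

    # Low-SNR bands are attenuated more; a low overall SNR increases
    # the attenuation further (up to a factor of two in this sketch).
    atten_db = max_atten_db * shortfall(band_snr_db) * (0.5 + 0.5 * shortfall(overall_snr_db))
    return 10.0 ** (-atten_db / 20.0)


def recombine(band_signals, gains):
    """Multiply each band sample by its gain and sum the bands into one output sample."""
    return sum(g * x for g, x in zip(gains, band_signals))
```

With these assumed defaults, a band at or above 30 dB SNR is passed unchanged, and a band at 0 dB SNR is attenuated by 10 dB when the overall SNR is high and by 20 dB when the overall SNR is 0 dB.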
The decomposition of the input noisy speech-containing signal can also be performed using Fourier transform techniques or wavelet transform techniques. FIG. 2 shows the use of discrete Fourier transform techniques (shown as the Windowing & FFT block). Here a block of input samples is transformed to the frequency domain. The magnitudes of the complex frequency domain elements are attenuated based on the spectral subtraction principles described earlier. The phases of the complex frequency domain elements are left unchanged. The complex frequency domain elements are then transformed back to the time domain via an inverse discrete Fourier transform in the IFFT block, producing the output signal. Instead of Fourier transform techniques, wavelet transform techniques may be used for decomposing the input signal.
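The frequency-domain variant can be illustrated with the short sketch below. It is a simplified example under stated assumptions: a direct (non-fast) discrete Fourier transform is used so that the code needs only the standard library, windowing and overlap between blocks are omitted, and the per-bin `gains` are assumed to be supplied by a separate gain computation. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a block of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; only the real part is kept for a real-valued output."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def attenuate_block(block, gains):
    """Scale the magnitude of each frequency bin and leave its phase unchanged."""
    out_spectrum = []
    for element, g in zip(dft(block), gains):
        magnitude, phase = cmath.polar(element)
        out_spectrum.append(cmath.rect(g * magnitude, phase))
    return idft(out_spectrum)
```

For a real input block, the gains would normally be chosen symmetric about the middle bin (the gain for bin k equal to the gain for bin n - k) so that the attenuated spectrum still corresponds to a real signal.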
A Voice Activity Detector is part of many noise suppression systems. Generally, the power of the input signal is compared to a variable threshold level. Whenever the threshold is exceeded, speech is assumed to be present. Otherwise, the signal is assumed to contain only background noise. Such two-state voice activity detectors do not perform robustly under adverse conditions such as those found in cellular telephony environments. An example of a voice activity detector is described in reference [5].
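A two-state detector of the kind just described might be sketched as follows. This is not the detector of reference [5]: the way the threshold varies (a fixed `margin` times a noise power estimate that adapts during non-speech frames), the smoothing constant `alpha`, and the class and method names are all assumptions made for illustration.

```python
def frame_power(frame):
    """Mean squared value of the samples in one frame."""
    return sum(s * s for s in frame) / len(frame)


class TwoStateVad:
    """Power-threshold detector with two states: speech or background noise."""

    def __init__(self, initial_noise_power=1e-4, margin=4.0, alpha=0.95):
        self.noise_power = initial_noise_power  # assumed starting noise estimate
        self.margin = margin                    # assumed threshold multiplier
        self.alpha = alpha                      # assumed smoothing constant

    def is_speech(self, frame):
        power = frame_power(frame)
        threshold = self.margin * self.noise_power
        speech = power > threshold
        # The threshold varies because the noise estimate adapts,
        # but only while the frame is classified as noise.
        if not speech:
            self.noise_power = self.alpha * self.noise_power + (1.0 - self.alpha) * power
        return speech
```

A detector like this makes a hard decision for every frame, so a loud noise burst is classified as speech and quiet speech is classified as noise, which is one reason such detectors are not robust in noisy environments.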
Various implementations of noise suppression systems utilizing spectral subtraction differ mainly in the methods used for power estimation, gain factor determination, spectral decomposition of the input signal and voice activity detection. A broad overview of spectral subtraction techniques can be found in reference [3]. Several other approaches to speech enhancement, as well as spectral subtraction, are overviewed in reference [4].
Preservation of the natural spectral shape of the speech signal is important to perceived speech quality. The known noise cancellation systems are ineffective in preserving the natural spectral shape of a speech signal. This invention provides an economical and effective solution to the problem.