This invention relates to a system and method for detecting speech in a signal containing both speech and noise and for removing noise from the signal.
In communication systems it is often desirable to reduce the amount of background noise in a speech signal. One example is the signal from a mobile telephone. Background noise reduction makes the voice signal more pleasant for a listener and improves the results of coding or compressing the speech.
Various methods for reducing noise have been invented, but the most effective are those that operate on the signal spectrum. Early attempts to reduce background noise included applying automatic gain to signal subbands, as disclosed in U.S. Pat. No. 3,803,357 to Sacks. This patent presented an efficient way of reducing stationary background noise in a signal via spectral subtraction. See also "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Transactions On Acoustics, Speech and Signal Processing, pp. 1391-1394, 1996.
Spectral subtraction involves estimating the power or magnitude spectrum of the background noise and subtracting it from the power or magnitude spectrum of the contaminated signal. The background noise is usually estimated during noise-only sections of the signal. This approach is fairly effective at removing background noise, but the remaining speech tends to contain annoying artifacts, often referred to as "musical noise." Musical noise consists of brief tones occurring at random frequencies and results from isolated noise spectral components that are not completely removed by the subtraction. One method of reducing musical noise is to subtract a multiple (greater than one) of the noise spectral magnitude, a technique referred to as spectral oversubtraction. Spectral oversubtraction reduces the residual noise components but also removes excessive amounts of the speech spectral components, resulting in speech that sounds hollow or muted.
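As an illustration only, magnitude spectral subtraction with an oversubtraction factor can be sketched per frequency bin as below. The function name, the factor `alpha` and the spectral floor `floor` are assumptions introduced for this sketch, not details taken from the cited patent or paper; `alpha = 1.0` gives plain subtraction and `alpha > 1.0` gives oversubtraction.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.0):
    """Hypothetical magnitude spectral subtraction for one frame.

    noisy_mag: magnitude spectrum |Y(k)| of the contaminated frame
    noise_mag: estimated noise magnitude |N(k)| (e.g. averaged over noise-only frames)
    alpha:     subtraction multiple; values above 1.0 implement oversubtraction
    floor:     assumed spectral floor, as a fraction of |Y(k)|, replacing negative results
    """
    out = []
    for y, n in zip(noisy_mag, noise_mag):
        s = y - alpha * n              # subtract the (scaled) noise estimate
        out.append(max(s, floor * y))  # clamp bins that went below the floor
    return out
```

With `alpha = 2.0`, bins where the noisy magnitude barely exceeds the noise estimate are driven to the floor, which suppresses isolated residual peaks (musical noise) but also removes weak speech components, the "hollow" effect described above.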
A related method for background noise reduction is to estimate the optimal gain to be applied to each spectral component using a Wiener or Kalman filter approach. The Wiener and Kalman filters attempt to minimize the expected error in the time signal. The Kalman filter requires knowledge of the type of noise to be removed and is therefore poorly suited to applications where the noise characteristics are unknown and may vary.
The Wiener filter is calculated from an estimate of the speech spectrum as well as the noise spectrum. A common method of estimating the speech spectrum is via spectral subtraction. However, this causes the Wiener filter to produce some of the same artifacts evidenced in spectral subtraction-based noise reduction.
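A minimal sketch of a per-bin Wiener gain, assuming the speech power is estimated by power spectral subtraction, is shown below. The function name and the zero-clamping of negative speech estimates are assumptions for illustration. Writing $P_S = \max(P_Y - P_N, 0)$, the gain $G = P_S / (P_S + P_N)$ reduces to $1 - P_N / P_Y$ whenever $P_Y > P_N$, so any random fluctuation in the subtraction-based speech estimate passes directly into the gain.

```python
def wiener_gain(noisy_power, noise_power):
    """Hypothetical per-bin Wiener gain G = P_S / (P_S + P_N).

    The speech power P_S is estimated by power spectral subtraction,
    which is why the gain inherits spectral-subtraction artifacts.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = max(p_y - p_n, 0.0)   # speech power estimate via spectral subtraction
        denom = p_s + p_n
        gains.append(p_s / denom if denom > 0 else 0.0)
    return gains
```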
The musical or flutter noise problem was addressed by McAulay and Malpass (1980) by smoothing the gain of the filter over time. See "Speech Enhancement Using a Soft-Decision Noise Suppression Filter," IEEE Transactions on Acoustics, Speech, and Signal Processing 28(2): 137-145. However, if the gain is smoothed enough to eliminate most of the musical noise, the voice signal is also adversely affected.
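To show why heavy smoothing also harms speech, the sketch below applies generic first-order recursive (exponential) smoothing to per-bin gains across frames. This is an assumed, simplified form of temporal gain smoothing, not the soft-decision filter of McAulay and Malpass; the function name and the smoothing constant `beta` are hypothetical.

```python
def smooth_gains(gain_frames, beta=0.9):
    """Hypothetical exponential smoothing of spectral gains over time.

    gain_frames: list of frames, each a list of per-bin gains
    beta:        smoothing constant in [0, 1); larger values smooth more
    """
    smoothed = []
    prev = None
    for frame in gain_frames:
        if prev is None:
            cur = list(frame)  # first frame: no history yet
        else:
            # g_s[t] = beta * g_s[t-1] + (1 - beta) * g[t], per bin
            cur = [beta * p + (1.0 - beta) * g for p, g in zip(prev, frame)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

When the raw gain jumps from 0 to 1 at a speech onset, the smoothed gain only rises gradually (0.0, 0.5, 0.75 for `beta = 0.5`), so the same smoothing that suppresses brief musical tones also attenuates the start of words.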
Other methods of calculating an "optimal gain" include minimizing expected error in the spectral components. For example, Ephraim and Malah (1985) achieve good results, which are free from musical noise artifacts, by minimizing the mean-square error in the short-time spectral components. See "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-33(2): 443-445. However, their approach is much more computationally intensive than the Wiener filter or spectral subtraction methods. Derivative methods have also been developed which use look-up tables or approximation functions to perform similar noise reduction but with reduced complexity. These methods are disclosed in U.S. Pat. Nos. 5,012,519 and 5,768,473.
Also known is an auditory masking-based technique for reducing background signal noise, described by Virag (1995) and by Tsoukalas, Mourjopoulos and Kokkinakis (1997). See "Speech Enhancement Based On Masking Properties Of The Auditory System," Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vol. 1, pp. 796-799; and "Speech Enhancement Based On Audible Noise Suppression," IEEE Transactions on Speech and Audio Processing 5(6): 497-514. These techniques require excessive computational capacity and do not produce the desired amount of noise reduction.
Other methods for noise reduction estimate the spectral magnitude of speech components probabilistically, as in U.S. Pat. Nos. 5,668,927 and 5,577,161. These methods also require computations that cannot be performed efficiently on low-cost digital signal processors.
Another aspect of the background noise reduction problem is determining when the signal contains only background noise and when speech is present. Speech detectors, often called voice activity detectors (VADs), are needed to aid in the estimation of the noise characteristics. VADs typically use many different measures to determine the likelihood of the presence of speech. Some of these measures include: signal amplitude, short-term signal energy, zero crossing count, signal to noise ratio (SNR), or SNR in spectral subbands. These measures may be smoothed and weighted in the speech detection process. The VAD decision may also be smoothed and modified to, for example, hang on for a short time after the cessation of speech.
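A deliberately simple voice activity detector using only one of the measures listed above, short-term energy compared against a noise estimate (an SNR test), plus a hangover period, might look like the sketch below. The function names, the 6 dB threshold, the hangover length and the recursive noise update constant are all assumptions for illustration; practical VADs combine several measures.

```python
import math

def frame_energy(frame):
    """Short-term energy: mean of squared samples in the frame."""
    return sum(x * x for x in frame) / len(frame)

def simple_vad(frames, noise_energy, threshold_db=6.0, hangover=3, noise_alpha=0.95):
    """Hypothetical energy/SNR VAD with hangover.

    Returns (decisions, final_noise_energy). The noise estimate is updated
    only on frames judged to contain no speech, which is how a VAD aids
    noise estimation.
    """
    decisions = []
    hang = 0
    for frame in frames:
        e = frame_energy(frame)
        snr_db = 10.0 * math.log10(e / noise_energy) if e > 0 else -math.inf
        if snr_db > threshold_db:
            hang = hangover          # speech detected: reset hangover counter
            decisions.append(True)
        elif hang > 0:
            hang -= 1                # hang on briefly after speech stops
            decisions.append(True)
        else:
            decisions.append(False)
            # noise-only frame: recursively update the noise energy estimate
            noise_energy = noise_alpha * noise_energy + (1.0 - noise_alpha) * e
    return decisions, noise_energy
```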
In summary, there are methods for reducing noise in speech which are efficient and simple but which produce excessive artifacts. There are also methods which do not produce the musical artifacts but which are computationally intensive. What is needed is an efficient, low-delay method of removing background noise from speech that produces few or no artifacts.
The present invention is directed to a system and method for removing noise from a signal containing speech (or a related, information carrying signal) and noise. The input signal is a voice signal corrupted by added noise, and the output is the speech signal with the added noise reduced. According to the present invention, an adaptive filter is provided featuring a speech spectrum estimator receiving as input an estimated spectral magnitude signal for a time frame of the input signal and generating an estimated speech spectral magnitude signal representing estimated spectral magnitude values for speech in a time frame. A spectral gain generator receives as input the estimated spectral magnitude signal and the estimated speech spectral magnitude signal and generates as output an initial spectral gain signal that yields an estimate of the speech spectrum in a time frame of the input signal when applied to the spectral signal. A spectral gain modifier receives as input the initial spectral gain signal and generates a modified gain signal by limiting a rate of change of the initial spectral gain signal with respect to the spectral gain over a number of previous time frames. The modified gain signal is then applied to the spectral signal, which is then converted to its time domain equivalent.
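The per-frame flow of speech spectrum estimator, spectral gain generator and spectral gain modifier can be sketched as below. This is a hypothetical illustration under stated assumptions, not the claimed implementation: the speech magnitude is assumed to come from simple subtraction, the initial gain is assumed to be the ratio of estimated speech magnitude to noisy magnitude, and the rate limit is assumed to clamp each bin's gain to within `max_step` of the mean gain over the previous `history` frames. All class, function and parameter names are invented for the sketch, and the conversion back to the time domain (inverse transform and overlap-add) is omitted.

```python
from collections import deque

class GainRateLimiter:
    """Hypothetical spectral gain modifier: limits how fast each bin's gain
    may change relative to the mean of the previous `history` modified gains."""

    def __init__(self, history=3, max_step=0.2):
        self.max_step = max_step
        self.past = deque(maxlen=history)  # gains from previous frames

    def modify(self, initial_gain):
        if not self.past:
            modified = list(initial_gain)  # no history yet: pass through
        else:
            modified = []
            for k, g in enumerate(initial_gain):
                ref = sum(frame[k] for frame in self.past) / len(self.past)
                lo, hi = ref - self.max_step, ref + self.max_step
                modified.append(min(max(g, lo), hi))  # clamp the change
        self.past.append(modified)
        return modified

def estimate_speech_mag(noisy_mag, noise_mag, k=1.0):
    """Assumed speech spectrum estimator: clamped magnitude subtraction."""
    return [max(y - k * n, 0.0) for y, n in zip(noisy_mag, noise_mag)]

def initial_gain(noisy_mag, speech_mag):
    """Assumed gain generator: gain that maps |Y| onto the speech estimate."""
    return [s / y if y > 0 else 0.0 for y, s in zip(noisy_mag, speech_mag)]

def process_frame(noisy_mag, noise_mag, limiter):
    """Estimate speech, form the initial gain, rate-limit it, then apply it."""
    speech = estimate_speech_mag(noisy_mag, noise_mag)
    g = limiter.modify(initial_gain(noisy_mag, speech))
    return [gi * y for gi, y in zip(g, noisy_mag)]
```

In this sketch a sudden jump in the initial gain (0.5 to 0.9) is spread over several frames (0.5, 0.7, 0.8 with `history=2`, `max_step=0.2`), which is one way a rate limit can keep isolated gain spikes from producing musical noise.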
In addition, the present invention is directed to a system and method for filtering an input signal comprising a digitally sampled audio signal containing speech and added noise, featuring the use of a variable noise multiplier. The noise multiplier is controlled based on a measure of whether speech is present in a time frame. The noise multiplier is set to a larger value when a time frame of the input signal contains more noise than speech, and to a smaller value when a time frame contains more speech than noise.
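One way to picture the variable noise multiplier is a linear mapping from a speech-presence measure to the multiplier, as in the sketch below. The measure is assumed to lie in [0, 1] (0 for noise-dominated frames, 1 for speech-dominated frames), and the endpoint values `k_noise = 2.0` and `k_speech = 1.0`, the linear interpolation, and the function names are assumptions chosen for illustration only.

```python
def noise_multiplier(speech_measure, k_noise=2.0, k_speech=1.0):
    """Hypothetical mapping: larger multiplier for noise-dominated frames,
    smaller multiplier for speech-dominated frames."""
    p = min(max(speech_measure, 0.0), 1.0)  # clamp the measure to [0, 1]
    return k_noise + p * (k_speech - k_noise)

def subtract_with_multiplier(noisy_mag, noise_mag, speech_measure):
    """Magnitude subtraction using the frame's variable noise multiplier."""
    k = noise_multiplier(speech_measure)
    return [max(y - k * n, 0.0) for y, n in zip(noisy_mag, noise_mag)]
```

With this choice, noise-only frames are oversubtracted aggressively to suppress residual noise, while frames dominated by speech are subtracted gently to avoid the hollow sound caused by fixed oversubtraction.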
The above and other objects and advantages of the present invention will become more readily apparent when reference is made to the following description taken in conjunction with the accompanying drawings.