Noise suppression (NS) of speech signals can be useful in many applications. In cellular telephony, for example, noise suppression can be used to remove background noise to provide more readily intelligible speech from calls made in noisy environments. Likewise, noise suppression can improve perceptual quality and speech intelligibility in teleconferencing, voice chat in on-line games, Internet-based voice messaging and voice chat, and other similar communications applications. The input audio signal in these applications is typically noisy because the recording environment is less than ideal. Further, noise suppression can improve compression performance when used prior to coding or compression of voice signals (e.g., via the Windows Media Voice codec, and other like codecs). Noise suppression also can be applied prior to speech recognition to improve recognition accuracy.
There are some well-known techniques for noise suppression in speech signals, such as spectral subtraction and Minimum Mean Square Error (MMSE). Almost all of these known techniques suppress the noise by applying a spectral gain G(m, k) based on an estimate of noise in the speech signal to each short-time spectrum value S(m, k) of the speech signal, where m is the frame number and k is the spectrum index. (See, e.g., S. F. Boll, A. V. Oppenheim, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-27(2), April 1979; and Rainer Martin, “Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics,” IEEE Transactions on Speech and Audio Processing, Vol. 9, No. pp. 504-512, July 2001.) A very low spectral gain is applied to spectrum values estimated to contain noise, so as to suppress the noise in the signal.
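The gain-based suppression described above can be sketched in a few lines. This is a minimal illustration of a power-spectral-subtraction style gain, not the method of any particular cited reference. It assumes the short-time spectrum values S(m, k) are already available as complex numbers, that the first few frames contain no speech so they can serve as the noise estimate, and that the gain is clamped to a floor. The function names and the `gain_floor` and `num_noise_frames` parameters are hypothetical, chosen only for this example.

```python
import math


def estimate_noise_power(frames, num_noise_frames):
    """Average |S(m, k)|^2 over the leading frames.

    Assumption: the first num_noise_frames frames are speech-free.
    """
    num_bins = len(frames[0])
    noise = [0.0] * num_bins
    for frame in frames[:num_noise_frames]:
        for k, value in enumerate(frame):
            noise[k] += abs(value) ** 2
    return [p / num_noise_frames for p in noise]


def spectral_gain(frame, noise_power, gain_floor=0.1):
    """Compute G(m, k) for one frame, clamped to [gain_floor, 1]."""
    gains = []
    for value, n in zip(frame, noise_power):
        power = abs(value) ** 2
        if power <= 0.0:
            # Empty bin: nothing to preserve, so use the floor.
            gains.append(gain_floor)
            continue
        # Power spectral subtraction: remove the estimated noise power.
        g = math.sqrt(max(1.0 - n / power, 0.0))
        gains.append(max(g, gain_floor))
    return gains


def suppress(frames, noise_power, gain_floor=0.1):
    """Apply G(m, k) to every spectrum value S(m, k)."""
    return [
        [g * s for g, s in zip(spectral_gain(f, noise_power, gain_floor), f)]
        for f in frames
    ]


# Two noise-only frames followed by one frame with a strong component in bin 0.
frames = [[1 + 0j, 0 + 1j], [1 + 0j, 0 + 1j], [10 + 0j, 1 + 0j]]
noise_power = estimate_noise_power(frames, num_noise_frames=2)
cleaned = suppress(frames, noise_power)
```

In the last frame, bin 0 is well above the noise estimate and receives a gain close to 1, while bin 1 sits at the noise level and is reduced to the gain floor. The floor is one common way to avoid driving bins fully to zero; its value here is arbitrary.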
Unfortunately, noise suppression may introduce artificial distortions (audible “artifacts”) into the speech signal, for example when the suppression applied is either too aggressive (the spectral gain is too low, removing more than noise) or too weak (the spectral gain is too high, failing to remove the noise completely). One artifact that many NS techniques suffer from is called musical noise, where the NS technique introduces an artifact perceived as a melodic audio signal pattern that was not present in the input. In some cases, this musical noise can become noticeable and distracting, in addition to being an inaccurate representation of the speech present in the input signal.