The quality of Voice over IP (VoIP) calls and the performance of automatic speech recognition may be noticeably degraded by the presence of background noise. To overcome these problems, many speech enhancement techniques have been proposed. In some traditional single-channel methods, the statistics of the noise spectral power are estimated while speech is absent, and a spectral gain is then determined from the noisy mixture. Some multichannel methods aim at reducing the noise by estimating spatial filters constrained to the speech and noise spatial covariance. While traditional single-channel methods are effective in reducing stationary background noise, multichannel methods can more effectively remove non-stationary noise that is spatially coherent and spatially static. However, when the noise is both incoherent and non-stationary, neither of these methods is able to suppress it effectively.
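As an illustration only, the traditional single-channel approach described above can be sketched as follows. The function names, the toy power spectra, the speech-activity flags, and the gain floor are all hypothetical, and the power-subtraction gain rule is just one common choice assumed here, not necessarily the rule used by any particular method.

```python
def estimate_noise_power(frames, is_speech):
    """Average the power spectrum over frames flagged as speech-free."""
    silent = [f for f, s in zip(frames, is_speech) if not s]
    if not silent:
        raise ValueError("need at least one speech-free frame")
    n_bins = len(silent[0])
    return [sum(f[k] for f in silent) / len(silent) for k in range(n_bins)]


def spectral_gain(noisy_power, noise_power, floor=0.1):
    """Per-bin power-subtraction gain, clamped to a floor (assumed value)."""
    gains = []
    for p, n in zip(noisy_power, noise_power):
        g = 1.0 - n / p if p > 0 else floor
        gains.append(max(g, floor))
    return gains


# Toy power spectra with 3 frequency bins per frame (hypothetical data).
# The first two frames are assumed to be speech-free.
frames = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [5.0, 1.0, 2.0],
]
is_speech = [False, False, True]

noise = estimate_noise_power(frames, is_speech)
gains = spectral_gain(frames[2], noise)
```

Because the noise estimate is an average over past speech-free frames, a gain of this kind tracks only noise whose power changes slowly, which is why such methods suit stationary noise.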
An example of noise that may be neither stationary nor spatially static is transient noise. Transient noise may vary more quickly than speech, and its power is difficult to estimate accurately. Keystroke noise and finger tap noise are examples of transient noise generated in mobile devices such as laptops or tablets. In these devices, transient noise suppression may be utilized to improve VoIP call quality.
Some methods for transient noise suppression are based on ad-hoc spectral models aimed at detecting the transient frames. However, because the transient noise power is not deterministically predictable, spectral gains derived from these models are prone to distorting the speech. This happens more frequently with unvoiced speech frames, since they have a transient-like characteristic.
Various techniques for transient noise reduction or keystroke suppression, mostly based on single-channel processing, are identified in: U.S. Patent Application Publication No. 2008/0212795, published on Sep. 4, 2008 and entitled “Transient Detection and Modification in Audio Signals”; U.S. Pat. No. 8,213,635, issued on Jul. 3, 2012 and entitled “Keystroke Sound Suppression”; Min-Seok Choi and Hong-Goo Kang, “Transient Noise Reduction In Speech Signal With a Modified Long-Term Predictor,” in EURASIP Journal on Advances in Signal Processing, December 2011; and R. Talmon, I. Cohen, S. Gannot, “Single-Channel Transient Interference Suppression With Diffusion Maps,” in IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 1, January 2013. However, the techniques described in these references are subject to speech distortion because a speech onset can have a spectral characteristic that is very close to that of the noise. Although a multichannel technique is identified in U.S. Pat. No. 8,867,757, issued on Oct. 21, 2013 and entitled “Microphone Under Keyboard to Assist In Noise Cancellation,” it requires an ad-hoc microphone placement, which can limit its flexibility for general-purpose consumer applications.