Communication systems, such as those employed in telephone conferencing systems, telephony systems, or audio recording systems, often operate in noisy environments. In these scenarios, noise signals may be captured by the systems together with the desired audio data. Typical noise signals can be classified as stationary or non-stationary. Stationary noise persists for a long duration and exhibits relatively stable characteristics. Non-stationary noise, on the other hand, has characteristics that vary rapidly with time. An example of stationary noise is the background noise in a room where a capture device is located. An example of non-stationary noise is the clicking sound caused by pressing a mechanical button (for example, a mute button) on a capture device, which appears as a short-term burst in the captured signal.
It is generally necessary to process a captured signal to suppress both stationary and non-stationary noise in order to improve perceptual quality during playback. Because stationary background noise has stable characteristics and can be predicted more easily, many noise suppression algorithms have been studied and applied to remove it effectively from the captured signal. However, because non-stationary noise (for example, impulsive noise) has rapidly varying characteristics, it is considerably harder to suppress, or even to detect reliably, in a captured signal.
At present, one existing solution for impulsive noise suppression simply divides the frames of a captured signal into speech frames and non-speech frames by means of voice activity detection, and then applies a suppression gain to the non-speech frames only. This solution relies on the assumption that non-speech frames are less likely to contain valuable audio data, so it is ineffective when the speech frames themselves contain impulsive noise. As a result, it has a higher error rate for noise suppression and an increased impact on speech quality. Adding latency to the audio signal analysis may allow a better decision to be made, since future frames can then be used to help decide whether to suppress the current frame. However, the introduced latency is generally not acceptable in interactive voice or communication applications.
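As an illustration only, the voice-activity-gated approach described above might be sketched as follows. The energy-threshold detector, the function names, and the threshold and gain values are all assumptions made for this sketch; a practical system would use a more robust voice activity detector.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Return the mean squared amplitude of one frame (0.0 if empty)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def is_speech(frame: Sequence[float], energy_threshold: float = 0.01) -> bool:
    # Hypothetical stand-in for a voice activity detector:
    # a frame counts as speech when its energy reaches the threshold.
    return frame_energy(frame) >= energy_threshold


def suppress_non_speech(
    frames: Sequence[Sequence[float]],
    gain: float = 0.1,
    energy_threshold: float = 0.01,
) -> List[List[float]]:
    """Apply a suppression gain to non-speech frames only."""
    output = []
    for frame in frames:
        if is_speech(frame, energy_threshold):
            # Speech frames pass through unchanged, including any
            # impulsive noise they happen to contain.
            output.append(list(frame))
        else:
            output.append([gain * x for x in frame])
    return output


quiet_frame = [0.01, -0.01, 0.01, -0.01]
speech_with_click = [0.2, -0.3, 1.0, 0.25]  # the 1.0 sample mimics a click
processed = suppress_non_speech([quiet_frame, speech_with_click])
```

In this sketch the quiet frame is attenuated, while the frame containing both speech and the click is left untouched, which is the limitation noted above.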