Noise suppression techniques are widely used for reducing noise in speech signals or for audio restoration. Most noise suppression algorithms are based on spectral modification of an input audio signal. A gain filter is applied to the short-time spectra of an audio signal received from an input channel, producing an output signal with reduced noise.
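The spectral-modification scheme described above (short-time spectra, a real-valued gain, resynthesis) can be sketched as follows. This is a minimal illustration, not the method of this document. It assumes a short-time Fourier transform with 50%-overlapping square-root Hann windows, chosen here only because analysis and synthesis windows then multiply to a Hann window that overlap-adds to one. The names `stft`, `istft`, `apply_gain_filter` and `gain_fn` are hypothetical.

```python
import numpy as np

def stft(x, frame_len=256):
    """Window 50%-overlapping frames of x and take the real FFT of each frame."""
    hop = frame_len // 2
    # Square root of a periodic Hann window; used for both analysis and synthesis,
    # so their product (a Hann window) sums to one at 50% overlap.
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), win  # shape (frames, bins)

def istft(spec, win, length):
    """Inverse FFT each frame, apply the synthesis window and overlap-add."""
    frame_len = len(win)
    hop = frame_len // 2
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * win
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

def apply_gain_filter(x, gain_fn, frame_len=256):
    """Multiply every time-frequency tile by a real-valued gain and resynthesize.

    gain_fn (hypothetical) maps the complex spectrogram to a real array of the
    same shape, one gain per tile.
    """
    spec, win = stft(x, frame_len)
    gain = gain_fn(spec)
    return istft(spec * gain, win, len(x))
```

With a unity gain the interior of the signal (everything except the first and last half-frame) is reconstructed exactly; any per-tile gain rule, such as one driven by a noise estimate, can be passed as `gain_fn`.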
The gain filter is typically a real-valued gain computed for each time-frequency tile (a time slot, i.e. window, and a frequency band, i.e. bin) of said input signal, in accordance with an estimate of the noise power in the respective time-frequency tile. The accuracy with which the amount of noise is estimated in the different time-frequency tiles has a crucial effect on the output signal. While underestimating the amount of noise in a tile may leave residual noise in the output signal, overestimating it or estimating it inconsistently introduces various artifacts into the output signal.
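One common way to turn a per-tile noise power estimate into a real-valued gain is a Wiener-type power-subtraction rule, G = 1 - N/|X|^2, clipped to a floor. This rule and the floor value are illustrative assumptions, not taken from the source, and `tile_gain` and `gain_floor` are hypothetical names. The sketch makes the effect described above concrete: an underestimated noise power yields gains closer to one (more residual noise), while an overestimated one yields lower gains (stronger suppression and a higher risk of artifacts).

```python
import numpy as np

def tile_gain(noisy_power, noise_power, gain_floor=0.1):
    """Real-valued gain for each time-frequency tile.

    noisy_power: |X|^2 per tile, shape (frames, bins).
    noise_power: noise power estimate per tile (same shape, or broadcastable).
    Uses the power-subtraction (Wiener-type) rule G = 1 - N / |X|^2, clipped to
    [gain_floor, 1]; the floor limits how strongly any single tile is attenuated.
    """
    eps = 1e-12  # avoid division by zero in silent tiles
    gain = 1.0 - noise_power / np.maximum(noisy_power, eps)
    return np.clip(gain, gain_floor, 1.0)
```

For tiles with noisy power 4 and 10 and a noise estimate of 1, the gains are 0.75 and 0.9; tiles whose power does not exceed the noise estimate are held at the floor.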
Although it is highly desirable to reduce noise in speech and audio signals, noise suppression is a trade-off between the degree of noise reduction and the artifacts associated therewith. Generally, the degree of artifacts in the output signal depends on the accuracy of the noise estimation and on the degree of noise reduction sought. The more noise is to be removed, the more likely artifacts become, owing to aliasing and to time variation of the gain filter. Conversely, the more accurate the estimate of the noise in the input signal, the higher the degree of noise reduction that can be obtained without increasing the associated artifacts. Reference [4] describes an example of a gain-filtering technique for noise suppression proposed by the inventor of the present invention.
There are many techniques for estimating the amount of noise in the input signal. Most of them rely on assumptions about the nature of the input signal, the desired output signal or the noise. For example, one such technique assumes that the power of the noise component in the input signal is generally lower than that of the clean signal to be obtained. Accordingly, time-frequency tiles having a lower power (e.g. below a certain threshold) are considered noisy and are therefore suppressed. According to another technique, the noise reduction filter is designed to enhance spectral bands considered to be associated with the desired signal (e.g. speech-related bands) and to suppress spectral bands considered to be associated with noise.
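The first, power-threshold technique can be sketched as a mask that passes tiles at or above a threshold and attenuates the rest. The per-bin threshold used here (a multiple of a low percentile of each bin's power over time) and the attenuation value are illustrative assumptions; `per_bin_threshold` and `threshold_gain` are hypothetical names.

```python
import numpy as np

def per_bin_threshold(power, percentile=20.0, factor=2.0):
    """Hypothetical threshold: `factor` times a low percentile of each bin's power.

    power has shape (frames, bins); the result has shape (1, bins) so that it
    broadcasts against power.
    """
    return factor * np.percentile(power, percentile, axis=0, keepdims=True)

def threshold_gain(power, threshold, attenuation=0.1):
    """Tiles below the threshold are treated as noise and attenuated;
    all other tiles are passed unchanged (gain 1)."""
    return np.where(power >= threshold, 1.0, attenuation)
```

Because the decision is binary per tile, small fluctuations around the threshold switch tiles on and off from frame to frame, which is one source of the time-variance artifacts mentioned earlier.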
In accordance with another method proposed by the inventor of the present invention, the amount of noise is estimated by identifying “noisy” time frames that contain only noise (e.g. using a voice activity detector, VAD). The noise power in each time-frequency tile of the preceding and/or following time frames in which voice is detected is then estimated from the power of the corresponding tiles of the “noisy” time frames.
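A VAD-driven noise estimate can be sketched as follows, assuming recursive averaging that is updated only in noise-only frames and held during voiced frames, so voiced frames inherit the estimate from preceding noise-only frames. Voiced frames that occur before the first noise-only frame take their estimate from that first (following) noise-only frame. The toy energy-based VAD, the smoothing factor `alpha`, and the names `energy_vad` and `vad_noise_estimate` are all hypothetical; a practical VAD would be considerably more elaborate.

```python
import numpy as np

def energy_vad(power, ratio=4.0):
    """Toy VAD (assumption): a frame is 'voiced' if its total power exceeds
    `ratio` times the total power of the quietest frame."""
    frame_energy = power.sum(axis=1)
    return frame_energy > ratio * frame_energy.min()

def vad_noise_estimate(power, voiced, alpha=0.9):
    """Per-tile noise power estimate, shape (frames, bins).

    The estimate is updated by recursive averaging in noise-only frames and
    held unchanged in voiced frames.
    """
    n_frames = power.shape[0]
    noise = np.empty_like(power, dtype=float)
    noise_only = np.flatnonzero(~voiced)
    # Initialize from the first noise-only frame if one exists, else from frame 0.
    start = noise_only[0] if noise_only.size else 0
    current = power[start].astype(float)
    for t in range(n_frames):
        if not voiced[t]:
            current = alpha * current + (1.0 - alpha) * power[t]
        noise[t] = current
    return noise
```

The resulting array can be fed directly to a per-tile gain rule as the noise power estimate.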
Some techniques use directional beam forming to enhance the sound of a particular source arriving from a particular direction over other sounds, in acoustic situations in which multiple sound sources are present. Generally, according to these techniques, the input signals received from multiple microphones are combined with appropriate delays (phase shifts) so as to enhance the sound components arriving at the microphones from certain directions. This allows the separation of sound sources, the reduction of background noise, and the isolation of a particular person's voice from among multiple surrounding talkers.
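Combining microphone signals with delays that align one direction is the classic delay-and-sum beamformer. A minimal sketch, assuming a uniform linear array, a far-field (plane-wave) source, a speed of sound of 343 m/s, and fractional delays applied as frequency-domain phase shifts. Those shifts are circular, so the check below uses a sinusoid that is periodic in the buffer. `steering_delays` and `delay_and_sum` are hypothetical names.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def steering_delays(n_mics, spacing, angle_rad, c=SPEED_OF_SOUND):
    """Arrival-time offsets (s) of a far-field plane wave at a uniform linear
    array, relative to mic 0; the angle is measured from broadside."""
    return np.arange(n_mics) * spacing * np.sin(angle_rad) / c

def delay_and_sum(signals, fs, spacing, angle_rad):
    """Steer the array towards angle_rad and average the aligned channels.

    signals: array of shape (n_mics, n_samples). Each channel is advanced by
    its steering delay via a phase shift, which is a circular shift in time.
    """
    n_mics, n = signals.shape
    tau = steering_delays(n_mics, spacing, angle_rad)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # exp(+j*2*pi*f*tau) advances a channel by tau seconds, undoing its later
    # arrival, so components from the steered direction add in phase.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```

When the array is steered at the true source direction, the channels add coherently and the source is reproduced unchanged; steered elsewhere, the residual delays cause partial cancellation and lower output power.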
Directional beam forming can be performed using input signals received from an array of microphones that may be omnidirectional (or only weakly directional). Many types of directional multi-microphone arrays have been constructed over the past 50 years, as described, for example, in references [2] and [3].
Multi-microphone arrays are also characterized by a trade-off between the enhancement of the source-signal-to-background-noise ratio and the accuracy with which the direction of the sound source must be determined. Delay-and-subtract methods, sometimes referred to as virtual cardioids, yield wide directional beams and a poor source-signal-to-background-noise ratio, whereas adaptive-filter beam-formers can achieve narrow beams pointing at the exact direction of a sound source, but only if that direction is known and tracked precisely. At the same time, widening the beam makes the algorithms more sensitive to room reflections and reverberation.
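The wide beam of the delay-and-subtract ("virtual cardioid") approach follows from its far-field response. A minimal sketch, assuming two omnidirectional microphones a distance d apart on the look axis, an internal delay equal to the acoustic travel time d/c (c = 343 m/s assumed), and a plane-wave source at angle θ from the front of the axis. The output y(t) = x_front(t) - x_back(t - d/c) then has magnitude response |1 - exp(-jωd(1 + cos θ)/c)|, which for ωd/c much smaller than 1 is approximately proportional to the cardioid 1 + cos θ. `delay_and_subtract_response` is a hypothetical name.

```python
import numpy as np

def delay_and_subtract_response(freq_hz, theta_rad, spacing=0.01, c=343.0):
    """Magnitude response of y(t) = x_front(t) - x_back(t - spacing/c).

    theta_rad is the far-field source angle measured from the front of the
    microphone axis (0 = front, pi = rear). Sound from theta reaches the back
    microphone spacing*cos(theta)/c later than the front one, so the total
    delay between the two subtracted terms is spacing*(1 + cos(theta))/c.
    """
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    total_delay = spacing * (1.0 + np.cos(theta_rad)) / c
    return np.abs(1.0 - np.exp(-1j * omega * total_delay))
```

The pattern has a single null at the rear and a broad main lobe covering the whole front half-plane. Its level also rises with frequency (about 6 dB per octave at low frequencies), so such first-order arrays are usually followed by equalization.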