To increase the quality of noisy speech signals processed by digital speech processors (e.g. hearing aids or mobile telephones), and to reduce listener fatigue, it is often desirable to apply noise reduction as a pre-processing step. Noise reduction methods can be grouped into methods that work in a single-microphone setup and methods that work in a multi-microphone setup.
The focus of the present invention is on single-microphone noise reduction methods. Such methods are used, for example, in so-called completely-in-the-canal (CIC) hearing aids. However, the use of this invention is not restricted to single-microphone noise reduction. It can easily be combined with multi-microphone noise reduction techniques as well, e.g. as a post-processor following a beamformer.
With these noise reduction methods it is possible to remove the noise from the noisy speech signal, i.e. to estimate the underlying clean speech signal. To do so, however, some knowledge of the noise is required; usually the noise power spectral density (PSD) must be known. In general the noise PSD is unknown and also time-varying (depending on the specific environment), which makes noise PSD estimation a challenging problem.
When the noise PSD is estimated incorrectly, too much or too little noise suppression will be applied. For example, when the actual noise level suddenly decreases and the noise PSD is consequently overestimated, too much suppression will be applied, resulting in a loss of speech quality. When, on the other hand, the noise level suddenly increases, the underestimated noise level will lead to too little noise suppression and thus to excess residual noise, which again decreases the signal quality and increases listener fatigue.
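The effect of a misestimated noise PSD can be illustrated numerically. The sketch below assumes a simple spectral-subtraction-style gain per frequency bin, G = max(1 - noise_psd_est / |Y|^2, G_min), with an illustrative gain floor G_min = 0.1, and assumes that speech and noise powers simply add within the bin; neither the gain rule nor the numbers are taken from this document, and the function names are hypothetical.

```python
def suppression_gain(noisy_power: float, noise_psd_est: float, g_min: float = 0.1) -> float:
    """Spectral-subtraction-style amplitude gain for one bin, floored at g_min (assumed rule)."""
    if noisy_power <= 0.0:
        return g_min
    return max(1.0 - noise_psd_est / noisy_power, g_min)


def output_power(clean_power: float, true_noise_power: float, noise_psd_est: float,
                 g_min: float = 0.1) -> tuple:
    """Return (speech power, residual noise power) after applying the gain.

    Assumes speech and noise are uncorrelated, so their powers add in the bin.
    The gain acts on amplitudes, hence it scales powers by gain**2.
    """
    noisy = clean_power + true_noise_power
    g = suppression_gain(noisy, noise_psd_est, g_min)
    return g * g * clean_power, g * g * true_noise_power


if __name__ == "__main__":
    speech, noise = 1.0, 0.5
    for label, est in [("correct", 0.5), ("overestimated", 1.2), ("underestimated", 0.1)]:
        s_out, n_out = output_power(speech, noise, est)
        print(f"{label:>14}: speech out {s_out:.3f}, residual noise {n_out:.3f}")
```

With a speech power of 1.0 and a true noise power of 0.5, an overestimated noise PSD pushes the gain towards the floor and strongly attenuates the speech, whereas an underestimated noise PSD keeps the gain close to one and passes most of the noise.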
Several methods have been proposed in the literature to estimate the noise PSD from the noisy speech signal. Under fairly stationary noise conditions, a voice activity detector (VAD) [KIM 99] can be sufficient for estimating the noise PSD: the noise PSD is then estimated during speech pauses. However, VAD-based noise PSD estimation is likely to fail when the noise is non-stationary, and it will lead to a large estimation error when the noise level or spectrum changes while speech is present. An alternative approach to noise PSD estimation is given by methods based on minimum statistics (MS) [Martin 2001].
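VAD-based estimation can be sketched as a recursive average of the noisy periodogram that is updated only in frames classified as speech pauses. In the sketch below the VAD decisions are simply passed in (the detector of [KIM 99] is not reproduced), the first frame is assumed to be noise-only, and the smoothing factor alpha = 0.9 is an illustrative choice; the function name is hypothetical.

```python
from typing import List, Sequence


def vad_noise_psd(
    periodograms: Sequence[Sequence[float]],
    speech_flags: Sequence[bool],
    alpha: float = 0.9,
) -> List[List[float]]:
    """Track the noise PSD per frequency bin, updating only in frames flagged as noise-only.

    periodograms: noisy power |Y(k, l)|^2, one list of bins k per frame l.
    speech_flags: VAD decision per frame (True = speech present); assumed given.
    alpha: smoothing factor of the recursive average (illustrative value).
    """
    estimate = list(periodograms[0])  # assumption: the first frame contains noise only
    history = []
    for power, speech in zip(periodograms, speech_flags):
        if not speech:
            # Exponential smoothing of the periodogram during speech pauses only.
            estimate = [alpha * n + (1.0 - alpha) * p for n, p in zip(estimate, power)]
        history.append(list(estimate))
    return history


if __name__ == "__main__":
    # One bin: the noise power steps from 1.0 to 4.0 at frame 20,
    # while the VAD reports speech from frame 10 onwards.
    frames = [[1.0]] * 20 + [[4.0]] * 20
    flags = [False] * 10 + [True] * 30
    track = vad_noise_psd(frames, flags)
    print(f"final estimate: {track[-1][0]:.2f}")
```

Because the noise level steps up while speech is flagged continuously, the estimate never observes a pause after the step and stays at the old level, which is the failure mode for non-stationary noise described above.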
These methods do not rely on a VAD; instead, they exploit the fact that, in a particular frequency bin, the power of a noisy speech signal observed across a sufficiently long time interval will at some point drop to the noise power level, so that the minimum over that interval approximates the noise power. The length of the time interval provides a trade-off between how fast MS can track a time-varying noise PSD on the one hand and the risk of overestimating the noise PSD on the other.
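The core idea of minimum statistics can be sketched for a single frequency bin: smooth the noisy periodogram over time, take the minimum over the last D frames, and scale it to compensate for the minimum lying below the mean. This is a simplified sketch, not the algorithm of [Martin 2001], which uses an adaptive smoothing factor and a bias correction derived from the statistics of the smoothed periodogram; here the smoothing factor alpha, the constant bias factor and the window length are illustrative assumptions, and the function name is hypothetical.

```python
from collections import deque
from typing import List, Sequence


def min_stats_noise_psd(
    periodogram: Sequence[float],
    window: int = 100,
    alpha: float = 0.85,
    bias: float = 1.5,
) -> List[float]:
    """Simplified minimum-statistics noise tracker for one frequency bin.

    periodogram: noisy power |Y(k, l)|^2 for a fixed bin k over frames l.
    window: number of frames D searched for the minimum (the trade-off parameter).
    alpha: fixed smoothing factor (an assumption; Martin's method adapts it).
    bias: constant compensation of the downward bias of the minimum (illustrative).
    """
    smoothed = periodogram[0]
    recent = deque(maxlen=window)  # the last `window` smoothed power values
    estimates = []
    for p in periodogram:
        smoothed = alpha * smoothed + (1.0 - alpha) * p
        recent.append(smoothed)
        # O(window) search per frame; kept simple for readability.
        estimates.append(bias * min(recent))
    return estimates


if __name__ == "__main__":
    # Noise power 1.0 with a 50-frame speech burst of power 10.0, then noise only again.
    bursty = [1.0] * 100 + [10.0] * 50 + [1.0] * 100
    long_win = min_stats_noise_psd(bursty, window=100, bias=1.0)
    short_win = min_stats_noise_psd(bursty, window=20, bias=1.0)
    print(f"end of burst: long window {long_win[149]:.2f}, short window {short_win[149]:.2f}")
```

The window parameter is the trade-off knob mentioned above: with a long window, a speech burst shorter than the window does not raise the estimate, but an increase in noise level is only followed after roughly `window` frames; with a short window, a noise increase is followed quickly, but a speech burst longer than the window is mistaken for noise.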
Recently, a noise tracking method was proposed in [Hendriks 2008] that allows estimation of the noise PSD even when speech is continuously present. Although this method has been shown to be very effective for noise PSD estimation under non-stationary noise conditions and can be implemented in real time in MATLAB on a modern PC, the eigenvalue decompositions it requires may be too complex for applications with very tight complexity constraints, e.g. due to limited power consumption in battery-driven devices such as hearing aids.