In hands-free speech communication, the speaker is usually located far from the microphone, and since speech intensity decreases with increasing distance from the microphone, even a small amount of background noise can have a major impact on the perceived speech quality. In a car environment, the background noise is mainly due to wind and road noise and can be at a much higher level than the speech signal itself. Speech signals in this situation are hardly intelligible, and a noise reduction function is essential to improve speech intelligibility.
FIG. 1 shows a typical application of a noise reduction algorithm. In this example, noise reduction is combined with an acoustic echo canceller to remove noise and echo from the near-end talker's speech signal.
The most common approach to single-channel noise reduction is based on frequency domain signal manipulation. FIG. 2 shows the general framework for single-channel frequency domain noise reduction. As can be seen from the figure, the noisy speech signal is first converted to the frequency domain. The power of the input signal is then calculated at each individual frequency bin. Based on the calculated power, the powers of the speech-only and noise-only signals are estimated. These two estimated powers are then used to calculate the noise reduction filter coefficients. These frequency domain filter coefficients are then applied to the spectrum of the noisy speech signal. In the final stage, the result of this spectral filtering is transformed back to the time domain to reproduce the clean speech signal.
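The stages described above can be illustrated for a single frame with the following minimal Python sketch. It is not taken from the cited works: the function names (`dft`, `idft`, `reduce_noise_frame`) are hypothetical, the noise power per bin is assumed to be supplied by some external estimator, the speech power is estimated by simple subtraction, and a Wiener-style gain is used only as a stand-in for the filter calculation. Windowing and overlap between frames, which a practical system would need, are omitted.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one time-domain frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def reduce_noise_frame(frame, noise_power):
    """Apply the generic frequency domain scheme to one frame.

    noise_power holds one estimated noise power per frequency bin
    (assumption: provided by a separate noise estimator).
    """
    spectrum = dft(frame)                            # convert to frequency domain
    noisy_power = [abs(x) ** 2 for x in spectrum]    # power at each bin
    gains = []
    for p, r in zip(noisy_power, noise_power):
        speech_power = max(p - r, 0.0)               # assumed speech power estimate
        total = speech_power + r
        # Wiener-style gain as a stand-in; zero gain when both estimates are zero
        gains.append(speech_power / total if total > 0.0 else 0.0)
    filtered = [g * x for g, x in zip(gains, spectrum)]  # apply filter coefficients
    return idft(filtered)                            # back to the time domain


# Hypothetical 8-sample frame: a sinusoid occupying bins 2 and 6
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
noise_power = [1.0] * 8
clean = reduce_noise_frame(frame, noise_power)
```

With a zero noise estimate the frame passes through unchanged, and with a very large noise estimate every bin is suppressed, which shows how the estimated powers control the filter.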
Spectral subtraction is a simple and well-known noise reduction method that follows the above scheme (S. F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Trans. on Acous. Speech and Sig. Proc., 27, 1979, pp. 113-120). In this method the frequency domain filter coefficients are calculated from
$$F(k,m) = \frac{\max\left(|X(k,m)|^{2} - R_n(k,m),\ 0\right)}{|X(k,m)|^{2}}$$
where F(k,m) represents the filter gain at frequency k and time m, X(k,m) is the spectrum of the noisy speech signal, and R_n(k,m) is the estimated noise power at time m and frequency k.
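As a small numerical illustration of the gain formula above, the following Python sketch computes F(k,m) for the bins of one frame. The function name `spectral_subtraction_gain` and the sample values are hypothetical, and assigning zero gain to a bin with zero input power is an assumption made here to avoid division by zero.

```python
def spectral_subtraction_gain(noisy_power, noise_power):
    """Per-bin gain max(|X|^2 - R_n, 0) / |X|^2 for one frame."""
    gains = []
    for px, rn in zip(noisy_power, noise_power):
        if px <= 0.0:
            gains.append(0.0)  # assumption: empty bin gets zero gain (avoids 0/0)
        else:
            gains.append(max(px - rn, 0.0) / px)
    return gains


# Hypothetical values of |X(k,m)|^2 and R_n(k,m) for three bins of one frame
noisy_power = [4.0, 1.0, 0.5]
noise_power = [1.0, 1.0, 1.0]
gains = spectral_subtraction_gain(noisy_power, noise_power)
```

In this example the first bin, where the input power is well above the noise estimate, keeps most of its energy, while the bins at or below the noise estimate are set to zero by the max operation.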
Spectral subtraction, although a simple method, suffers from an annoying artifact in the output signal known as musical noise. Musical noise is caused by randomly spaced spectral peaks that come and go from one frame of data to the next and occur at random frequencies.
Several methods have been proposed that reduce musical noise artifacts at the expense of introducing speech distortion. The minimum mean-square error short-time spectral estimator proposed by Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1109-1121, 1984, is a known noise reduction method that does not exhibit the musical noise artifact, but it is computationally expensive to implement and its trade-off between noise reduction and distortion in the output speech is poor.
In general, most existing noise reduction methods are either computationally very expensive or have poor output quality, especially at low signal-to-noise ratios.