1. Technical Field
The system is directed to the field of sound processing. More particularly, this system provides a way to enhance speech recognition using a spectro-temporally varying technique to compute the suppression gain.
2. Background of the Invention
Speech enhancement often involves removing noise from a speech signal. Enhancing a speech signal by removing extraneous noise, so that the speech can be recognized by a speech processor or understood by a listener, has long been a challenging research topic. Various approaches have been developed in the prior art. Among them, spectral subtraction methods are the most widely used in real-time applications. In spectral subtraction, an average noise spectrum is estimated and subtracted from the noisy signal spectrum, so that the average signal-to-noise ratio (SNR) is improved. The method assumes that the signal is distorted by broad-band, stationary, additive noise, that the noise estimate is the same during analysis and restoration, and that the phase is the same in the original and restored signals.
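The basic spectral subtraction step can be illustrated with a minimal Python sketch operating on magnitude spectra. It assumes the short-time magnitudes have already been computed (one list per frame), that the noise average is taken over frames known to contain only noise, and that negative differences are clamped to zero; the function names are hypothetical. Consistent with the phase assumption above, the restored frame would reuse the noisy phase, which is not shown here.

```python
def average_noise_spectrum(noise_frames):
    """Average magnitude per bin over frames assumed to contain only noise."""
    count = len(noise_frames)
    return [sum(bin_values) / count for bin_values in zip(*noise_frames)]


def spectral_subtract(noisy_mag, noise_mag):
    """Subtract the average noise magnitude from one noisy frame, bin by bin.

    Negative differences are clamped to zero (half-wave rectification),
    an assumption made here so that magnitudes stay non-negative.
    """
    return [max(y - d, 0.0) for y, d in zip(noisy_mag, noise_mag)]


noise_mag = average_noise_spectrum([[1.0, 2.0, 0.5], [3.0, 2.0, 1.5]])
print(noise_mag)                                       # [2.0, 2.0, 1.0]
print(spectral_subtract([5.0, 1.0, 1.0], noise_mag))  # [3.0, 0.0, 0.0]
```

Bins where the noisy magnitude only momentarily exceeds the average noise survive as isolated peaks, which is one way to see where the musical tones described next come from.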
Subtraction-type methods have the disadvantage that the enhanced speech is often accompanied by a musical tone artifact that is annoying to human listeners. There are a number of distortion sources in subtraction-type schemes, but the dominant distortion is a random distribution of tones at different frequencies that produces a metallic-sounding noise, known as “musical noise” because of its narrow-band spectrum and tinny sound.
This problem becomes more serious at high levels of environmental noise, such as wind, fan, road, or engine noise. Not only does the residual noise sound musical, but the remaining voice left unmasked by the noise often sounds “thin”, “tinny”, or musical too. In fact, musical noise has greatly limited the performance of speech enhancement algorithms.
Various solutions have been proposed to overcome the musical noise problem. Most of them aim to find an improved estimate of the SNR using constant or adaptive time-averaging factors. Time-averaging methods are effective in removing musical noise, but at the cost of degrading the speech signal and introducing unwanted delay into the system.
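As an illustration of a constant time-averaging factor, the following hypothetical Python sketch smooths a per-bin SNR sequence with a first-order recursive average. The smoothing factor `beta` and the initialization are assumptions, not values taken from the prior art discussed here.

```python
def smooth_snr(snr_frames, beta):
    """First-order recursive time averaging of one bin's per-frame SNR values:
    s[n] = beta * s[n-1] + (1 - beta) * snr[n], with 0 <= beta < 1.

    Larger beta values suppress frame-to-frame fluctuations more strongly
    but make the estimate respond more slowly to changes such as speech onsets.
    """
    if not snr_frames:
        return []
    smoothed = []
    state = snr_frames[0]  # assumption: initialize with the first observation
    for snr in snr_frames:
        state = beta * state + (1.0 - beta) * snr
        smoothed.append(state)
    return smoothed


print(smooth_snr([0.0, 0.0, 10.0, 10.0], beta=0.5))  # [0.0, 0.0, 5.0, 7.5]
```

With `beta = 0.5`, a step in the SNR from 0 to 10 yields only 5.0 in the frame where the step occurs and 7.5 one frame later, which is the kind of delay referred to above.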
Another way to remove musical noise is to overestimate the noise, which causes the musical tones to be subtracted out as well. Unfortunately, speech components whose spectral magnitude is close to that of the noise are also subtracted out, producing even thinner-sounding speech.
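Noise overestimation is commonly implemented as over-subtraction with a spectral floor. The Python sketch below assumes power spectra, an over-subtraction factor `alpha`, and a floor expressed as a fraction of the noisy power; the names and default values are illustrative assumptions.

```python
def over_subtract(noisy_pow, noise_pow, alpha=2.0, floor=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    alpha > 1 subtracts more than the estimated noise power so that residual
    peaks are also removed; floor keeps at least floor * noisy power per bin.
    The default values are illustrative assumptions.
    """
    return [max(y - alpha * d, floor * y) for y, d in zip(noisy_pow, noise_pow)]


print(over_subtract([10.0, 2.5], [1.0, 1.0], alpha=1.0))  # [9.0, 1.5]
print(over_subtract([10.0, 2.5], [1.0, 1.0], alpha=3.0))  # second bin falls to the floor
```

With `alpha = 1.0` a weak speech bin at 2.5 times the noise power keeps a power of 1.5; with `alpha = 3.0` the same bin drops to the spectral floor, illustrating how speech close to the noise level is removed along with the tones.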
A classical speech enhancement system relies on the estimation of a short-time suppression gain, which is a function of the a priori signal-to-noise ratio (SNR) and/or the a posteriori SNR. Many approaches have been proposed over the years for estimating the a priori SNR when only the noisy speech is available. Examples of such prior art approaches include Y. Ephraim and D. Malah, “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, Dec. 1984; and K. Linhard and T. Haulick, “Spectral Noise Subtraction with Recursive Gain Curves,” 5th International Conference on Spoken Language Processing, Sydney, Australia, Nov. 30-Dec. 4, 1998.
In Y. Ephraim and D. Malah, “Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, Apr. 1985, Ephraim and Malah proposed a decision-directed approach that is widely used for speech enhancement. The a priori SNR calculated with this approach follows the shape of the a posteriori SNR. However, the approach introduces delay because it uses the speech estimate from the previous frame to compute the current a priori SNR. Since the suppression gain depends on the a priori SNR, the gain does not match the current frame, which degrades the performance of the speech enhancement system. This approach is described below.
Classical Noise Reduction Algorithm
In the classical additive noise model, the noisy speech is given by

$$y(t) = x(t) + d(t)$$

where $x(t)$ and $d(t)$ denote the speech and the noise signal, respectively.
Let $|Y_{n,k}|$, $|X_{n,k}|$, and $|D_{n,k}|$ designate the short-time Fourier spectral magnitudes of the noisy speech, the speech, and the noise at the $n$th frame and $k$th frequency bin. The noise reduction process consists of applying a spectral gain $G_{n,k}$ to each short-time spectral value. An estimate of the clean speech spectral magnitude can then be obtained as

$$|\hat{X}_{n,k}| = G_{n,k}\,|Y_{n,k}|$$
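Because $G_{n,k}$ is real and non-negative, applying it to the complex short-time spectrum scales the magnitude as in the equation above and leaves the noisy phase unchanged. A minimal Python sketch for a single frame, with hypothetical function and variable names:

```python
import cmath
import math


def apply_gain(noisy_bins, gains):
    """Multiply each complex STFT value Y_{n,k} of one frame by a real gain G_{n,k}.

    For a real, non-negative gain this scales the magnitude, |X_hat| = G * |Y|,
    and leaves the phase of the noisy coefficient unchanged.
    """
    return [g * y for g, y in zip(gains, noisy_bins)]


frame = [3 + 4j, -1 + 0j]
enhanced = apply_gain(frame, [0.5, 0.2])
print([abs(x) for x in enhanced])  # magnitudes 2.5 and 0.2
print(math.isclose(cmath.phase(enhanced[0]), cmath.phase(frame[0])))  # True
```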
The spectral suppression gain $G_{n,k}$ depends on the a posteriori SNR, defined by

$$\mathrm{SNR}_{\mathrm{post}}(n,k) = \frac{|Y_{n,k}|^2}{E\{|D_{n,k}|^2\}}$$
and on the a priori SNR, defined by

$$\mathrm{SNR}_{\mathrm{priori}}(n,k) = \frac{E\{|X_{n,k}|^2\}}{E\{|D_{n,k}|^2\}}.$$
Since the speech and noise powers are not known, the two SNRs have to be estimated. The a posteriori SNR is usually calculated as

$$\widehat{\mathrm{SNR}}_{\mathrm{post}}(n,k) = \frac{|Y_{n,k}|^2}{\sigma(n,k)^2}$$

Here, $\sigma(n,k)^2$ is the noise estimate.
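The a posteriori SNR estimate is a per-bin ratio. In this hypothetical Python sketch, `noise_var` holds $\sigma(n,k)^2$ for one frame, and the small `eps` guard against division by zero is an added assumption.

```python
def snr_post(noisy_mag, noise_var, eps=1e-12):
    """Estimated a posteriori SNR per bin: |Y_{n,k}|^2 / sigma(n,k)^2.

    noise_var holds sigma(n,k)^2 for each bin of the frame; eps is an added
    guard against division by zero.
    """
    return [y * y / max(s, eps) for y, s in zip(noisy_mag, noise_var)]


print(snr_post([2.0, 1.0], [1.0, 4.0]))  # [4.0, 0.25]
```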
The a priori SNR can be estimated in many different ways according to the prior art. The standard estimate without recursion has the form

$$\widehat{\mathrm{SNR}}_{\mathrm{priori}}(n,k) = \widehat{\mathrm{SNR}}_{\mathrm{post}}(n,k) - 1 \tag{1}$$
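Equation (1) can be computed directly from the a posteriori SNR estimate. It becomes negative whenever $|Y_{n,k}|^2 < \sigma(n,k)^2$; the optional clamp in this Python sketch is an added assumption, not part of equation (1).

```python
def snr_priori_ml(snr_post_frame, floor=None):
    """Non-recursive a priori SNR estimate, equation (1): SNR_post(n,k) - 1.

    The optional floor is an added assumption that clamps negative values;
    equation (1) itself applies no clamp.
    """
    estimate = [s - 1.0 for s in snr_post_frame]
    if floor is not None:
        estimate = [max(e, floor) for e in estimate]
    return estimate


print(snr_priori_ml([4.0, 0.25]))             # [3.0, -0.75]
print(snr_priori_ml([4.0, 0.25], floor=0.0))  # [3.0, 0.0]
```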
Another approach to a priori SNR estimation, known as the “decision-directed” recursive version, has been proposed in the prior art as

$$\widehat{\mathrm{SNR}}_{\mathrm{priori}}(n,k) = \alpha\,\frac{|\hat{X}(n-1,k)|^2}{\sigma(n,k)^2} + (1-\alpha)\,P\!\left(\widehat{\mathrm{SNR}}_{\mathrm{post}}(n,k) - 1\right) \tag{2}$$
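The decision-directed recursion of equation (2) can be sketched in Python for a single frequency bin. The text does not define $P(\cdot)$; this sketch assumes half-wave rectification, $P(x) = \max(x, 0)$. It also assumes a constant noise variance, a zero speech estimate before the first frame, $\alpha = 0.98$, and the Wiener gain given later in this section for computing $|\hat{X}(n,k)| = G(n,k)\,|Y_{n,k}|$. All function names are hypothetical.

```python
def rect(x):
    """Assumed form of P(.) in equation (2): half-wave rectification."""
    return max(x, 0.0)


def snr_priori_dd(prev_clean_mag, noise_var, snr_post_now, alpha=0.98):
    """Decision-directed a priori SNR for one bin, equation (2)."""
    return (alpha * prev_clean_mag ** 2 / noise_var
            + (1.0 - alpha) * rect(snr_post_now - 1.0))


def track_bin(noisy_mags, noise_var, alpha=0.98):
    """Run equation (2) over successive frames of one frequency bin.

    Returns the a priori SNR estimates and the clean magnitude estimates
    |X_hat(n,k)| = G(n,k) |Y(n,k)|, using the Wiener gain (an assumption).
    """
    prev_clean = 0.0  # assumption: no speech estimate before the first frame
    priori, clean = [], []
    for y in noisy_mags:
        post = y * y / noise_var
        xi = snr_priori_dd(prev_clean, noise_var, post, alpha)
        gain = xi / (xi + 1.0)
        prev_clean = gain * y
        priori.append(xi)
        clean.append(prev_clean)
    return priori, clean


priori, clean = track_bin([1.0, 1.0, 10.0, 10.0], noise_var=1.0)
print([round(v, 2) for v in priori])  # onset frame (index 2) is only about 1.98
```

For a bin that stays at the noise level for two frames and then jumps to a magnitude of 10 (an a posteriori SNR of 100), the estimate at the onset frame is only about 1.98, whereas equation (1) would give 99; the estimate catches up one frame later, once $|\hat{X}(n-1,k)|$ reflects the speech. This is the one-frame lag discussed above.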
A simpler recursive version has been proposed in another approach as

$$\widehat{\mathrm{SNR}}_{\mathrm{priori}}(n,k) = G(n-1,k)\,\widehat{\mathrm{SNR}}_{\mathrm{post}}(n,k) - 1 \tag{3}$$
where $G(n,k)$ is the so-called Wiener suppression gain, calculated by

$$G(n,k) = \frac{\widehat{\mathrm{SNR}}_{\mathrm{priori}}(n,k)}{\widehat{\mathrm{SNR}}_{\mathrm{priori}}(n,k) + 1}$$
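Equation (3) together with the Wiener gain can be iterated frame by frame for one bin. In this Python sketch the initial gain `g_init` and the lower bound `snr_min` are assumptions: equation (3) can yield values below zero (including exactly -1, where the Wiener gain is undefined), so the sketch clamps it. Function names are hypothetical.

```python
def wiener_gain(snr_priori):
    """Wiener suppression gain: SNR_priori / (SNR_priori + 1)."""
    return snr_priori / (snr_priori + 1.0)


def snr_priori_eq3(prev_gain, snr_post_now, snr_min=0.0):
    """Equation (3), G(n-1,k) * SNR_post(n,k) - 1, clamped below at snr_min (assumption)."""
    return max(prev_gain * snr_post_now - 1.0, snr_min)


def run_bin(snr_post_frames, g_init=1.0, snr_min=0.0):
    """Iterate equation (3) and the Wiener gain over successive frames of one bin."""
    gains = []
    gain = g_init  # assumption: G(-1,k) = g_init before the first frame
    for post in snr_post_frames:
        gain = wiener_gain(snr_priori_eq3(gain, post, snr_min))
        gains.append(gain)
    return gains


print(run_bin([5.0, 5.0]))                   # gains of about 0.8, then 0.75
print(run_bin([0.5, 100.0]))                 # [0.0, 0.0]: a zero gain never recovers
print(run_bin([0.5, 200.0], snr_min=0.01))  # second gain recovers to roughly 0.5
```

For a constant a posteriori SNR of 5 the gains are 0.8 and then 0.75, showing that each frame's gain depends on the previous one. With `snr_min = 0`, a gain that reaches exactly zero stays at zero in every later frame, because $0 \cdot \widehat{\mathrm{SNR}}_{\mathrm{post}} - 1$ is clamped back to 0. A small positive `snr_min` lets the gain recover once the a posteriori SNR is large enough (above roughly 102 for `snr_min = 0.01`), but this choice is not specified in the text.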
In general, the suppression gain is a function of the two estimated SNRs:

$$G(n,k) = f\!\left(\widehat{\mathrm{SNR}}_{\mathrm{priori}}(n,k),\ \widehat{\mathrm{SNR}}_{\mathrm{post}}(n,k)\right) \tag{4}$$
As noted above, because the suppression gain depends on an a priori SNR that is computed from the previous frame's estimate, the gain does not match the current frame, which degrades the performance of the speech enhancement system.