A major obstacle to the commercial use of speech recognition technology is the degradation of recognition performance caused by noise.
Even a speech recognition system that performs almost perfectly in a noise-free environment can show poor recognition performance in an actual environment containing noise.
To solve this problem, various approaches have been suggested. Typical methods include a spectral enhancement method based on signal processing and an adaptation method based on statistical modeling. In the spectral enhancement method, a noise spectrum is estimated in a speech-free interval, and the noise spectrum thus estimated is subtracted from the spectrum of a noise-containing input signal. The spectral subtraction method and the decision-directed Wiener filter method have been widely used as spectral enhancement methods based on signal processing. Of these, the decision-directed Wiener filter method, in which the Wiener filter is extended into a two-stage form, has been adopted in the ETSI advanced front-end (AFE) standard. It is well known that the Wiener filter using the decision-directed approach is particularly effective in canceling stationary noise.
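The spectral subtraction idea described above can be sketched as follows. This is a minimal illustration only, not the method of any particular standard: the function names, the per-frame speech flags, and the spectral floor parameter `floor` are assumptions introduced here, and each frame is assumed to be already available as a list of per-bin power values.

```python
def estimate_noise_psd(frames, is_speech):
    """Average the power spectra of the frames flagged as speech-free."""
    noise_frames = [f for f, s in zip(frames, is_speech) if not s]
    if not noise_frames:
        raise ValueError("at least one speech-free frame is required")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectral_subtract(noisy_psd, noise_psd, floor=0.01):
    """Subtract the noise estimate from a noisy power spectrum, bin by bin.

    The result is clamped to a small fraction of the noisy power
    (assumed parameter `floor`) so that no bin becomes negative.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_psd, noise_psd)]
```

For example, with two speech-free frames followed by one speech frame, `estimate_noise_psd` averages the first two frames and `spectral_subtract` removes that average from the third.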
The model adaptation method, on the other hand, adjusts an acoustic model (usually a hidden Markov model) according to the noise conditions instead of modifying the noisy input signal. A typical model adaptation method based on statistical modeling is the parallel model combination (PMC) technique. In the PMC technique, clean speech and noise are represented by two separate models, and the two models are then combined to model noise-containing speech. The PMC technique performs better than other methods when the noise models anticipate the actual noisy environment.
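The combination step of the PMC technique can be illustrated with the log-add approximation, a common simplification in which only the mean vectors are combined and the variances are left unchanged. The sketch below assumes the means are already expressed in the log-spectral domain (cepstral model parameters would first have to be mapped there); the function name and the `gain` parameter are hypothetical.

```python
import math


def combine_log_spectral_means(speech_mean, noise_mean, gain=1.0):
    """Log-add approximation: add speech and noise in the linear
    spectral domain, then return to the log-spectral domain.

    `gain` (assumed parameter) scales the speech level relative to noise.
    """
    return [math.log(gain * math.exp(s) + math.exp(n))
            for s, n in zip(speech_mean, noise_mean)]
```

A bin whose clean-speech mean corresponds to a linear power of 4 and whose noise mean corresponds to a linear power of 1 yields a combined mean of log 5.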
Hereinafter, a noise cancellation apparatus and method using a conventional Wiener filter will be described.
FIG. 1 is a block diagram of a conventional Wiener filter module 100 that serves as a noise cancellation apparatus using a Wiener filter. The conventional Wiener filter module 100 includes a spectrum estimation module 101, a power spectral density (PSD) mean estimation module 102, a speech/non-speech estimation module 103, a Wiener filter design module 104 and a Wiener filtering module 105.
As illustrated therein, the spectrum estimation module 101 receives an input speech Sin and estimates the frequency representation of each frame. The PSD mean estimation module 102 then estimates a smoothed power spectral density mean from the estimated spectra, and the speech/non-speech estimation module 103 determines which frames are speech and which are non-speech and estimates the noise frequency characteristics from the latest non-speech frame.
The Wiener filter design module 104 receives, for example, the estimated spectrum from the spectrum estimation module 101, the PSD mean from the PSD mean estimation module 102 and the noise frequency characteristics from the speech/non-speech estimation module 103, and obtains a Wiener filter for the current frame using the estimated noise frequency characteristics. Thereafter, the Wiener filtering module 105 applies the Wiener filter to estimate a clean speech (i.e., speech from which noise has been canceled), thereby producing the estimated clean speech Sout.
In this manner, the noise cancellation apparatus and method based on the conventional Wiener filter estimate the noise characteristics from the latest non-speech frame. A Wiener filter suited to the input speech Sin is computed from that estimate, and the estimated clean speech Sout is obtained by applying the computed Wiener filter.
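The frame-by-frame flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of module 100: the per-frame PSD values, spectra and speech/non-speech flags are assumed to be supplied by earlier stages, the gain uses the textbook form SNR / (1 + SNR), and all function and parameter names are hypothetical.

```python
def wiener_gain(psd_mean, noise_psd, eps=1e-12):
    """Per-bin Wiener gain from a smoothed PSD and a noise PSD estimate."""
    gains = []
    for p, n in zip(psd_mean, noise_psd):
        # Estimated speech-to-noise power ratio, clamped at zero.
        snr = max(p - n, 0.0) / max(n, eps)
        gains.append(snr / (1.0 + snr))
    return gains


def denoise(frames_psd, frames_spec, is_speech):
    """Filter each frame with a gain designed from the latest non-speech frame."""
    noise_psd = None
    output = []
    for psd, spec, speech in zip(frames_psd, frames_spec, is_speech):
        if not speech:
            noise_psd = psd  # noise estimate taken from the latest non-speech frame
        if noise_psd is None:
            output.append(list(spec))  # no noise estimate yet: pass through
            continue
        gains = wiener_gain(psd, noise_psd)
        output.append([g * x for g, x in zip(gains, spec)])
    return output
```

Because the noise estimate is refreshed only in non-speech frames, the gain applied during a speech frame reflects the noise as it was before the speech began.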
However, the conventional Wiener filter has a drawback in that its performance is limited in environments where the noise characteristics keep changing with time or where various kinds of noise are mixed.