1. Field of the Invention
The present invention relates to an apparatus and method for reducing a noise signal in a speech signal in a speech recognizer, and more particularly, to a noise reduction apparatus and method in which a signal-to-noise ratio of a speech signal input to a speech recognizer is estimated for each frequency bandwidth and a noise suppression rate for each frequency bandwidth is controlled according to the estimated signal-to-noise ratios to reduce the noise signal.
2. Description of Related Art
Generally, a speech recognizer extracts a feature vector from a frequency domain by performing a Fast Fourier Transform (FFT) on an inputted speech signal and recognizes the inputted speech signal by using stored speech data and the feature vector extracted from the inputted speech signal.
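As an illustration of the frequency-domain step described above, the following minimal sketch computes the magnitude spectrum of a single frame. It is only a sketch under stated assumptions: a direct discrete Fourier transform is used in place of an FFT for brevity (the result is the same, only slower), the function name `magnitude_spectrum` and the test frame are hypothetical, and the way an actual recognizer derives its feature vector from the spectrum is not specified here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return |X[k]| for k = 0..N/2 using a direct DFT (O(N^2), not an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


# Hypothetical test frame: 64 samples of a sinusoid completing 8 cycles.
N = 64
frame = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
mags = magnitude_spectrum(frame)

# The bin with the largest magnitude identifies the dominant frequency.
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin)
```

A recognizer would typically derive its feature vector from such per-frame spectra; that further processing is outside this sketch.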
However, when the speech recognizer receives a speech signal in which ambient noise is mixed, its speech recognition rate may be severely degraded. Specifically, when a speech signal input during speech recognition is distorted by external noise, the probability that the speech recognizer produces an incorrect recognition result is high.
Therefore, a method of reducing a noise signal mixed in an input signal to increase a speech recognition rate is required.
A conventional noise reduction apparatus of a speech recognizer employs a method of controlling a noise reduction rate with respect to all frequency components according to a speech-noise detection result, increasing the noise reduction rate when detecting a noise section, and lowering the noise reduction rate when detecting a speech section.
However, in the conventional method of increasing the noise reduction rate with respect to the noise section, the speech signal and the noise signal are detected on a time axis, so an identical value is given to all frequencies even though the noise-to-speech ratio differs for each frequency bandwidth in the speech section. As a result, effective noise reduction is difficult to provide when the environment changes.
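The limitation of the conventional approach can be sketched as follows. This is an illustrative sketch only: the energy-threshold detector, the gain values, the function names, and the example spectra are all assumptions made for this example and are not taken from any particular conventional apparatus. The point it shows is that one speech/noise decision per frame yields one gain for every frequency bin.

```python
def frame_energy(spectrum):
    """Total energy of one frame's magnitude spectrum."""
    return sum(m * m for m in spectrum)


def full_band_gain(spectrum, energy_threshold, speech_gain=0.9, noise_gain=0.2):
    """Pick ONE gain for the whole frame from a time-axis speech/noise decision.

    The detector and gain values are hypothetical placeholders.
    """
    is_speech = frame_energy(spectrum) > energy_threshold
    return speech_gain if is_speech else noise_gain


def suppress(spectrum, gain):
    """Apply the same gain to every frequency bin."""
    return [gain * m for m in spectrum]


# Hypothetical frames: bins 0-1 are dominated by low-frequency noise,
# bins 2-3 by speech.
speech_frame = [5.0, 4.0, 3.0, 2.0]
noise_frame = [0.5, 0.4, 0.1, 0.1]

g_speech = full_band_gain(speech_frame, energy_threshold=1.0)
g_noise = full_band_gain(noise_frame, energy_threshold=1.0)

# In the speech frame the noisy low bins are attenuated no more than
# the speech bins, because the gain does not depend on frequency.
out_speech = suppress(speech_frame, g_speech)
print(g_speech, g_noise, out_speech)
```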
On the other hand, a conventional noise reduction method using spectrum correction and peak/valley accentuation performs Wiener filter scaling by a speech absence probability and uses a probability estimated via statistical modeling. However, since speech and noise detection is still performed on a time axis and an identical value is given to all frequencies, effective noise reduction may not be provided in environments with noise of various frequencies.
In a conventional method of estimating a noise spectrum, the noise spectrum is assumed not to change, and an amplitude of the noise spectrum is estimated by a detected noise spectrum mean 100, as shown in FIG. 1. In actuality, however, the amplitude of the noise spectrum fluctuates over time, as also shown in FIG. 1.
The conventional noise reduction apparatus configures and utilizes a Wiener filter to subtract the noise spectrum mean from an input signal.
However, in the conventional noise reduction apparatus, the number of errors is in inverse proportion to the amplitude of the speech signal. Specifically, most errors occur because the noise spectrum mean is uniformly subtracted even from parts in which the amplitude of the speech signal is small. This result is shown in FIG. 2.
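The effect described above can be illustrated with a small numerical sketch. It uses plain magnitude subtraction floored at zero rather than an actual Wiener filter, treats speech and noise magnitudes as simply additive (phase is ignored), and all values and names are hypothetical. It is meant only to show why a fixed noise mean hurts low-amplitude speech the most.

```python
def subtract_noise_mean(noisy_mag, noise_mean):
    """Subtract a fixed noise mean from each bin, flooring at zero."""
    return [max(y - n, 0.0) for y, n in zip(noisy_mag, noise_mean)]


# Hypothetical two-bin example: one strong speech bin, one weak speech bin.
clean = [10.0, 1.0]
noise_mean = [2.0, 2.0]    # long-term mean used by the subtractor
actual_noise = [1.0, 1.0]  # instantaneous noise, below the mean in this frame

# Simplifying assumption: magnitudes add.
noisy = [c + n for c, n in zip(clean, actual_noise)]

estimate = subtract_noise_mean(noisy, noise_mean)
relative_error = [abs(e - c) / c for e, c in zip(estimate, clean)]

# The weak bin is removed entirely, while the strong bin is only
# slightly underestimated.
print(estimate, relative_error)
```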
FIG. 3 is a diagram illustrating an example of a frequency feature of a clean speech signal.
Referring to FIG. 3, the spectrum indicates the frequency feature of a clean speech signal into which no noise signal flows. The amplitude of the speech signal changes frequently and differs for each frequency bandwidth.
FIG. 4 is a diagram illustrating an example of a frequency feature of a speech signal mixed with a noise signal generated from within vehicular environments.
Referring to FIG. 4, the spectrum indicates the frequency feature of a speech signal mixed with a noise signal in a vehicle environment. In an input signal from a vehicle environment, only the noise signal exists in a section without speech, the speech signal differs from the noise signal in frequency feature, and, in particular, the noise effect appears mostly at low frequencies of less than 1 kHz. As described above, a noise signal flowing together with a speech signal input to a speech recognizer may have a different amplitude for each frequency bandwidth, instead of being constant across frequencies.
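To make the band dependence concrete, the following sketch computes a signal-to-noise ratio per frequency bandwidth from band powers. The band layout, the power values, and the function name `band_snr_db` are hypothetical and chosen only to mimic noise concentrated below 1 kHz; they are not measurements from FIG. 4.

```python
import math


def band_snr_db(signal_power, noise_power):
    """Return the signal-to-noise ratio in dB for each band."""
    return [10.0 * math.log10(s / n) for s, n in zip(signal_power, noise_power)]


# Hypothetical band powers; bands 0 and 1 are assumed to lie below 1 kHz.
speech_power = [4.0, 4.0, 4.0, 4.0]
noise_power = [4.0, 0.4, 0.04, 0.04]

snr = band_snr_db(speech_power, noise_power)

# The low bands have a much lower ratio than the high bands, so a single
# suppression rate cannot suit all of them.
print([round(v, 1) for v in snr])
```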
FIG. 5 is a diagram illustrating a frequency feature of a speech signal from which a noise signal is reduced by a conventional noise reduction method. Referring to FIG. 5, because the noise signal is not constant, parts 510 and 520 of the speech signal are lost when the noise signal is reduced from the speech signal according to the conventional noise reduction method.
As described above, since the conventional noise reduction method employs a system parameter optimized for only one type or amplitude of noise signal, an identical parameter is applied to all frequencies, and effectiveness is difficult to guarantee when the amplitude of the noise signal changes.
Accordingly, a noise reduction method that applies a different noise suppression rate to a speech signal according to a type of a noise signal or amplitude changes of the noise signal is acutely required.