The present invention relates to noise reduction. In particular, the present invention relates to removing noise from signals used in pattern recognition.
A pattern recognition system, such as a speech recognition system, takes an input signal and attempts to decode the signal to find a pattern represented by the signal. For example, in a speech recognition system, a speech signal (often referred to as a test signal) is received by the recognition system and is decoded to identify a string of words represented by the speech signal.
To decode the incoming test signal, most recognition systems utilize one or more models that describe the likelihood that a portion of the test signal represents a particular pattern. Examples of such models include neural nets, dynamic time warping, segment models, and Hidden Markov Models.
Before a model can be used to decode an incoming signal, it must be trained. This is typically done by measuring input training signals generated from a known training pattern. For example, in speech recognition, a collection of speech signals is generated by speakers reading from a known text. These speech signals are then used to train the models.
In order for the models to work optimally, the signals used to train the model should be similar to the eventual test signals that are decoded. In particular, the training signals should have the same amount and type of noise as the test signals that are decoded.
Typically, the training signal is collected under “clean” conditions and is considered to be relatively noise free. To achieve this same low level of noise in the test signal, many prior art systems apply noise reduction techniques to the test data.
In one technique for removing noise, the prior art identifies a set of correction vectors from a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors that represent frames of these channel signals, a collection of noise correction vectors is determined by subtracting the feature vectors of the noisy channel signal from the corresponding feature vectors of the clean channel signal. When a feature vector of a noisy pattern signal, either a training signal or a test signal, is later received, a suitable correction vector is added to the feature vector to produce a noise-reduced feature vector.
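The stereo-based technique described above can be illustrated with a minimal sketch. It is not the prior art's actual implementation. The function names (`correction_vectors`, `squared_distance`, `denoise`) and the sample values are hypothetical. The rule for choosing a "suitable" correction vector is also an assumption: the sketch simply picks the correction whose noisy training frame lies nearest to the incoming frame in squared Euclidean distance.

```python
from typing import List, Sequence

Vector = List[float]


def correction_vectors(clean: Sequence[Vector], noisy: Sequence[Vector]) -> List[Vector]:
    """Return one correction vector per frame: clean frame minus noisy frame."""
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy channels must have the same number of frames")
    return [[c - n for c, n in zip(cf, nf)] for cf, nf in zip(clean, noisy)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def denoise(frame: Vector, noisy_train: Sequence[Vector], corrections: Sequence[Vector]) -> Vector:
    """Add the correction of the nearest noisy training frame to the input frame.

    Nearest-neighbor selection is an assumption made for this sketch only.
    """
    nearest = min(range(len(noisy_train)),
                  key=lambda i: squared_distance(frame, noisy_train[i]))
    return [x + r for x, r in zip(frame, corrections[nearest])]


# Hypothetical two-frame stereo recording with two-dimensional feature vectors.
clean_channel = [[1.0, 2.0], [4.0, 0.0]]
noisy_channel = [[1.5, 2.5], [5.0, 1.0]]

corrections = correction_vectors(clean_channel, noisy_channel)
restored = denoise([4.75, 1.0], noisy_channel, corrections)
```

In this sketch each frame is corrected independently of its neighbors, so nothing constrains how consecutive noise-reduced vectors relate to one another.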
This stereo-based technique for generating correction vectors has in the past utilized only static descriptions of the pattern signals. Thus, the correction vectors have not incorporated the dynamic nature of pattern signals such as speech. As a result, the sequences of noise-reduced feature vectors tend to include a large number of discontinuities between neighboring feature vectors. In other words, the changes between neighboring noise-reduced feature vectors are not as smooth as in normal speech.
In addition, the stereo-based correction does not perform optimally if the input signal contains a type of noise that was not present in the training data. When this occurs, the system attempts to find the closest correction vector. However, because that correction vector was derived from a different noise, it will not adequately remove the noise in the input signal. In fact, in areas of the input signal where the signal-to-noise ratio is low, the correction vector can actually worsen the noise in the input signal.
In light of this, a noise reduction technique is needed that is more effective at removing noise from pattern signals.