The present invention relates to noise reduction. In particular, the present invention relates to removing noise from signals used in pattern recognition.
A pattern recognition system, such as a speech recognition system, takes an input signal and attempts to decode the signal to find a pattern represented by the signal. For example, in a speech recognition system, a speech signal (often referred to as a test signal) is received by the recognition system and is decoded to identify a string of words represented by the speech signal.
To decode the incoming test signal, most recognition systems utilize one or more models that describe the likelihood that a portion of the test signal represents a particular pattern. Examples of such models include Neural Nets, Dynamic Time Warping, segment models, and Hidden Markov Models.
Before a model can be used to decode an incoming signal, it must be trained. This is typically done by measuring input training signals generated from a known training pattern. For example, in speech recognition, a collection of speech signals is generated by speakers reading from a known text. These speech signals are then used to train the models.
In order for the models to work optimally, the signals used to train the model should be similar to the eventual test signals that are decoded. In particular, the training signals should have the same amount and type of noise as the test signals that are decoded.
Typically, the training signal is collected under “clean” conditions and is considered to be relatively noise free. To achieve this same low level of noise in the test signal, many prior art systems apply noise reduction techniques to the test data.
In one technique for removing noise, the prior art identifies a set of correction vectors from a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors that represent frames of these channel signals, a collection of noise correction vectors is determined by subtracting feature vectors of the noisy channel signal from feature vectors of the clean channel signal. When a feature vector of a noisy pattern signal, either a training signal or a test signal, is later received, a suitable correction vector is added to the feature vector to produce a noise-reduced feature vector.
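The stereo-based technique described above can be illustrated with a minimal sketch. The text does not say how a "suitable" correction vector is selected, so the sketch assumes a nearest-neighbor lookup: each stored correction vector is paired with the noisy training frame it came from, and the correction belonging to the closest stored noisy frame is applied. The function names `build_corrections` and `reduce_noise` are hypothetical and are used only for illustration.

```python
from math import dist


def build_corrections(clean_frames, noisy_frames):
    """Pair each noisy training frame with its correction vector (clean - noisy)."""
    if len(clean_frames) != len(noisy_frames):
        raise ValueError("stereo channels must contain the same number of frames")
    table = []
    for clean, noisy in zip(clean_frames, noisy_frames):
        # Correction vector: clean-channel features minus noisy-channel features.
        correction = [c - n for c, n in zip(clean, noisy)]
        table.append((list(noisy), correction))
    return table


def reduce_noise(noisy_vector, table):
    """Add a correction vector to a noisy feature vector.

    Assumption (not stated in the text): the "suitable" correction is the one
    whose stored noisy frame is nearest to the input in Euclidean distance.
    """
    _, correction = min(table, key=lambda entry: dist(entry[0], noisy_vector))
    return [x + d for x, d in zip(noisy_vector, correction)]


# Two-frame stereo example with two-dimensional feature vectors.
clean = [[1.0, 2.0], [5.0, 5.0]]
noisy = [[1.5, 2.5], [4.0, 6.0]]
table = build_corrections(clean, noisy)
restored = reduce_noise([1.5, 2.5], table)
```

In this sketch each frame is corrected independently of its neighbors, so nothing constrains consecutive noise-reduced vectors to vary smoothly.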
In other systems, noise-reduced feature vectors are estimated using models of the static aspects of noise, models of the static aspects of clean speech, and an observation or acoustic model that predicts the value of a clean speech vector given a noisy speech vector and a noise vector. Although such systems are effective, they are not ideal because the models only represent static aspects of noise and clean speech. They do not represent the dynamic relationships found between neighboring frames of noise and neighboring frames of clean speech. As a result, the sequences of noise-reduced feature vectors produced by these systems tend to include a large number of discontinuities between neighboring feature vectors. In other words, the changes between neighboring noise-reduced feature vectors are not as smooth as in normal speech.
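One way to make the notion of discontinuity between neighboring feature vectors concrete is to measure the average change from one frame to the next. The following sketch is illustrative only: the text does not define a smoothness measure, so the use of the mean Euclidean distance between consecutive vectors, and the name `mean_frame_delta`, are assumptions.

```python
from math import dist


def mean_frame_delta(frames):
    """Average Euclidean distance between each feature vector and the next.

    Assumed measure of smoothness: a larger value means larger jumps
    between neighboring frames.
    """
    if len(frames) < 2:
        return 0.0
    deltas = [dist(a, b) for a, b in zip(frames, frames[1:])]
    return sum(deltas) / len(deltas)


# A smoothly varying sequence and a sequence that jumps between frames.
smooth = [[0.0], [1.0], [2.0], [3.0]]
jumpy = [[0.0], [3.0], [0.0], [3.0]]
smooth_delta = mean_frame_delta(smooth)
jumpy_delta = mean_frame_delta(jumpy)
```

Under this assumed measure, a sequence of noise-reduced vectors with many discontinuities would yield a larger value than the corresponding clean speech sequence.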
In light of this, a noise reduction technique is needed that is more effective at removing noise from pattern signals.