The present invention relates to noise reduction. In particular, the present invention relates to removing noise from signals used in pattern recognition.
A pattern recognition system, such as a speech recognition system, takes an input signal and attempts to decode the signal to find a pattern represented by the signal. For example, in a speech recognition system, a speech signal (often referred to as a test signal) is received by the recognition system and is decoded to identify a string of words represented by the speech signal.
To decode the incoming test signal, most recognition systems utilize one or more models that describe the likelihood that a portion of the test signal represents a particular pattern. Examples of such models include Neural Nets, Dynamic Time Warping, segment models, and Hidden Markov Models.
Before a model can be used to decode an incoming signal, it must be trained. This is typically done by measuring input training signals generated from a known training pattern. For example, in speech recognition, a collection of speech signals is generated by speakers reading from a known text. These speech signals are then used to train the models.
For the models to work optimally, the signals used to train them should be similar to the test signals that are eventually decoded. In particular, the training signals should have the same amount and type of noise as the test signals.
Typically, the training signal is collected under “clean” conditions and is considered to be relatively noise free. To bring the test signal to this same low level of noise, many prior art systems apply noise reduction techniques to it. In particular, many prior art speech recognition systems use a noise reduction technique known as spectral subtraction.
In spectral subtraction, noise samples are collected from the speech signal during pauses in the speech. The spectral content of these samples is then subtracted from the spectral representation of the speech signal. The difference in the spectral values represents the noise-reduced speech signal.
Because spectral subtraction estimates the noise from samples taken during a limited part of the speech signal, it does not completely remove the noise if the noise is changing over time. For example, spectral subtraction is unable to remove sudden bursts of noise such as a door shutting or a car driving past the speaker.
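The spectral subtraction procedure described above, and its weakness against sudden noise, can be illustrated with a minimal sketch. The sketch assumes that each frame has already been converted to a magnitude spectrum and that the pause frames are known in advance. The function names, the toy values, and the clamping of negative results to a floor are illustrative assumptions and are not taken from any particular prior art system.

```python
def estimate_noise_spectrum(frames, pause_indices):
    """Average the magnitude spectra of the frames flagged as speech pauses."""
    n_bins = len(frames[0])
    noise = [0.0] * n_bins
    for i in pause_indices:
        for k in range(n_bins):
            noise[k] += frames[i][k]
    return [v / len(pause_indices) for v in noise]


def spectral_subtract(frames, noise, floor=0.0):
    """Subtract the noise estimate from every frame.

    Clamping at `floor` is an assumed safeguard against negative magnitudes.
    """
    return [[max(x - n, floor) for x, n in zip(frame, noise)] for frame in frames]


# Toy magnitude spectra with three frequency bins per frame (hypothetical values).
frames = [
    [1.0, 2.0, 1.0],  # pause: noise only
    [1.0, 2.0, 1.0],  # pause: noise only
    [5.0, 2.0, 4.0],  # speech plus the same steady noise
    [9.0, 8.0, 4.0],  # speech plus steady noise plus a sudden burst
]

noise = estimate_noise_spectrum(frames, pause_indices=[0, 1])
cleaned = spectral_subtract(frames, noise)
```

In this toy example the steady noise is removed from the third frame, but the burst in the fourth frame survives the subtraction because it was never present in the pause frames used to form the estimate.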
In another technique for removing noise, the prior art identifies a set of correction vectors from a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors that represent frames of these channel signals, a collection of noise correction vectors is determined by subtracting feature vectors of the noisy channel signal from feature vectors of the clean channel signal. When a feature vector of a noisy pattern signal, either a training signal or a test signal, is later received, a suitable correction vector is added to the feature vector to produce a noise-reduced feature vector.
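The stereo-based technique can be sketched as follows, under simplifying assumptions. Feature vectors are plain lists of numbers, the two channels are frame-aligned, and the per-frame differences are averaged into a single correction vector, which corresponds to the case of a single mixture component. All function names and values are hypothetical.

```python
def frame_differences(clean_frames, noisy_frames):
    """Subtract each noisy-channel feature vector from its clean-channel counterpart."""
    return [
        [c - n for c, n in zip(clean_vec, noisy_vec)]
        for clean_vec, noisy_vec in zip(clean_frames, noisy_frames)
    ]


def mean_vector(vectors):
    """Average a list of equal-length vectors element by element."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def apply_correction(noisy_vec, correction):
    """Add a correction vector to a noisy feature vector."""
    return [x + r for x, r in zip(noisy_vec, correction)]


# Frame-aligned stereo training data (hypothetical two-dimensional features).
clean = [[1.0, 2.0], [3.0, 4.0]]
noisy = [[1.5, 3.0], [3.5, 5.0]]

diffs = frame_differences(clean, noisy)
correction = mean_vector(diffs)

# A noisy feature vector received later is corrected by simple addition.
restored = apply_correction([2.5, 4.0], correction)
```

Because the correction is purely additive, it can only shift the noisy feature vector by a fixed offset.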
Under the prior art, each correction vector is associated with a mixture component. To form the mixture components, the prior art divides the feature vector space defined by the clean channel's feature vectors into a number of different mixture components. When a feature vector for a noisy pattern signal is later received, it is compared to the distribution of clean channel feature vectors in each mixture component to identify the mixture component that best suits the feature vector. However, because the clean channel feature vectors do not include noise, the shapes of the distributions generated under the prior art are not ideal for finding a mixture component that best suits a feature vector from a noisy pattern signal.
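The selection of a mixture component can be sketched as shown below. The sketch assumes that each mixture component is modeled as a Gaussian with a diagonal covariance whose mean and variance were estimated from clean channel feature vectors; the text above does not specify the form of the distributions, so this choice, along with every name and value in the code, is an assumption made for illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def best_component(x, components):
    """Return the index of the component under which x is most likely."""
    scores = [log_gaussian(x, c["mean"], c["var"]) for c in components]
    return max(range(len(components)), key=scores.__getitem__)


# Two hypothetical components, each carrying its own correction vector.
# The means and variances stand in for statistics of CLEAN channel vectors.
components = [
    {"mean": [0.0, 0.0], "var": [1.0, 1.0], "correction": [-0.25, -0.125]},
    {"mean": [5.0, 5.0], "var": [1.0, 1.0], "correction": [-1.0, -0.5]},
]

noisy = [4.0, 4.5]
k = best_component(noisy, components)
denoised = [x + r for x, r in zip(noisy, components[k]["correction"])]
```

Note that the noisy vector is scored against distributions built from clean data, which is the mismatch identified above.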
In addition, the correction vectors of the prior art provide only an additive element for removing noise from a pattern signal. As such, these prior art systems are less than ideal at removing noise that is scaled to the noisy pattern signal itself.
In light of this, a noise reduction technique is needed that is more effective at removing noise from pattern signals.