The present invention relates to noise in signals. In particular, the present invention relates to identifying noise environments from noisy signals used in pattern recognition.
A pattern recognition system, such as a speech recognition system, takes an input signal and attempts to decode the signal to find a pattern represented by the signal. For example, in a speech recognition system, a speech signal (often referred to as a test signal) is received by the recognition system and is decoded to identify a string of words represented by the speech signal.
To decode the incoming test signal, most recognition systems utilize one or more models that describe the likelihood that a portion of the test signal represents a particular pattern. Examples of such models include Neural Nets, Dynamic Time Warping, segment models, and Hidden Markov Models.
Before a model can be used to decode an incoming signal, it must be trained. This is typically done by measuring input training signals generated from a known training pattern. For example, in speech recognition, a collection of speech signals is generated by speakers reading from a known text. These speech signals are then used to train the models.
In order for the models to work optimally, the signals used to train the model should be similar to the eventual test signals that are decoded. In particular, the training signals should have the same amount and type of noise as the test signals that are decoded.
Typically, the training signal is collected under “clean” conditions and is considered to be relatively noise free. To achieve this same low level of noise in the test signal, many prior art systems apply noise reduction techniques to the test signal.
One particular prior art technique for removing noise identifies a set of correction vectors from a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors that represent frames of these channel signals, a collection of noise correction vectors is determined that maps the feature vectors of the noisy channel signal to the feature vectors of the clean channel signal. When a feature vector of a noisy pattern signal, either a training signal or a test signal, is later received, a suitable correction vector is added to the feature vector to produce a noise-reduced feature vector.
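The stereo-based correction described above can be illustrated with a minimal sketch. It is not taken from any particular prior art system. It assumes that the noisy feature space has already been partitioned by a set of centroids, that one correction vector is stored per centroid as the average of (clean minus noisy) over the paired frames assigned to it, and that the "suitable" correction vector for a new frame is the one belonging to the nearest centroid. The names `train_correction_vectors`, `denoise`, and `centroids` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], v: Sequence[float]) -> int:
    """Index of the centroid closest to v (assumed selection rule)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def train_correction_vectors(
    noisy: Sequence[Vector],
    clean: Sequence[Vector],
    centroids: Sequence[Vector],
) -> List[Vector]:
    """Average (clean - noisy) over the stereo frame pairs in each region."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for n, c in zip(noisy, clean):
        k = nearest(centroids, n)  # region is chosen from the noisy channel
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += c[d] - n[d]
    # Regions that saw no training frames get a zero correction (assumption).
    return [
        [s / counts[k] if counts[k] else 0.0 for s in sums[k]]
        for k in range(len(centroids))
    ]


def denoise(
    v: Sequence[float],
    centroids: Sequence[Vector],
    corrections: Sequence[Vector],
) -> Vector:
    """Add the correction vector of the nearest region to a noisy frame."""
    k = nearest(centroids, v)
    return [x + r for x, r in zip(v, corrections[k])]


# Toy stereo data: two frames, two-dimensional features (made-up values).
centroids = [[0.0, 0.0], [10.0, 10.0]]
noisy_frames = [[1.0, 1.0], [9.0, 11.0]]
clean_frames = [[0.0, 0.5], [8.0, 8.0]]

corrections = train_correction_vectors(noisy_frames, clean_frames, centroids)
cleaned = denoise([1.0, 0.0], centroids, corrections)
```

In this sketch the correction vectors depend entirely on the noise present in the stereo training data, which is why the choice of training environment matters in the discussion that follows.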
Under the prior art, such systems are trained either by using data from a single noise environment, such as an office or a car, or by treating data from different environments as occurring in a single environment. Systems that are trained using data from only a single noise environment experience a drop in performance when they are used in a different noise environment. Thus, a system trained with car noise will not work as well in an airplane.
Systems that treat noise data from different environments as occurring in a single environment are not optimal because they tend to jump between noise correction vectors for different noise environments even when the noise environment in which the system is being used is not changing. Thus, such a system may switch between correction vectors associated with a car, a plane, and an office while it is being used in the single environment of a car.
Some speech recognition systems of the prior art have attempted to identify an environment for an entire utterance by selecting the environment from a group of possible environments. However, because these systems only identify the environment at utterance boundaries, they do not work well when the noise environment changes during an utterance. Thus, a system is needed that can identify a noise environment “on the fly” as each section of an utterance is processed instead of waiting for the entire utterance to be received.
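The limitation of utterance-level identification can be illustrated with a small sketch. It assumes, purely for illustration, that some scoring function has already produced a log-likelihood for each candidate environment on each frame. The scores, the environment names, and the functions `pick_utterance_environment` and `pick_frame_environments` are hypothetical and do not describe any specific prior art system.

```python
from typing import Dict, List


def pick_utterance_environment(frame_scores: List[Dict[str, float]]) -> str:
    """Pick one environment for the whole utterance by summing frame scores."""
    totals: Dict[str, float] = {}
    for scores in frame_scores:
        for env, score in scores.items():
            totals[env] = totals.get(env, 0.0) + score
    return max(totals, key=totals.get)


def pick_frame_environments(frame_scores: List[Dict[str, float]]) -> List[str]:
    """Pick the best-scoring environment independently for each frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


# Made-up per-frame log-likelihoods: the noise changes on the last frame.
frame_scores = [
    {"car": -1.0, "office": -5.0},
    {"car": -1.0, "office": -5.0},
    {"car": -1.0, "office": -4.0},
    {"car": -6.0, "office": -2.0},
]

utterance_choice = pick_utterance_environment(frame_scores)
frame_choices = pick_frame_environments(frame_scores)
```

With these toy scores the utterance-level decision is "car" for every frame, so the change to "office" on the final frame is missed. The per-frame picker is shown only for contrast; it applies no smoothing, so on real data it could exhibit the jumping between environments described above.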