The present invention relates to noise estimation. In particular, the present invention relates to estimating noise in signals used in pattern recognition.
A pattern recognition system, such as a speech recognition system, takes an input signal and attempts to decode the signal to find a pattern represented by the signal. For example, in a speech recognition system, a speech signal (often referred to as a test signal) is received by the recognition system and is decoded to identify a string of words represented by the speech signal.
Input signals are typically corrupted by some form of noise. To improve the performance of the pattern recognition system, it is often desirable to estimate the noise in the noisy signal.
In the past, two frameworks have been used to estimate the noise in a signal. In the first, batch algorithms estimate the noise in each frame of the input signal independently of the noise found in other frames of the signal. The individual noise estimates are then averaged together to form a consensus noise value for all of the frames. In the second, a recursive algorithm estimates the noise in the current frame based on noise estimates for one or more previous or successive frames. Such recursive techniques allow the noise to change slowly over time.
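The difference between the two frameworks can be illustrated with a minimal Python sketch. It assumes that each frame already yields a scalar noise observation (for example, a log-energy value). The exponential smoothing used for the recursive case, its hypothetical `alpha` parameter, and the function names are illustrative choices rather than a specific prior-art algorithm, and the sketch uses only previous frames.

```python
from typing import List, Optional


def batch_noise_estimate(frame_noise: List[float]) -> float:
    """Batch framework: each frame's noise is estimated independently,
    then the per-frame values are averaged into one consensus value
    that is applied to every frame."""
    if not frame_noise:
        raise ValueError("need at least one frame")
    return sum(frame_noise) / len(frame_noise)


def recursive_noise_estimate(frame_noise: List[float], alpha: float = 0.9) -> List[float]:
    """Recursive framework (sketch): the estimate for frame t combines the
    estimate carried over from frame t-1 with the current frame's evidence,
    so the noise estimate can drift slowly over time.

    alpha is a hypothetical smoothing (forgetting) factor in [0, 1);
    larger values make the estimate change more slowly."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    estimates: List[float] = []
    prev: Optional[float] = None
    for obs in frame_noise:
        # The first frame has no history, so it seeds the recursion.
        prev = obs if prev is None else alpha * prev + (1.0 - alpha) * obs
        estimates.append(prev)
    return estimates


frames = [1.0, 1.0, 3.0, 3.0]
print(batch_noise_estimate(frames))                 # 2.0
print(recursive_noise_estimate(frames, alpha=0.5))  # [1.0, 1.0, 2.0, 2.5]
```

The batch value hides the jump in the last two frames, while the recursive estimate follows it gradually.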
In one recursive technique, a noisy signal is assumed to be a non-linear function of a clean signal and a noise signal. To aid computation, this non-linear function is often approximated by a truncated Taylor series expansion calculated about some expansion point. In general, the Taylor series approximation is most accurate at and near the expansion point, so it is only as good as the choice of that point. Under the prior art, however, the expansion point for the Taylor series was not optimized for each frame. As a result, the noise estimates produced by the recursive algorithms have been less than ideal.
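The sensitivity to the expansion point can be seen in a small Python sketch. The non-linear function is not specified here, so the sketch assumes the log-domain model y = log(e^x + e^n), a common choice for additive noise in the log-spectral domain, with the clean value x and the noise n treated as scalars and the expansion taken in n only. The helper names `g`, `dg_dn`, and `taylor_first_order` are hypothetical.

```python
import math


def g(x: float, n: float) -> float:
    """Assumed log-domain model of the noisy signal: y = log(e^x + e^n).
    Computed stably as max(x, n) + log(1 + e^-|n - x|)."""
    return max(x, n) + math.log1p(math.exp(-abs(n - x)))


def dg_dn(x: float, n: float) -> float:
    """Partial derivative of g with respect to n: e^n / (e^x + e^n),
    i.e. the logistic sigmoid of (n - x), computed without overflow."""
    if n >= x:
        return 1.0 / (1.0 + math.exp(x - n))
    e = math.exp(n - x)
    return e / (1.0 + e)


def taylor_first_order(x: float, n: float, n0: float) -> float:
    """Truncated (first-order) Taylor series of g in n about expansion point n0."""
    return g(x, n0) + dg_dn(x, n0) * (n - n0)


# Approximation error for a fixed clean value x and true noise n,
# as the expansion point n0 moves away from n.
x, n = 0.0, 1.0
for n0 in (1.0, 0.5, -2.0):
    err = abs(taylor_first_order(x, n, n0) - g(x, n))
    print(f"n0={n0:+.1f}  error={err:.4f}")
```

The error is zero when the expansion point equals the true noise value and grows as the expansion point moves away from it, which is why a poorly chosen expansion point degrades the noise estimate.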
Maximum-likelihood (ML) and maximum a posteriori (MAP) techniques have been used for sequential point estimation of nonstationary noise using an iteratively linearized non-linear model of the acoustic environment. With a simple Gaussian model for the noise distribution, the MAP technique generally produced a better noise estimate than the ML technique. However, in the MAP technique, the mean and variance of the Gaussian noise prior are fixed, being computed from a speech-free segment of each test utterance. For nonstationary noise, such fixed parameters may not reflect the actual noise prior statistics, which change over time.
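A minimal Python sketch of an iteratively linearized MAP noise estimate with a fixed Gaussian prior follows. It assumes the log-domain model y = log(e^x + e^n), treats the clean value x as known for the frame (a simplification), and uses a hypothetical observation variance `obs_var`; all function names are illustrative. Letting the prior variance grow very large removes the prior's influence and recovers an ML-style estimate.

```python
import math
import statistics
from typing import Sequence, Tuple


def g(x: float, n: float) -> float:
    """Assumed log-domain environment model: y = log(e^x + e^n)."""
    return max(x, n) + math.log1p(math.exp(-abs(n - x)))


def dg_dn(x: float, n: float) -> float:
    """dg/dn = sigmoid(n - x), computed without overflow."""
    if n >= x:
        return 1.0 / (1.0 + math.exp(x - n))
    e = math.exp(n - x)
    return e / (1.0 + e)


def fixed_noise_prior(speech_free: Sequence[float]) -> Tuple[float, float]:
    """Gaussian prior (mean, variance) fixed once from speech-free frames;
    it is not updated afterwards, even if the noise changes."""
    return statistics.fmean(speech_free), statistics.pvariance(speech_free)


def map_noise_estimate(y: float, x: float, prior_mean: float, prior_var: float,
                       obs_var: float = 1e-3, iters: int = 20) -> float:
    """Iteratively linearized MAP estimate of the noise n for one frame.

    Each iteration linearizes g about the current estimate n0,
        y is approximately g(x, n0) + G * (n - n0) + e,  e ~ N(0, obs_var),
    and solves the resulting Gaussian problem in closed form."""
    if prior_var <= 0.0 or obs_var <= 0.0:
        raise ValueError("variances must be positive")
    n0 = prior_mean
    for _ in range(iters):
        G = dg_dn(x, n0)
        residual = y - g(x, n0) + G * n0
        precision = 1.0 / prior_var + G * G / obs_var
        n0 = (prior_mean / prior_var + G * residual / obs_var) / precision
    return n0


prior_mean, prior_var = fixed_noise_prior([-0.2, 0.1, 0.0, 0.1])
y_obs = g(0.0, 1.0)  # a later frame whose actual noise level has risen to 1.0
print(map_noise_estimate(y_obs, 0.0, prior_mean, prior_var))
```

Because the prior is fixed from the speech-free frames, the estimate for a frame whose noise has since risen is pulled back toward the old prior mean, which illustrates the limitation for nonstationary noise.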
In light of this, a noise estimation technique is needed that is more effective at estimating noise in signals used for pattern recognition.