The invention relates to speech processing, such as speech recognition or speech coding, of a degraded speech signal.
Automatic speech recognition and coding systems are increasingly used. Although the performance of such systems is continuously improving, further gains in accuracy are desired, particularly in adverse environments, such as those with a low signal-to-noise ratio (SNR) or a low-bandwidth signal. Normally, speech recognition systems compare a representation Y of an input speech signal, such as an observation vector with LPC or cepstral components, against a model Λx of reference signals, such as hidden Markov models (HMMs) built from representations X (for example reference vectors) of a training speech signal.
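The comparison of Y against Λx amounts to scoring the observation vectors with the model's likelihood. The following is a minimal sketch, assuming that Λx is simplified to a single Gaussian mixture with diagonal covariances (an HMM would add states and transition probabilities) and that the frames are independent. The function names `log_gauss_diag` and `log_likelihood` are hypothetical and chosen for illustration only.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_likelihood(frames, weights, means, variances):
    """log p(Y | Lambda_x), treating frames as independent draws from a GMM."""
    total = 0.0
    for y in frames:
        terms = [math.log(w) + log_gauss_diag(y, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(terms)  # log-sum-exp trick for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [3.0, 3.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    clean = [[0.1, -0.2], [2.9, 3.1]]
    shifted = [[c + 2.0 for c in f] for f in clean]  # simulated mismatch
    print(log_likelihood(clean, weights, means, variances)
          > log_likelihood(shifted, weights, means, variances))  # True
```

Frames that drift away from the conditions under which Λx was trained receive a lower score, which is the mismatch the following paragraphs address.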
In practice, a mismatch exists between the conditions under which the reference signals (and thus the models) were obtained and the conditions of the input signal. Such a mismatch may, in particular, exist in the SNR and/or the bandwidth of the signal. The reference signals are usually relatively clean (high SNR, high bandwidth), whereas the input signal during actual use is distorted (lower SNR and/or lower bandwidth).
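To make the SNR side of the mismatch concrete, the SNR in decibels is ten times the base-10 logarithm of the ratio of mean signal power to mean noise power. The helper below is a minimal sketch of that standard definition; the name `snr_db` is hypothetical, and it assumes the clean signal and the noise are available as separate sample sequences.

```python
import math

def snr_db(clean, noise):
    """SNR in dB: 10 * log10(mean signal power / mean noise power)."""
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)
```

For example, a signal with amplitude 1 and noise with amplitude 0.1 gives a power ratio of 100, i.e. approximately 20 dB; every halving of that ratio costs about 3 dB.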
U.S. Pat. No. 5,727,124 describes a stochastic approach for reducing the mismatch between the input signal and the reference model. The known method uses a maximum-likelihood (ML) approach to reduce the mismatch between the input signal (observed utterance) and the original speech models during recognition of the utterance. The mismatch may be reduced in the following two ways:
A representation Y of the distorted input signal can be mapped to an estimate of an original representation X, so that the original models Λx, which were derived from the original signal representations X, can be used for recognition. This mapping operates in the feature space and can be described as Fv(Y), where v are parameters to be estimated.
The original models Λx can be mapped to transformed models Λy which better match the observed utterance Y. This mapping operates in the model space and can be described as Gη(Λx), where η represents parameters to be estimated.
The parameters v and/or η are estimated using the expectation-maximization (EM) algorithm to iteratively improve the likelihood of the observed speech Y given the models Λx. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models; no training is required to estimate the mismatch prior to actual testing. The mappings described in U.S. Pat. No. 5,727,124 are hereby incorporated by reference.
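The feature-space mapping and its EM estimation can be illustrated with the simplest possible choice of Fv, an additive bias: Fv(Y) = Y - v. This is a minimal sketch under stated assumptions, not the method of U.S. Pat. No. 5,727,124 itself: Λx is again simplified to a fixed diagonal-covariance Gaussian mixture instead of an HMM, and the names `posteriors` and `estimate_bias` are hypothetical. Each iteration computes component posteriors of the current estimate of X (E-step) and then solves for the bias in closed form (M-step).

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Component posteriors p(k | x) under the mixture (E-step quantities)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    exps = [math.exp(t - top) for t in terms]
    z = sum(exps)
    return [e / z for e in exps]

def estimate_bias(frames, weights, means, variances, iterations=20):
    """EM estimate of the bias v in F_v(Y) = Y - v, with the model held fixed.

    The mapping has a unit Jacobian, so maximizing p(F_v(Y) | Lambda_x) over v
    also maximizes the likelihood of the observed Y.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            x = [yi - bi for yi, bi in zip(y, bias)]          # current estimate of X
            gamma = posteriors(x, weights, means, variances)  # E-step
            for g, m, v in zip(gamma, means, variances):
                for d in range(dim):
                    num[d] += g * (y[d] - m[d]) / v[d]
                    den[d] += g / v[d]
        bias = [a / b for a, b in zip(num, den)]              # M-step, closed form
    return bias
```

With two well-separated components at -2 and +2 (variance 0.25) and frames shifted by +0.5, `estimate_bias` returns a value very close to 0.5, and subtracting it recovers the clean frames. A fixed iteration count is used only for brevity; a practical loop would stop once the likelihood gain becomes negligible.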
Both methods may also be combined, where the representation Y of the distorted input signal is mapped to an estimate of an original representation X and the original models Λx are mapped to transformed models which better match the estimated representation X. The methods may be used in an iterative manner, where the transformed signal and/or the transformed models replace the respective original input signal and/or models. In this way, the input signal and models are iteratively transformed to obtain a statistically closer match between them. In this process, a relatively noisy input signal may be transformed into a cleaner input signal, whereas relatively clean models may be transformed into noisier models.
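The combined, iterative variant can be sketched by pairing the bias v with a deliberately simple model-space mapping Gη that multiplies every variance of Λx by one global factor η. This choice of Gη, the mixture simplification of Λx, and the function name `joint_match` are assumptions for illustration only. The loop is ECM-style (expectation conditional maximization): after each E-step, v is updated with η fixed, then η with the new v fixed; both updates have closed forms.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Component posteriors p(k | x) under the mixture."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    exps = [math.exp(t - top) for t in terms]
    z = sum(exps)
    return [e / z for e in exps]

def joint_match(frames, weights, means, variances, iterations=30):
    """Alternately estimate a feature bias v (signal space, F_v(Y) = Y - v)
    and a global variance scale eta (model space, variances * eta)."""
    dim, n = len(frames[0]), len(frames)
    bias, eta = [0.0] * dim, 1.0
    for _ in range(iterations):
        scaled = [[eta * v for v in var] for var in variances]  # G_eta(Lambda_x)
        gammas = [posteriors([yi - bi for yi, bi in zip(y, bias)],
                             weights, means, scaled) for y in frames]  # E-step
        # CM-step 1: bias given eta; a global eta cancels out of this ratio.
        num, den = [0.0] * dim, [0.0] * dim
        for y, gamma in zip(frames, gammas):
            for g, m, v in zip(gamma, means, variances):
                for d in range(dim):
                    num[d] += g * (y[d] - m[d]) / v[d]
                    den[d] += g / v[d]
        bias = [a / b for a, b in zip(num, den)]
        # CM-step 2: eta given the new bias (normalized squared residuals).
        acc = 0.0
        for y, gamma in zip(frames, gammas):
            for g, m, v in zip(gamma, means, variances):
                acc += g * sum((y[d] - bias[d] - m[d]) ** 2 / v[d] for d in range(dim))
        eta = acc / (n * dim)
    return bias, eta
```

An estimate η > 1 means the variances of the originally clean models have been inflated, which is exactly the case described above in which clean models are transformed into noisier ones. For a one-component model with mean 0 and variance 1, the frames 0, 4, 0, 4 yield v = 2 and η = 4.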
For recognition, models are usually trained under the best (clean) conditions in order to obtain optimal recognition. In the known method, the models are transformed based on the distorted input signal. This degrades performance, particularly at low SNRs, making it difficult to obtain the optimal performance that could be achieved with the original models. Moreover, if the mismatch between the original models and the input signal is significant, the risk of transforming the signal and/or models in the wrong direction increases (even though they may come statistically closer). This is, for instance, the case if the input signal has a low SNR, making it difficult to reliably estimate the original signal.