Noise robustness methods for Automatic Speech Recognition (ASR) have historically been carried out either in the signal domain or in the model domain. Referring to FIG. 1, signal domain methods attempt to “clean up” the incoming signal 100 by removing the corrupting noise. In particular, a noise removal module 102 removes noise in accordance with noise estimates produced by noise estimation module 104. Features extracted from the adjusted signal by feature extraction module 106 are then pattern matched to acoustic models 108 by pattern matching module 110 to obtain recognition 112. Turning to FIG. 2, model domain methods attempt to improve the performance of pattern matching by adapting the acoustic models to the current noise level, while leaving the input signal 200 unchanged. In particular, a noise estimation module 202 estimates noise in the input signal 200, and model compensation module 204 adjusts the acoustic models 206 based on these noise estimates. Features extracted from the unmodified input signal 200 by feature extraction module 208 are then pattern matched to the adjusted acoustic models 206 by pattern matching module 210 to achieve recognition 212.
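By way of illustration only, the two processing chains of FIG. 1 and FIG. 2 can be contrasted in a minimal sketch. The sketch rests on several assumptions that are not part of the description above: frames are taken to be power-spectrum-like feature vectors, noise is estimated as the average of a few leading frames presumed to contain no speech, noise removal is approximated by simple spectral subtraction, model compensation is approximated by adding the noise estimate to each model mean, and pattern matching is a nearest-mean search. All function names and parameters are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def estimate_noise(frames: Sequence[Vector], n_lead: int = 2) -> Vector:
    """Noise estimation (104 / 202). Assumption: the first n_lead frames hold noise only."""
    lead = frames[:n_lead]
    return [sum(column) / len(lead) for column in zip(*lead)]


def remove_noise(frames: Sequence[Vector], noise: Vector) -> List[Vector]:
    """Noise removal (102). Stand-in technique: spectral subtraction floored at zero."""
    return [[max(x - n, 0.0) for x, n in zip(frame, noise)] for frame in frames]


def compensate_models(models: Dict[str, Vector], noise: Vector) -> Dict[str, Vector]:
    """Model compensation (204). Stand-in technique: shift each model mean by the noise estimate."""
    return {label: [m + n for m, n in zip(mean, noise)] for label, mean in models.items()}


def extract_features(frame: Vector) -> Vector:
    """Feature extraction (106 / 208). Placeholder: frames are assumed to be features already."""
    return list(frame)


def match(features: Vector, models: Dict[str, Vector]) -> str:
    """Pattern matching (110 / 210). Stand-in technique: nearest model mean, squared distance."""
    def distance(mean: Vector) -> float:
        return sum((x - m) ** 2 for x, m in zip(features, mean))
    return min(models, key=lambda label: distance(models[label]))


def recognize_signal_domain(frames: Sequence[Vector], models: Dict[str, Vector],
                            n_lead: int = 2) -> List[str]:
    """FIG. 1 style chain: adjust the signal, leave the acoustic models untouched."""
    noise = estimate_noise(frames, n_lead)
    cleaned = remove_noise(frames, noise)
    return [match(extract_features(f), models) for f in cleaned[n_lead:]]


def recognize_model_domain(frames: Sequence[Vector], models: Dict[str, Vector],
                           n_lead: int = 2) -> List[str]:
    """FIG. 2 style chain: adjust the acoustic models, leave the signal untouched."""
    noise = estimate_noise(frames, n_lead)
    adapted = compensate_models(models, noise)
    return [match(extract_features(f), adapted) for f in frames[n_lead:]]
```

In this toy setting the two chains differ only in where the noise estimate is applied: the first subtracts it from the input before matching against the original models, whereas the second adds it to the models and matches the unmodified input. Practical systems use considerably more elaborate noise removal and model compensation techniques than the stand-ins assumed here.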
Noise robustness algorithms are key to the successful deployment of ASR technology in real applications and remain a vibrant area of ASR research. However, the noise robustness methods available today still have limitations. For instance, model-based methods clearly outperform signal-based methods, but may require clean speech databases for training the acoustic models. Signal-based methods, while they underperform model-based methods, have the advantage that they can be used with acoustic models trained in noisy conditions. This advantage is important because clean training data is sometimes not available for a given task, and because noisy training data recorded specifically for a task is the best way to obtain good task-specific acoustic models.
What is needed is a way to obtain the advantages of signal-based methods together with the improved performance of model-based methods. The present invention fulfills this need.