A technique of time-varying, SNR-dependent coding for increased communication channel robustness is described by A. Bernard, one of the inventors herein, and A. Alwan in “Joint Channel Decoding - Viterbi Recognition for Wireless Applications,” in Proceedings of Eurospeech, Sept. 2001, vol. 4, pp. 2703-6; by A. Bernard, X. Liu, R. Wesel and A. Alwan in “Speech Transmission Using Rate-Compatible Trellis Codes and Embedded Source Coding,” IEEE Transactions on Communications, vol. 50, no. 2, pp. 309-320, Feb. 2002; by A. Bernard and A. Alwan in “Source and Channel Coding for Low Bit Rate Distributed Speech Recognition Systems,” IEEE Transactions on Speech and Audio Processing, vol. 10, no. 8, pp. 570-580, Nov. 2002; and by A. Bernard in “Source and Channel Coding for Speech and Remote Speech Recognition,” Ph.D. thesis, University of California, Los Angeles, 2002.
A technique for channel and acoustic robustness is described by X. Cui, A. Bernard, and A. Alwan in “A Noise-Robust ASR Back-End Technique Based on Weighted Viterbi Recognition,” in Proceedings of Eurospeech, Sept. 2003, pp. 2169-72.
Speech recognizers compare the incoming speech to speech models, such as Hidden Markov Models (HMMs), to identify or recognize speech. Typical speech recognizers combine the likelihoods of the recognition features of each speech frame with equal importance to provide the overall likelihood of observing the sequence of feature vectors. Typically, robustness in speech recognition is dealt with either at the front end (by cleaning up the features) or at the back end (by adapting the acoustic model to the particular acoustic noise and channel environment).
Such classic recognizers fail to differentiate among frames according to the importance of each individual frame. This can significantly reduce recognition performance in cases where the importance of each frame could be quantitatively estimated and incorporated into a weighted recognition mechanism.
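The contrast between equal-importance combination and a weighted recognition mechanism can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of the cited references: the function name `sequence_log_likelihood`, the per-frame log-likelihood values, and the reliability weights are all hypothetical, and the weighting form (each frame's log-likelihood scaled by a weight between 0 and 1) is assumed for the example.

```python
def sequence_log_likelihood(frame_log_likelihoods, weights=None):
    """Combine per-frame log-likelihoods into a single sequence score.

    With weights=None every frame counts equally, as in a classic
    recognizer. Otherwise each frame's log-likelihood is scaled by its
    weight (assumed form: 1.0 = fully trusted, 0.0 = ignored).
    """
    if weights is None:
        weights = [1.0] * len(frame_log_likelihoods)
    if len(weights) != len(frame_log_likelihoods):
        raise ValueError("need exactly one weight per frame")
    return sum(w * ll for w, ll in zip(weights, frame_log_likelihoods))


# Hypothetical per-frame log-likelihoods of one utterance under two models.
# The third frame is assumed to be corrupted, so the correct model (A)
# scores it very poorly.
model_a = [-1.0, -1.2, -9.0, -1.1]
model_b = [-2.0, -2.1, -2.0, -2.2]

# Hypothetical frame reliabilities: the corrupted frame is down-weighted.
reliability = [1.0, 1.0, 0.1, 1.0]

equal_a = sequence_log_likelihood(model_a)
equal_b = sequence_log_likelihood(model_b)
weighted_a = sequence_log_likelihood(model_a, reliability)
weighted_b = sequence_log_likelihood(model_b, reliability)

print("equal weights pick:", "A" if equal_a > equal_b else "B")
print("frame weights pick:", "A" if weighted_a > weighted_b else "B")
```

With these made-up numbers, the equally weighted score favors model B because a single corrupted frame dominates the sum, while the weighted score favors model A once that frame is discounted.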