Speech recognition systems use a “phonetic vocabulary” that contains pronunciations of all the words that a speaker may speak. Spoken words are matched against the pronunciations in this vocabulary, and the recognized equivalents are then provided to the speaker. Two performance criteria for speech recognition systems are speed and accuracy of recognition. Various refinements have been devised to improve these two performance criteria. In particular, the performance of a general-purpose speech recognition system can be improved by adapting the system to a particular speaker. Many such refinements can be classified as one of two general types of adaptation mechanism.
The first kind of adaptation mechanism involves adapting acoustic models of speech used in the speech recognition system, and the second kind of adaptation mechanism involves adapting the vocabulary used by the speech recognition system.
Acoustic model adaptation (see Chin-Hui Lee, Chih-Heng Lin, Biing-Hwang Juang, “A Study on the speaker adaptation of the parameters of continuous density Hidden Markov Models,” IEEE Transactions on Signal Processing, Vol. 39, No. 4, April 1991) is generally used to improve recognition accuracy for a particular speaker, or in a particular environment. Acoustic model adaptation may be used in, for example, noisy environments, telephony environments, and office environments.
Vocabulary adaptation, by contrast, may be used in the context of performing particular tasks (see A. Sankar, A. Kannan, B. Shahshahani, E. Jackson, “Task-specific Adaptation of Speech Recognition Models,” Proceedings of Automatic Speech Recognition and Understanding, ASRU, 2001). The vocabulary that is likely to be used commonly depends on context. A specific vocabulary is implied when the speaker is, for example, dictating technical correspondence, or performing certain command and control tasks.
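As a minimal sketch of task-dependent vocabulary adaptation, the phonetic vocabulary can be pictured as a mapping from words to pronunciations, with the active portion restricted according to the task at hand. The words, phoneme symbols, task names, and the function `adapt_vocabulary` below are illustrative assumptions and are not taken from the cited work.

```python
# Hypothetical phonetic vocabulary: word -> list of pronunciations.
# The entries and phoneme symbols are illustrative assumptions.
PHONETIC_VOCABULARY = {
    "open": ["OW P AH N"],
    "close": ["K L OW Z"],
    "file": ["F AY L"],
    "regards": ["R IH G AA R D Z"],
    "sincerely": ["S IH N S IH R L IY"],
}

# Hypothetical task definitions: task name -> words expected in that task.
TASK_WORDS = {
    "command_and_control": {"open", "close", "file"},
    "dictation": {"regards", "sincerely", "file"},
}


def adapt_vocabulary(vocabulary, task):
    """Return only the vocabulary entries expected for the given task."""
    active = TASK_WORDS[task]
    return {word: prons for word, prons in vocabulary.items() if word in active}


if __name__ == "__main__":
    for task in TASK_WORDS:
        print(task, sorted(adapt_vocabulary(PHONETIC_VOCABULARY, task)))
```

Restricting the active vocabulary in this way leaves fewer candidate words to match against, which is one plausible route to gains in both speed and accuracy.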
There have been approaches in which the vocabulary is adapted by changing the pronunciation networks (Kyung-Tak Lee, Lynette Melnar, Jim Talley, “Symbolic Speaker Adaptation for Pronunciation Modeling,” in ISCA Tutorial and Research Workshop on Pronunciation Modeling and Lexicon Adaptation for Spoken Language, Estes Park, Colo. USA, Sep. 14-15, 2002). This approach uses a pronunciation network to “generate” all the pronunciations of the words. Such a technique cannot choose from among existing pronunciations, for example pronunciations that were created manually at an earlier time.
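The distinction between generating pronunciations and choosing among existing ones can be sketched as follows. This is only an illustration under stated assumptions: the substitution rules stand in for a pronunciation network in a highly simplified way, and the per-speaker usage counts used for selection are a hypothetical criterion. Neither function reproduces the method of the cited work.

```python
def generate_pronunciations(base, rules):
    """Generate variants of a base pronunciation.

    Each rule is an (old, new) phoneme substitution; this is a simplified,
    hypothetical stand-in for a pronunciation network.
    """
    phones = base.split()
    variants = {base}
    for old, new in rules:
        if old in phones:
            variants.add(" ".join(new if p == old else p for p in phones))
    return sorted(variants)


def select_existing(pronunciations, usage_counts):
    """Choose one pronunciation from an existing list.

    usage_counts is a hypothetical per-speaker tally of how often each
    existing pronunciation was matched; the most frequent one is chosen.
    """
    return max(pronunciations, key=lambda p: usage_counts.get(p, 0))


if __name__ == "__main__":
    generated = generate_pronunciations("T AH M EY T OW", [("EY", "AA")])
    print(generated)

    existing = ["T AH M EY T OW", "T AH M AA T OW"]
    counts = {"T AH M AA T OW": 7, "T AH M EY T OW": 2}
    print(select_existing(existing, counts))
```

The first function creates new entries from rules, whereas the second only ranks entries that are already present in the vocabulary, which is the capability the generation-based technique lacks.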
These two types of adaptation mechanism are responsible for improvements in the two above-mentioned performance criteria, speed and accuracy. Further advances in these performance criteria are welcome, and thus a need clearly exists for improved speech recognition techniques.