Speech recognition systems enable a computer system to understand at least selected portions of speech that are input to the computer system. In general, speech recognition systems parse input speech into workable segments that may be readily recognized. For example, input speech may be parsed into phonemes that are further processed to recognize the content of the speech. Typically, speech recognition systems recognize words in input speech by comparing the pronunciation of the word in the input speech with patterns or templates that are stored by the speech recognition system. The templates are produced using phonetic representations of the word and context-dependent templates for the phonemes. Many speech recognition systems include dictionaries that specify the pronunciations of terms that are recognized by the speech recognition system.
One place in which speech recognition systems are used is in dictation systems. Dictation systems convert input speech into text. In such dictation systems, the speech recognition systems are used to identify words in the input speech, and the dictation systems produce textual output corresponding to the identified words. Unfortunately, these dictation systems often experience a high level of misrecognition of speech input from certain users. The speech recognition systems employed within such dictation systems have one or more pronunciations for each word, but the pronunciations of the words are static and represent the pronunciation that the speech recognition system expects to hear. If a user employs a different pronunciation for a word than that expected by the speech recognition system, the speech recognition system will often fail to recognize the user's input. This drawback can be especially vexing to a user when a term has multiple proper pronunciations and the user employs one of the pronunciations that is not covered by the dictionary of the speech recognition system.
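By way of illustration only, the static-pronunciation drawback described above may be sketched as follows. The sketch assumes a toy dictionary and an exact-match lookup. The names `STATIC_DICTIONARY` and `recognize` and the phoneme symbols are hypothetical and are not taken from any particular speech recognition system. An actual recognizer would score acoustic input against templates rather than compare phoneme strings directly.

```python
# Hypothetical static dictionary: each word maps to the pronunciations
# (tuples of phoneme symbols) that the recognizer expects to hear.
STATIC_DICTIONARY = {
    "tomato": [("t", "ah", "m", "ey", "t", "ow")],
    "data": [("d", "ey", "t", "ah")],
}


def recognize(phonemes, dictionary):
    """Return the word whose stored pronunciation matches exactly, else None.

    Exact matching is a simplifying assumption made for this sketch.
    """
    spoken = tuple(phonemes)
    for word, pronunciations in dictionary.items():
        if spoken in pronunciations:
            return word
    return None


# The pronunciation stored in the dictionary is recognized.
expected = recognize(["t", "ah", "m", "ey", "t", "ow"], STATIC_DICTIONARY)

# An alternative, equally proper pronunciation is not covered, so lookup fails.
alternate = recognize(["t", "ah", "m", "aa", "t", "ow"], STATIC_DICTIONARY)

print(expected)   # tomato
print(alternate)  # None
```

In this sketch the second utterance fails only because the dictionary holds a single fixed pronunciation for the word, which mirrors the misrecognition described above.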
Another limitation of such dictation systems is that they either are not extensible (i.e., a user may not add a new term to the dictionary) or permit the addition of new terms but generate their own pronunciation of each new term without allowing the user to discover the pronunciation(s). Such systems may use letter-to-sound heuristics to guess at the pronunciation of a newly added term. Unfortunately, such heuristics do not yield correct results in many instances. Oftentimes, when a user adds a new term to extend the dictionary used in a dictation system, the user merely enters the new term without providing a pronunciation, and the speech recognition system generates a pronunciation for the new term. This generated pronunciation may be incorrect or may not correspond with the user's anticipated pronunciation of the word. As a result, there is often a high degree of misrecognition of speech input that includes the newly added term.
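As a further illustration, a letter-to-sound heuristic of the kind mentioned above may be sketched as a naive one-letter-to-one-sound mapping. The table `LETTER_TO_SOUND`, the function `guess_pronunciation`, and the phoneme symbols are assumptions made for this sketch. Heuristics used in practice are more elaborate, but they can fail in the same manner on irregularly spelled terms.

```python
# Hypothetical one-letter-to-one-sound table (toy phoneme symbols).
LETTER_TO_SOUND = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f",
    "g": "g", "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "aa", "p": "p", "q": "k", "r": "r",
    "s": "s", "t": "t", "u": "ah", "v": "v", "w": "w", "x": "ks",
    "y": "y", "z": "z",
}


def guess_pronunciation(term):
    """Guess a pronunciation by mapping each letter to a single sound."""
    return [LETTER_TO_SOUND[ch] for ch in term.lower() if ch in LETTER_TO_SOUND]


# The user adds the term "colonel" without supplying a pronunciation.
generated = guess_pronunciation("colonel")

# Pronunciation the user actually employs (assumed for illustration).
user_pronunciation = ["k", "er", "n", "ah", "l"]

print(generated)
print(generated == user_pronunciation)
```

Because the generated pronunciation differs from the pronunciation that the user employs, and the user has no way of inspecting it, speech input that includes the new term is likely to be misrecognized.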