A computer speech dictation system that allows a speaker to dictate efficiently and that automatically recognizes the dictation has long been a goal of developers of computer speech systems. The benefits that would result from such a computer speech recognition (CSR) system are substantial. For example, rather than typing a document into a computer system, a person could simply speak the words of the document, and the CSR system would recognize the words and store the letters of each word as if the words had been typed. Since people generally can speak faster than they can type, efficiency would be improved. Also, people would no longer need to learn how to type. Computers could also be used in many applications where their use is currently impracticable because a person's hands are occupied with tasks other than typing.
Typical CSR systems have a recognition component and a dictation editing component. The recognition component receives a series of utterances from a speaker, recognizes each utterance, and sends a recognized word for each utterance to the dictation editing component. The dictation editing component displays the recognized words and allows a user to correct words that were misrecognized. For example, the dictation editing component would allow a user to replace a misrecognized word by either speaking the word again or typing the correct word.
The recognition component typically contains a model of an utterance for each word in its vocabulary. When the recognition component receives a spoken utterance, the recognition component compares that spoken utterance to the modeled utterance of each word in its vocabulary in an attempt to find the modeled utterance that most closely matches the spoken utterance. Typical recognition components calculate a probability that each modeled utterance matches the spoken utterance. Such recognition components send to the dictation editing component a list of the words with the highest probabilities of matching the spoken utterance, referred to as the recognized word list.
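The matching step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each utterance is reduced to a short numeric feature vector and that closeness is scored with a distance-based function; the names `similarity`, `recognize`, and `VOCABULARY`, and the feature values, are hypothetical and do not describe how any particular recognizer models speech.

```python
import math

# Hypothetical modeled utterances: one toy feature vector per vocabulary word.
VOCABULARY = {
    "write": (1.0, 0.1),
    "right": (0.9, 0.2),
    "rate": (0.5, 0.5),
    "wheat": (0.0, 1.0),
}


def similarity(spoken, modeled):
    """Toy score that grows as the two feature vectors get closer."""
    return math.exp(-math.dist(spoken, modeled))


def recognize(spoken, vocabulary, n=3):
    """Return the n most probable (word, probability) pairs, best first."""
    scores = {word: similarity(spoken, model) for word, model in vocabulary.items()}
    total = sum(scores.values())  # normalize so the scores sum to 1
    ranked = sorted(
        ((word, score / total) for word, score in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


recognized_word_list = recognize((1.0, 0.0), VOCABULARY)
```

Here `recognized_word_list` plays the role of the recognized word list: the highest-probability words in descending order, ready to be passed to a dictation editing component.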
The dictation editing component generally selects the word with the highest probability from the recognized word list as the recognized word corresponding to the spoken utterance. The dictation editing component then displays that word. If, however, the displayed word is a misrecognition of the spoken utterance, then the dictation editing component allows the speaker to correct the misrecognized word. When the speaker indicates that the misrecognized word is to be corrected, the dictation editing component displays a correction window that contains the words in the recognized word list. If one of the words in the list is the correct word, the speaker can simply click on that word to effect the correction. If, however, the correct word is not in the list, the speaker would either speak or type the correct word.
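The selection and correction behavior described above might be sketched as follows. This is an illustrative model only: the class `DictationEditor` and its methods are hypothetical names, and the recognized word list is assumed to arrive as (word, probability) pairs.

```python
class DictationEditor:
    """Toy dictation editing component (hypothetical, for illustration)."""

    def __init__(self):
        self.words = []       # words currently displayed, in order
        self.candidates = []  # recognized word list retained for each word

    def accept(self, recognized_word_list):
        """Display the highest-probability word and keep the list for later."""
        best = max(recognized_word_list, key=lambda pair: pair[1])[0]
        self.words.append(best)
        self.candidates.append([word for word, _ in recognized_word_list])
        return best

    def correction_window(self, index):
        """Return the alternatives to offer for the word at this position."""
        return list(self.candidates[index])

    def correct(self, index, replacement):
        """Replace a misrecognized word with a clicked, spoken, or typed word."""
        self.words[index] = replacement


editor = DictationEditor()
editor.accept([("right", 0.5), ("write", 0.3), ("rite", 0.2)])
choices = editor.correction_window(0)  # alternatives shown to the speaker
editor.correct(0, choices[1])          # the speaker clicks "write"
```

Because the editor retains the list for each displayed word, it can populate the correction window at any later time; a word that is not in the list is handled by the same `correct` call with the respoken or typed word.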
Some CSR systems serve as a dictation facility for word processors. Such a CSR system controls the receiving and recognizing of a spoken utterance and then sends each character corresponding to the recognized word to the word processor. Such configurations have a disadvantage in that when a speaker attempts to correct a word that was previously spoken, the word processor does not have access to the recognized word list and thus cannot display those words to facilitate correction.
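The limitation described above can be made concrete with a small sketch. It assumes a hypothetical `WordProcessor` that accepts only individual characters and a hypothetical function `dictate_to_word_processor` standing in for the CSR system; neither name refers to a real product or interface.

```python
class WordProcessor:
    """Toy word processor that only ever receives characters."""

    def __init__(self):
        self.text = ""

    def receive_char(self, ch):
        self.text += ch


def dictate_to_word_processor(recognized_word_list, word_processor):
    """Send only the characters of the best word; alternatives are not passed on."""
    best = max(recognized_word_list, key=lambda pair: pair[1])[0]
    for ch in best + " ":
        word_processor.receive_char(ch)
    # The remaining entries of recognized_word_list never reach the word
    # processor, so it has nothing to show in a correction window later.


processor = WordProcessor()
dictate_to_word_processor([("right", 0.5), ("write", 0.3)], processor)
```

After the call, the word processor holds only the text of the chosen word; the alternative "write" exists nowhere in its state, which is the disadvantage noted above.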