The present invention deals with speech recognition engines. More specifically, the present invention deals with providing a user with alternatives to the speech recognition output provided by the engine.
Speech recognition engines receive speech data indicative of words spoken by a user. The speech data is provided to a decoder for recognition. The decoder accesses a plurality of models, such as acoustic models and language models, and identifies a word, or a sequence of words, from the speech data input to the engine.
However, even the most advanced real-time continuous speech recognition engines currently available cannot correctly determine 100% of what a speaker has uttered. Therefore, it is not uncommon for a user to wish to change or correct a recognition result provided by an engine. Instead of forcing the speaker to re-utter the misrecognized phrase, it is often more efficient to provide the speaker with additional possible interpretations of the utterance, since there is a reasonable chance that what the user actually said was one of the alternate interpretations provided by the decoder.
In the past, when the user highlighted recognized text to be corrected (or otherwise indicated a portion of the recognized text that needed to be changed), the engine provided a number of alternate suggestions to the user. However, these alternate suggestions were not always the best suggestions as measured by the scores generated by the speech recognition engine. Instead, the alternatives provided were often simply a fixed, predetermined number of alternatives that the engine was required to provide to the application. Such alternatives did not utilize information that the engine held regarding the acoustic confidence and the probabilities of word combinations.
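By way of illustration only, the difference between returning a fixed number of alternatives and returning alternatives selected by engine scores can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular engine: the names `Hypothesis`, `combined_score`, and `rank_alternatives`, the log-domain scores, and the `margin` and `lm_weight` parameters are all hypothetical and do not appear in the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    """One candidate interpretation of an utterance (hypothetical structure)."""
    text: str
    acoustic_logp: float  # assumed acoustic log-score from the engine
    lm_logp: float        # assumed language-model log-probability of the words


def combined_score(h: Hypothesis, lm_weight: float = 1.0) -> float:
    """Combine acoustic and language-model evidence in the log domain."""
    return h.acoustic_logp + lm_weight * h.lm_logp


def rank_alternatives(hyps: List[Hypothesis], max_count: int = 5,
                      margin: float = 5.0, lm_weight: float = 1.0) -> List[Hypothesis]:
    """Return hypotheses sorted best-first, keeping only those whose combined
    score is within `margin` of the best one, capped at `max_count`."""
    ranked = sorted(hyps, key=lambda h: combined_score(h, lm_weight), reverse=True)
    if not ranked:
        return []
    best = combined_score(ranked[0], lm_weight)
    kept = [h for h in ranked if best - combined_score(h, lm_weight) <= margin]
    return kept[:max_count]


if __name__ == "__main__":
    candidates = [
        Hypothesis("wreck a nice beach", -10.0, -8.0),
        Hypothesis("recognize speech", -11.0, -3.0),
        Hypothesis("wreck an ice peach", -14.0, -12.0),
    ]
    for h in rank_alternatives(candidates):
        print(h.text, combined_score(h))
```

In this sketch, a poorly scoring candidate is dropped even when fewer than `max_count` alternatives remain, whereas a fixed-count scheme would return it regardless of its score.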
Similarly, in prior engines, if a document was dictated by the user and saved for later editing, it was very cumbersome to maintain all of the alternatives so that they could be displayed to the user at a later time. Further, in order to present the best alternatives to the user, scores corresponding to the alternatives must be computed. It would therefore have been very cumbersome with prior systems to calculate scores for all possible alternatives so that those alternatives and scores could be saved and later retrieved by the user during editing. In fact, the complexity of this problem, as measured by the number of potential candidates for the N-Best alternatives, increases exponentially with the duration of time for which alternatives are computed.
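The exponential growth noted above can be illustrated with simple arithmetic. The following sketch rests on a simplifying assumption that is not stated in the text: each word position in the selected span has an independent set of candidate words, so the number of candidate sequences is the product of the per-position counts. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def count_candidate_sequences(candidates_per_position: Sequence[int]) -> int:
    """Number of distinct word sequences when each position has an
    independent set of candidates (a simplifying assumption)."""
    return math.prod(candidates_per_position)


def growth_table(branching: int, max_words: int) -> List[Tuple[int, int]]:
    """Pairs of (span length in words, candidate sequence count) for a
    uniform number of candidates per word position."""
    return [(n, branching ** n) for n in range(1, max_words + 1)]


if __name__ == "__main__":
    # With 5 candidates per word, each additional word multiplies the count by 5.
    for n_words, n_sequences in growth_table(branching=5, max_words=10):
        print(f"{n_words:2d} words -> {n_sequences} candidate sequences")
```

Under this assumption, scoring and storing every candidate sequence for more than a short span quickly becomes impractical, which is consistent with the difficulty described above.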