Automatic Speech Recognition (ASR) systems convert spoken audio into text. Recognition accuracy for a particular utterance can vary based on many factors, including the audio fidelity of the recorded speech, the correctness of the speaker's pronunciation, and the like. These factors contribute to continuously varying levels of recognition accuracy, which can result in several possible transcriptions for a particular utterance.
Some ASR systems are able to indicate a level of confidence in a transcription. In addition, some ASR systems are able to return multiple transcription options for a particular utterance, or for a fragment of an utterance, each with its own confidence value. Some approaches for accomplishing this are described in U.S. Provisional Patent Application Nos. 60/957,386, 60/957,701 and 61/021,341.
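By way of illustration only, a result of this kind can be pictured as a list of utterance fragments, each carrying its own ranked transcription options. The following is a minimal sketch under stated assumptions and is not drawn from the referenced applications: the names TranscriptionOption, FragmentResult and top_transcription are hypothetical, and confidence is assumed to be a value between 0.0 and 1.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptionOption:
    """One candidate transcription (hypothetical structure)."""
    text: str
    confidence: float  # assumed scale: 0.0 (lowest) to 1.0 (highest)


@dataclass
class FragmentResult:
    """All candidate transcriptions for one fragment of an utterance."""
    options: List[TranscriptionOption]

    def ranked(self) -> List[TranscriptionOption]:
        # Highest-confidence option first.
        return sorted(self.options, key=lambda o: o.confidence, reverse=True)

    def best(self) -> TranscriptionOption:
        # The single option with the highest confidence.
        return max(self.options, key=lambda o: o.confidence)


def top_transcription(fragments: List[FragmentResult]) -> str:
    # Join the highest-confidence option of every fragment into one string.
    return " ".join(f.best().text for f in fragments)


# Illustrative data only; the confidence values are invented for the example.
utterance = [
    FragmentResult([
        TranscriptionOption("recognize", 0.62),
        TranscriptionOption("wreck a nice", 0.31),
    ]),
    FragmentResult([
        TranscriptionOption("speech", 0.80),
        TranscriptionOption("beach", 0.15),
    ]),
]

print(top_transcription(utterance))
```

In this sketch, the lower-ranked options remain available through ranked() even when only the highest-confidence text is shown.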
Generally, an application that displays speech recognition results might display only the results with the highest confidence values. However, in some cases, it may be useful to also make the other transcription options available to the user, so that the user can easily correct transcription errors by choosing from among all of the transcription options. If the display device has enough space, all of the results can be listed for the user, so that the user can evaluate them and choose the correct or most nearly correct result. However, if the display device is small, there may only be room to display the highest confidence results, and the user may have to navigate through a user interface to see and select other result options. In that case, the user experience can become quite tedious, especially if there are numerous recognition options available for different parts of the utterance.
This disclosure describes an approach whereby multiple transcription result options can be exposed to the user at once, even on a small display device, by use of visual animation techniques.