1. Field of the Invention
The present invention relates generally to speech recognition systems and, more particularly, to a system and method for latency reduction for automatic speech recognition using partial multi-pass results.
2. Introduction
Automatic speech recognition (ASR) is a valuable tool that enables spoken audio to be automatically converted into textual output. The elimination of manual transcription represents a huge user benefit. Thus, whether applied to the generation of transcribed text, the interpretation of voice commands, or any other time-saving application, ASR is presumed to have immense utility.
In practice, however, ASR comes at a great computational cost. As computing technology has improved, the complexity of the computational models applied to ASR has grown with it. Computing capacity is rarely wasted in the ever-continuing search for accuracy and speed in the recognition of speech.
These two criteria in particular, accuracy and speed, represent the thresholds that govern user adoption and acceptance of the technology. Quite simply, if the promise of the technology exceeds the practical benefit in real-world usage, the ASR technology quickly moves into the category of novelty, not usefulness.
Conventionally, high-accuracy ASR of continuous spontaneous speech requires computations that take far longer than the duration of the speech itself. As a result, a long latency exists between the delivery of the speech and the availability of the final text transcript. What is needed, therefore, is a mechanism that accommodates real-world ASR latencies without sacrificing application usefulness.