A variety of automatic speech recognizers exist for transcribing speech. Such systems typically can be operated in a “verbatim transcript” mode, in which all of the words spoken are transcribed in the order in which they were spoken. It is not desirable, however, to produce a verbatim transcript when the speaker performs editing operations that invalidate previously dictated speech.
Consider, for example, a speaker dictating into a handheld digital recorder. The speaker speaks a few sentences and then realizes that he has misspoken. He wants to re-record (replace) the last ten seconds of his speech, so he rewinds the recording by ten seconds (for example, by pressing a rewind button on the recording device) and then begins speaking again to correct those ten seconds of speech.
A verbatim transcript of such speech would therefore include not only the speech that the speaker intended to become part of the final transcript, but also speech that has since been replaced by other speech (e.g., the original ten seconds of speech that were re-dictated) and that therefore should not become part of the final transcript. Some existing speech recognizers can produce a transcript that reflects such changes to the spoken audio stream before the entire audio stream has been dictated. These systems do so, however, by delaying recognition of each portion of the audio stream for some period of time after that portion has been spoken, to ensure (or at least make it more likely) that the resulting transcript of that portion will not be invalidated by subsequent speech.