A proofreader in a speech recognition system plays back both audio and text on a word-by-word basis to facilitate proofreading and correction of the text document. As the words are played, the user has the option to halt playback and modify or otherwise correct the portions of interest. Many commercial products provide visual cues in association with the text as the audio is played back, to help the user identify mistakes in the transcription. For example, U.S. Pat. No. 5,031,113 discloses a system that highlights a word when the associated audio is being played back. However, systems like this do not perform well in the case of an unrecognized word. Because only a single word is highlighted at a time, nothing is highlighted while the audio for an unrecognized word is playing, which can cause the user to lose their place in the text.
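The loss of place described above can be illustrated with a minimal sketch. It is not the method of the cited patent or of any commercial product. It assumes that each recognized word carries a start and end time in the audio, and the names WordSpan and word_at are hypothetical, introduced only for this illustration.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class WordSpan(NamedTuple):
    """Hypothetical record: one recognized word and its audio interval."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds; the word covers the interval [start, end)
    text: str


def word_at(spans: Sequence[WordSpan], t: float) -> Optional[str]:
    """Return the word to highlight at playback time t, or None.

    Assumes spans are sorted by start time and do not overlap.
    """
    starts = [s.start for s in spans]
    i = bisect_right(starts, t) - 1  # last span starting at or before t
    if i >= 0 and t < spans[i].end:
        return spans[i].text
    return None  # t falls in a gap: nothing is highlighted


# Assumed sample data: audio between 0.9 s and 1.6 s was not recognized,
# so no word covers that interval.
spans = [
    WordSpan(0.0, 0.4, "play"),
    WordSpan(0.4, 0.9, "the"),
    WordSpan(1.6, 2.1, "recording"),
]

for t in (0.5, 1.2, 1.7):
    print(t, word_at(spans, t))
```

Under these assumptions, a playback position inside the unrecognized interval (1.2 seconds in the sample data) maps to no word, so a highlighter driven only by this lookup shows nothing until the next recognized word begins.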
Further, in most applications the algorithms used to align audio and text have substantial limitations. For example, EasePublisher™, manufactured by Dolphin Audio Publishing of the United Kingdom, offers synchronous audio/text playback. Although it advertises importing of audio into text with “SYNC as you hear” or “SYNC as you speak”, each of these functions requires the user to manually specify the synchronization points. Also, the LSM™ product, manufactured by Sprex, Inc. of Seattle, Wash., U.S.A., provides audio and plain text files to a server. This mechanism is undesirable since it is important to have an accurate text representation of the audio provided.