1. Field of the Invention
This invention relates to text editing methods and systems. More particularly, it relates to computerized methods and systems for editing text during playback of audio recordings for transcription. The methods and systems are implemented in computer hardware and software. This invention is related to an application, application Ser. No. 09/166,364, filed on even date herewith, entitled “Speech-Recognition-Assisted Selective Suppression of Silent and Filled Speech Pauses During Playback of an Audio Recording,” which is incorporated herein by reference. It is also related to an application filed on even date herewith, entitled “A Method for Controlling Playback of Speech.”
2. Background Information
Dictation and transcription of recorded speech are commonly used in a variety of professions, such as the legal and medical fields. Transcription is typically done by human transcriptionists who listen to an audio recording of a dictation and type the recorded speech into a word processor. Because transcription is an expensive process, automating it is important to reduce costs.
Speech recognition systems have helped to reduce transcription costs. Automatic speech recognition is the process of using a computer to convert spoken language into written text. In speech recognition systems, speech is typically recorded and fed to a speech recognition unit. The speech recognition unit produces a draft of the transcription, and then a transcriptionist edits the draft to produce a final, quality transcription copy.
If a speech recognition system could perform perfect transcription, the output text would need little or no editing to serve as accurate transcribed text. However, even if the speech recognition system were nearly flawless, speech that is not meant to be part of the transcribed text, such as punctuation indicators, paragraph indicators, corrections, or other instructions for a transcriptionist, may appear in the text output of the speech recognition system. Background speech, such as a conversation between the dictator and another person that is not meant to be recorded, may also become part of the transcribed text. Therefore, even a nearly flawless speech recognition system would typically produce errors in its transcribed text output. Speech recognition systems may also have trouble producing quality results if a speaker has a strong accent or speaks with poor grammar. In many situations, therefore, a transcriptionist is needed to edit the text produced by a speech recognition system to obtain quality transcribed text.
To enable editing of the text output from a speech recognition system, it is essential that the transcriptionist have access to the original audio recording during the editing process. Some editing programs provide a way of aligning or identifying the spoken word with the written text during playback of the audio recording to facilitate the transcriptionist's work. For example, the transcriptionist can typically activate playback of the original dictation, and each word will be highlighted in some way as it is spoken. Whenever the transcriptionist sees an error, he or she may stop playback of the dictation, correct the error, and then resume playback. Some custom editors may also be voice-controlled so that the transcriptionist can edit the text without ever touching a keyboard.
A fundamental problem with this typical approach to editing the text produced by a speech recognition system is that most text editing programs were not designed for this type of use. Most text editing programs were designed to let the user type rapidly and fix mistakes as he or she makes them. Speech recognition units typically make many kinds of errors that are time-consuming, and thus expensive, to correct. For example, even a simple mistake such as a missed period requires several keystrokes to repair. The transcriptionist must position the cursor at the end of the last word of the sentence and type a period. Next, an extra space must be added. Then the cursor must be positioned at the first word of the next sentence, and the first letter of that word must be deleted and retyped as a capital letter. Thus, fixing a simple mistake such as a missed period requires a minimum of five keystrokes. In a program where the cursor is automatically aligned with the audio during playback, when the transcriptionist finds a mistake, he or she must stop playback of the audio recording, position the cursor at the point needing correction, perform each keystroke of the fix, and then resume playback, possibly after rewinding the recording. Fixing mistakes this way is slow, which makes playback-based editing of transcribed text expensive.
It should also be recognized that the transcriptionist is working in a complex environment. He or she may be viewing text on a monitor while simultaneously listening to an audio playback, both of which are continuously changing. To control these and perform editing, the transcriptionist may use both hands on the computer keyboard and, optionally, a foot control to start, stop, fast-forward, or rewind the audio recording. Using these various inputs and controls efficiently is a non-trivial task.
A method and system are needed to improve the efficiency of editing text generated by speech recognition systems. More particularly, a system and method are needed for a playback-based text editing system, in which the text editor aligns or identifies the written word with the spoken word during playback, that allow the transcriptionist to edit the draft with little or no stopping of the audio playback. Ideally, the system and method would allow the editing process to take the same amount of time as playback of the audio recording, without interruptions to stop and fix text.
One embodiment of the invention is a method for editing written text in a text editor that automatically aligns a cursor in the written text on a screen with a particular spoken word during playback of an audio recording. The method may comprise aligning the cursor at a targeted insertion point in response to a user's input, performing one or more editing functions at the targeted insertion point, and realigning the cursor with the spoken words. The act of aligning may further comprise adjusting the cursor location by a reaction time variable. In another embodiment, the act of aligning may further comprise determining whether the targeted insertion point is an appropriate insertion point for one or more text edits and, if it is not, adjusting the cursor location to an appropriate insertion point.
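The reaction time adjustment can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the editor knows the start time of each recognized word in the recording and that the user's reaction time is a fixed constant (0.5 s here, an arbitrary illustrative value). The names `word_at`, `targeted_word`, and `REACTION_TIME` are illustrative, not taken from the source.

```python
from bisect import bisect_right

# Hypothetical default; a real system might calibrate this per user.
REACTION_TIME = 0.5  # seconds


def word_at(start_times: list[float], t: float) -> int:
    """Index of the word being spoken at time t (latest start time <= t)."""
    # bisect_right finds how many words have started by time t;
    # clamp to 0 so times before the first word map to word 0.
    return max(bisect_right(start_times, t) - 1, 0)


def targeted_word(start_times: list[float], keypress_time: float,
                  reaction_time: float = REACTION_TIME) -> int:
    """Word the user most likely meant when pressing a key at keypress_time."""
    # Shift the lookup back by the reaction time: by the time the user
    # reacts, playback has usually moved past the word they heard.
    return word_at(start_times, keypress_time - reaction_time)
```

For example, with word start times `[0.0, 0.25, 0.75, 1.25, 1.5]` for "the patient is stable she", a key pressed at 1.75 s falls on word 4 ("she"), but `targeted_word` returns word 3 ("stable"), which is where a missed period would belong.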
In another embodiment of the invention, the method comprises identifying, in response to a user's input, a targeted insertion point for one or more text edits, wherein the act of identifying comprises the act of adjusting the cursor location by a reaction time variable, performing one or more editing functions at the targeted insertion point, and realigning the cursor with the spoken words. In another embodiment, the method comprises performing one or more editing functions at a position defined in response to a user's input, wherein a single keystroke by the user causes such editing functions to be executed starting at an insertion point identified by the cursor location of the text editor, wherein the single keystroke causes one or more edits normally requiring multiple keystrokes.
In another embodiment, the method comprises accepting a single keystroke command from a user to perform an editing function in the written text, wherein the single keystroke causes one or more edits normally requiring multiple keystrokes, and coordinating the editing function resulting from the keystroke with a targeted insertion point, wherein the targeted insertion point is identified by the location of the cursor as it automatically moves through the written text in coordination with playback of the audio recording.
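A single-keystroke command can be sketched as a lookup table from keys to edit functions, each applied at the word under the playback-aligned cursor. The key bindings (`F5`, `F6`, `F7`), the word-list representation of the text, and the particular edits are hypothetical choices for illustration; a real editor would operate on its own document model.

```python
from typing import Callable

# Hypothetical document model: the draft transcript as a list of words.
Words = list[str]
Edit = Callable[[Words, int], None]


def sentence_break(words: Words, i: int) -> None:
    """End the sentence after word i and capitalize the next word."""
    words[i] = words[i].rstrip(".,;:") + "."
    if i + 1 < len(words):
        nxt = words[i + 1]
        words[i + 1] = nxt[:1].upper() + nxt[1:]


def add_comma(words: Words, i: int) -> None:
    """Append a comma to word i, replacing any trailing punctuation."""
    words[i] = words[i].rstrip(".,;:") + ","


def delete_word(words: Words, i: int) -> None:
    """Remove word i, e.g. a dictated instruction transcribed as text."""
    del words[i]


# Hypothetical key bindings: one keystroke per multi-keystroke edit.
KEYMAP: dict[str, Edit] = {
    "F5": sentence_break,
    "F6": add_comma,
    "F7": delete_word,
}


def on_keystroke(key: str, words: Words, cursor: int) -> bool:
    """Apply the edit bound to `key` at the cursor's word; return True if handled."""
    edit = KEYMAP.get(key)
    if edit is None or not 0 <= cursor < len(words):
        return False
    edit(words, cursor)
    return True
```

With `words = "the patient is stable she was discharged".split()`, calling `on_keystroke("F5", words, 3)` turns the text into "the patient is stable. She was discharged": the period and the capitalization, which take at least five keystrokes by hand, happen in one.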
Yet another embodiment of the invention is a method for editing written text in a text editor. The method comprises aligning a cursor in the written text at a targeted insertion point in response to a user's input, determining whether the targeted insertion point is an appropriate insertion point for one or more text edits, adjusting the cursor location to an appropriate insertion point if the targeted insertion point is inappropriate, and performing one or more text edits at the appropriate insertion point.
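One simple way to decide whether a targeted insertion point is appropriate, shown here as a Python sketch, is to require that punctuation be inserted immediately after a word. The snapping rule below (advance to the end of the word when the cursor lands inside it, otherwise back up to the end of the preceding word) is an assumed heuristic, not one specified in the text, and `appropriate_insertion_point` is a hypothetical name.

```python
def appropriate_insertion_point(text: str, pos: int) -> int:
    """Snap a character offset to the end of a word (assumed heuristic)."""
    pos = max(0, min(pos, len(text)))
    # Cursor strictly inside a word: move forward to that word's end.
    inside = (0 < pos < len(text)
              and not text[pos - 1].isspace()
              and not text[pos].isspace())
    if inside:
        while pos < len(text) and not text[pos].isspace():
            pos += 1
        return pos
    # Cursor at a word start or in whitespace: back up over whitespace
    # to the end of the preceding word. A cursor already at a word's
    # end is left unchanged.
    while pos > 0 and text[pos - 1].isspace():
        pos -= 1
    return pos
```

In "the patient is stable she was", offsets 18 (inside "stable"), 21 (just after it), and 22 (start of "she") all snap to 21, the position right after "stable", where a period would be inserted.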
Another embodiment of the invention is an apparatus for editing written text. The apparatus comprises a text editor that automatically aligns a cursor in the written text with a particular spoken word during playback of an audio recording, and software containing instructions to align the cursor at a targeted insertion point in response to a user's input, perform one or more editing functions at the targeted insertion point, and realign the cursor with the spoken words. In yet another embodiment, the software of the apparatus further contains instructions to adjust the cursor location by a reaction time variable.
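The align, edit, and realign sequence performed by the apparatus's software can be sketched as a small Python class. It assumes, hypothetically, parallel lists of words and word start times, a fixed reaction time, and edits that modify words in place without adding or removing any, so the timing list stays aligned with the text. `PlaybackEditor`, `tick`, and `handle_edit` are illustrative names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlaybackEditor:
    words: list[str]
    start_times: list[float]      # start time (s) of each word in the recording
    reaction_time: float = 0.5    # hypothetical reaction time variable
    cursor: int = 0               # index of the word under the cursor

    def _word_at(self, t: float) -> int:
        # Latest word whose start time is <= t, clamped to the first word.
        return max(bisect_right(self.start_times, t) - 1, 0)

    def tick(self, now: float) -> None:
        """Follow playback: keep the cursor on the word being spoken."""
        self.cursor = self._word_at(now)

    def handle_edit(self, edit: Callable[[list[str], int], None],
                    keypress_time: float, now: float) -> int:
        """Align, edit, then realign; return the word index that was edited."""
        # Align: target the word heard one reaction time before the keypress.
        target = self._word_at(keypress_time - self.reaction_time)
        # Edit in place; this sketch assumes the word count does not change.
        edit(self.words, target)
        # Realign: playback never stopped, so jump back to the current word.
        self.tick(now)
        return target
```

Because playback is never paused in this sketch, the cursor simply resumes tracking the audio at the current playback time once the edit has been applied.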