1. Technical Field
This invention relates to the field of speech recognition, and more particularly, to the use of multiple cursors for dictation and correction within a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted by a computer into a set of text words. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Improvements to speech recognition systems provide an important way to enhance user productivity.
When using a speech recognition system, a typical dictation function can include dictating to the speech recognition system and subsequently correcting any speech recognition errors. This process is often cyclical: a user will often dictate part of a body of text and correct that part before dictating additional text. For example, a user can dictate several paragraphs of a document and correct those paragraphs before continuing to dictate the remainder of the document.
Conventional methods of correcting speech recognition errors involve the user reading the recognized text in an effort to proofread, or visually check, its accuracy. When the user finds an incorrect word, the user can select that word using a voice command, a pointer, or one or more keystrokes. The user can then initiate a correction function to correct the selected word. For example, if the user says “correct <picture>”, the speech recognition system can select the first occurrence of the word “picture” following the location of an insertion cursor within the body of text. The insertion cursor, as is known in the art, can be represented with an “I” or an I-beam type character. A common example of an insertion cursor is the cursor within a word processing application program. The insertion cursor can denote the location where new text will be inserted within a document, or body of text, when typed or recognized from a user's spoken utterance.
When the user-specified word, in this case “picture”, is selected, the insertion cursor of an application program such as a word processor can be relocated to the point of correction, that is, the location of the selected word within the body of text. In this manner, when the user chooses a correct or alternate word during the correction function, that word can be substituted for the selected incorrect word within the body of text. After the alternate word is inserted, the insertion cursor can be located immediately after the newly inserted word. If the user initiates a second correction function immediately following the first, the speech recognition system can search for the second incorrect word starting at the location of the insertion cursor. In that case, the entire dictated body of text need not be searched, and the correction function can track the user's proofreading process.
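The conventional single-cursor correction behavior described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the body of text is a list of words and that the insertion cursor is a word index (the number of words preceding it). The helper names `find_from_cursor` and `correct` are hypothetical and are not taken from any particular speech recognition product.

```python
from typing import List, Optional, Tuple


def find_from_cursor(words: List[str], target: str, cursor: int) -> Optional[int]:
    """Return the index of the first occurrence of `target` at or after `cursor`.

    Hypothetical model: `words` is the body of text and `cursor` is a word-level
    insertion point (the number of words that precede it).
    """
    for i in range(cursor, len(words)):
        if words[i].lower() == target.lower():
            return i
    return None  # no occurrence between the cursor and the end of the text


def correct(words: List[str], cursor: int, target: str,
            alternate: str) -> Tuple[List[str], int]:
    """Substitute `alternate` for the next occurrence of `target`.

    Returns the edited text and the relocated insertion cursor, which sits
    immediately after the substituted word.
    """
    i = find_from_cursor(words, target, cursor)
    if i is None:
        return words, cursor  # nothing selected; cursor stays where it was
    edited = words[:i] + [alternate] + words[i + 1:]
    return edited, i + 1


# Two consecutive corrections: the second search starts where the first ended.
text = "the picture of water was on the table next to the picture of milk".split()
text, cursor = correct(text, 0, "picture", "pitcher")       # selects index 1
text, cursor = correct(text, cursor, "picture", "pitcher")  # selects index 11
print(" ".join(text), cursor)
```

Because each search begins at the cursor left by the previous correction, the second correction never rescans the already-proofread words before index 2.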
In most cases, however, a user's intent is to continue dictating from the end of the body of text in order to complete the document in progress. To relocate the insertion cursor so that dictation can continue, the user must issue one or more speech commands or pointer-initiated commands that move the insertion cursor to the end of the body of text.
Conventional methods of correcting speech recognition errors have disadvantages. One such disadvantage is that, as a practical matter, after completing a correction function, the user's instinct is to resume dictation without first relocating the insertion cursor to the end of the body of text, where additional text is typically added. As a result, the user's dictation can be inserted into the document at an incorrect or undesired location. To remove the misplaced text, the user must stop dictating, delete the misplaced text, relocate the insertion cursor to the desired location, and then resume dictation. Such hindrances can discourage users from using the correction functions of speech recognition systems. Moreover, when users do not use the correction function, the speech recognition system cannot learn from its past recognition errors, and its performance can suffer as a result.
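The misplaced-dictation problem can be sketched in the same illustrative model, where the body of text is a list of words and the insertion cursor is a word index. `dictate` is a hypothetical helper that inserts recognized words at the insertion cursor, as a word processor would; it is an assumption for illustration, not part of any described system.

```python
from typing import List, Tuple


def dictate(words: List[str], cursor: int, spoken: List[str]) -> Tuple[List[str], int]:
    """Insert recognized words at the insertion cursor (hypothetical helper).

    Returns the edited text and the cursor position just after the new words.
    """
    edited = words[:cursor] + spoken + words[cursor:]
    return edited, cursor + len(spoken)


# After correcting "picture" to "pitcher" at index 1, the cursor sits at index 2.
text = "the pitcher was full of water".split()
spoken = ["then", "it", "spilled"]

misplaced, _ = dictate(text, 2, spoken)         # user resumes dictating immediately
intended, _ = dictate(text, len(text), spoken)  # user first moves the cursor to the end
print(" ".join(misplaced))
print(" ".join(intended))
```

Dictating immediately after the correction yields “the pitcher then it spilled was full of water”, whereas first moving the cursor to the end yields the intended “the pitcher was full of water then it spilled”.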
One method of dealing with this problem has been to force the insertion cursor to the end of the body of text after each correction. This proposed solution, however, also has disadvantages. One such disadvantage relates to the manner in which speech recognition systems search for user-specified words to correct. In operation, when a correction function is initiated, the speech recognition system can begin searching for a user-specified word from the location of the insertion cursor toward the end of the body of text. When the insertion cursor has been relocated to the end of the body of text, however, no text follows it, so the next initiation of the correction function typically defaults to searching for the user-specified word from the beginning of the body of text. In cases where a word occurs multiple times, the speech recognition system will select the first occurrence of the user-specified word rather than continuing from the most recently corrected word. The user must then command the speech recognition system to advance to the next occurrence, possibly several times, until the desired word is selected. Notably, this method does not track the natural flow of a user proofreading a document and forces the user to revisit previously corrected portions of text.
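The search behavior discussed above can be sketched under the same illustrative assumptions: the body of text is a list of words and the insertion cursor is a word index. The hypothetical helper `find_with_wrap` models a correction search that restarts at the beginning when nothing matching follows the cursor. The hypothetical helper `next_commands_needed` counts the “next occurrence” commands the user must then issue. Neither is drawn from an actual product.

```python
from typing import List, Optional


def find_with_wrap(words: List[str], target: str, cursor: int) -> Optional[int]:
    """Search from the insertion cursor to the end, then restart at the beginning.

    Hypothetical model of the conventional search: when the cursor has been
    forced to the end of the text, the first loop finds nothing and the search
    effectively defaults to the beginning of the body of text.
    """
    for i in range(cursor, len(words)):
        if words[i].lower() == target.lower():
            return i
    for i in range(0, min(cursor, len(words))):
        if words[i].lower() == target.lower():
            return i
    return None


def next_commands_needed(words: List[str], target: str, cursor: int,
                         wanted: int) -> Optional[int]:
    """Count the hypothetical "next occurrence" commands needed to reach `wanted`."""
    selected = find_with_wrap(words, target, cursor)
    for count in range(len(words) + 1):
        if selected is None:
            return None  # target does not occur at all
        if selected == wanted:
            return count
        # Each "next" command resumes the search just past the current selection.
        selected = find_with_wrap(words, target, selected + 1)
    return None  # `wanted` is not an occurrence of `target`


# The user has proofread up to index 3 and now wants the second "their"
# (index 7), which should read "there".
text = "their dog barked so I walked over their to calm it".split()
print(next_commands_needed(text, "their", len(text), 7))  # cursor forced to the end
print(next_commands_needed(text, "their", 3, 7))          # cursor left after last correction
```

With the cursor forced to the end, the search first selects the correctly used “their” at index 0, and one extra command is needed to reach index 7. A cursor left just after the previous correction (index 3) selects index 7 directly. With more earlier occurrences, the number of extra commands grows accordingly.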
Another proposed solution has been to allow the user to place the insertion cursor at the point within the body of text where the user most recently initiated the correction function. In addition to requiring an extra step, however, this approach requires the user to recall where in the document the user stopped proofreading, as well as the location of the last word corrected using the correction function. Moreover, requiring the user to manually relocate the insertion cursor can be contrary to the user's instinct to begin dictation upon completion of the correction function.