1. Technical Field
This invention relates to the field of computer voice recognition systems for dictation and more particularly to a system for fast voice correction of error words in dictated text.
2. Description of the Related Art
Computer voice recognition dictation systems are designed to allow user data to be entered in a target application program by means of spoken words (e.g. dictation of a report in a word processing application program). Such programs automatically recognize a spoken utterance as a word comprised of a series of characters, and display the word as part of a set of dictated text on a user interface screen. These types of programs are particularly helpful for persons who either lack the physical ability to operate a keyboard and/or computer mouse, or who find it more convenient to dictate documents rather than type them.
One basic problem with computer voice recognition systems for dictation relates to correction of any errors which may occur in the dictated text. Since each word which is entered into a dictated document is generated by a software voice recognition engine, typical errors can include words which are misunderstood by the recognition engine, words which are not contained in the vocabulary of the voice recognition engine, homonyms, or user errors. Correction of such errors with conventional keyboard/mouse controlled document editing systems is typically accomplished by moving a cursor to each location where an error is detected and then making the appropriate correction using the keyboard or mouse controls. This is a relatively simple task in the case of a keyboard/mouse controlled system because the user receives dynamic information regarding the position of the cursor as the mouse or cursor position keys are manipulated. More particularly, the user can visually detect the location of an error word on a computer screen which displays dictated text, and receives direct visual feedback as the cursor is repositioned by manipulation of keyboard and mouse controls. In this way, the user can immediately and accurately position the cursor at the location of the word to be edited.
By comparison, positioning a cursor at the location of an error word using voice commands only has been a rather tedious task in voice controlled systems of the prior art. Typically, the design of prior art systems has been such that the user can select a word for correction only by using a finite set of specifically recognizable spoken utterances, or voice commands, which cause a cursor to move a predetermined number of increments in a predetermined direction. Usually these voice commands will move the cursor a few words or lines at a time, until it is positioned at the error word to be corrected. This approach is tedious and time consuming because it often requires a user to count the number of words or lines between the present location of the cursor and the error word. Further, it is common that a user will miscount or incorrectly estimate the number of words (or lines) between the cursor and the error word. Such errors commonly result in the need for additional cursor movement commands to place the cursor at the exact position required, and can also result in overshoot and backtracking.
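The fixed-increment cursor movement described above can be illustrated with a minimal sketch. The command phrases, the step sizes, and the function name `move_cursor` are assumptions made for illustration only and are not taken from any particular prior art system.

```python
def move_cursor(position, command, word_count):
    """Apply one fixed-increment voice command to a word-indexed cursor."""
    # Hypothetical command set: each phrase maps to a signed step in words.
    steps = {
        "forward one": 1,
        "forward five": 5,
        "back one": -1,
        "back five": -5,
    }
    new_position = position + steps[command]
    # Clamp the cursor to the bounds of the document.
    return max(0, min(word_count - 1, new_position))


text = "the quick brown fox jumps over the lazy dog and then runs away".split()
target = text.index("lazy")  # the error word sits at word index 7

# The user overestimates the distance, overshoots, and must backtrack.
commands = ["forward five", "forward five", "back one", "back one", "back one"]
cursor = 0
for command in commands:
    cursor = move_cursor(cursor, command, len(text))
```

In this sketch five separate utterances are needed to reach a word only seven positions away, three of them spent undoing an overshoot.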
Also, the need to constantly move the cursor using voice commands disrupts the natural process of proofreading a document. Typically, a reviewer who is proofreading a document will begin reading the document at a certain point (typically at its beginning) and continue reading to the end. Mistakes or errors are corrected as the erroneous word is encountered in the course of reading through the document. If a user must constantly stop reviewing the document in order to move the cursor to a desired location, the process becomes disjointed and proceeds more slowly. Also, words and sentences may be unintentionally skipped, with the result that certain error words may not be detected.
Some voice type computer dictation systems permit correction of the last few dictated words upon recognizing certain voice commands from the user. For example, in some systems, the user can articulate a special command word, such as "Oops", which causes the system to present a correction dialog box to the user on the display screen. The correction dialog box contains the last few dictated words, which are displayed separately from the body of the dictated text. Each word in the correction dialog is displayed in a column adjacent to a title word, such as "Word 1", "Word 2", etc. The user then selects the word to be corrected by saying the title word corresponding to the word which requires correction. Finally, any necessary correction can be performed. This approach is of limited usefulness, as it only permits the last few dictated words to be corrected, and it requires multiple spoken commands.
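A rough sketch of such a correction dialog follows. The function name `correction_dialog`, the window size of three words, and the sample sentence are hypothetical; only the "Word 1", "Word 2" title convention comes from the description above.

```python
def correction_dialog(dictated_words, window=3):
    """Map title words ("Word 1", "Word 2", ...) to the last few dictated words.

    Each entry holds the word's index in the document and the word itself.
    """
    start = max(0, len(dictated_words) - window)
    recent = dictated_words[start:]
    return {
        f"Word {n}": (start + n - 1, word)
        for n, word in enumerate(recent, start=1)
    }


dictated = "please send the report too the client".split()
dialog = correction_dialog(dictated)

# The user says "Word 1" and then supplies the replacement "to".
index, _ = dialog["Word 1"]
dictated[index] = "to"

# Indices of words that can be reached through the dialog at all.
reachable = {i for i, _ in dialog.values()}
```

Only the final three words are reachable here; an error in "send" (index 1) could not be selected this way, which is the limitation noted above.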
Finally, while it is theoretically possible to allow a user to identify a word to be corrected in a document by simply speaking the word, such an approach has been found to have serious problems. In particular, multiple occurrences of the same word within a document, the existence of homonyms, and similar sounding words all result in practical difficulties in implementing such an approach.
Currently available speech recognition engines and text processors are often unable to consistently discern which word in a document a user is referring to when such word is spoken or articulated by the user for correction purposes. Attempting to identify error words in a document comprised of a substantial amount of dictated text inevitably leads to ambiguity and errors with respect to the word that is to be corrected. Once again, this results in an editing process which is time consuming, tedious and frustrating for a user.
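The ambiguity can be made concrete with a small sketch that searches an entire document for a spoken word. The homophone table, the function name `candidates`, and the sample sentence are assumptions for illustration; an actual recognition engine would rely on its own acoustic and language models rather than a lookup table.

```python
import re

# Hypothetical homophone groups used only for this illustration.
HOMOPHONES = [{"there", "their"}, {"to", "too", "two"}]


def candidates(text, spoken):
    """Return (index, word) pairs that a spoken word could refer to."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Treat every member of the spoken word's homophone group as a match.
    group = next((g for g in HOMOPHONES if spoken in g), {spoken})
    return [(i, w) for i, w in enumerate(words) if w in group]


text = ("Their report went to the board, and there were two copies "
        "sent to the client too.")
matches = candidates(text, "to")
```

Even in this single sentence, the spoken word "to" could refer to four different positions, and the number of candidates can only grow as the amount of dictated text searched increases.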
Thus, it would be desirable to provide a method and apparatus for rapid hands-free selection of error words in dictated text. It would also be desirable to provide a method for facilitating accurate hands-free proofreading and correction of a dictated document on a computer. It would further be desirable to provide a method and apparatus by which a computer user of automatic voice dictation software can unambiguously identify a word to be corrected in a document by simply speaking the word which is to be edited.