This invention relates generally to the field of computer systems and, more particularly, to correcting a speech recognition mode error in a computer software program when the incorrect mode has been previously selected and speech input has been incorrectly input into the program.
Since the advent of the personal computer, human interaction with the computer has been primarily through the keyboard. Typically, when a user wants to input information or to enter a command into a computer, the information or the command is typed on a keyboard attached to the computer. Other input devices have supplemented the keyboard as an input device, including the mouse, touch-screen displays, the integrated pointer device, and scanners. Use of these other input devices has decreased the amount of user time spent in entering data or commands into the computer.
Computer-based voice recognition and speech recognition systems have also been used for data or command input into personal computers. Voice recognition and speech recognition systems convert human speech into a format that can be understood by the computer. When a computer is equipped with a voice recognition or speech recognition system, data and command input can be performed by merely speaking the data or command to the computer. The speed at which the user can speak is typically faster than conventional data or command entry. Therefore, the inherent speed in disseminating data or commands through human speech is a sought-after advantage of incorporating voice recognition and speech recognition systems into personal computers.
Throughout the remainder of this disclosure, the terms "voice recognition" and "speech recognition" will be used synonymously. In some instances, a distinction is made between voice recognition and speech recognition. However, both voice recognition and speech recognition systems suffer from the same problems described herein, and the same solutions have been applied to both recognition technologies to resolve the shortcomings of the prior art.
The increased efficiency of users operating personal computers equipped with speech recognition systems has encouraged the use of such systems in the workplace. Many workers in a variety of industries now utilize speech recognition systems for numerous applications. For example, computer software programs utilizing voice recognition and speech recognition technologies have been created by DRAGON, IBM, and LERNOUT and HAUSPIE. When a user reads a document aloud or dictates to a speech recognition program, the program can enter the user's spoken words directly into a word processing program operating on a personal computer.
Generally, computer-based speech recognition programs convert human speech into a series of digitized frequencies. These frequencies are matched against a previously stored set of words, or phonemes. When the computer determines correct matches for the series of frequencies, computer recognition of that portion of human speech is accomplished. The frequency matches are compiled until sufficient information is collected for the computer to react. The computer can then react to certain spoken words by storing the human speech in a memory device, transcribing the human speech into a document for a word processing program, or executing a command in a program module, such as an application program.
However, speech recognition systems are not 100% reliable. Even with hardware and software modifications, the most proficient speech recognition systems can attain approximately 97-99% reliability. Internal and external factors can affect the reliability of speech recognition systems. Factors dependent upon the recognition technology itself include the finite set of words or phonemes and the vocabulary of words against which the speaker's input is compared. Environmental factors such as regional accents, external noise, and the microphone can degrade the quality of the input, thus affecting the frequency of the user's words and introducing potential error into the word or phoneme matching.
A speech recognition software program can be used to input commands or text into other application programs. For example, Kurzweil's "VOICEPRO" speech recognition software can be used to input text or commands into a document created by a word processing application program, such as MICROSOFT WORD. When a user chooses to use the speech recognition program to enter a command, the user manually selects the command mode in the speech recognition program. The user then speaks the command, such as "delete". The speech recognition program processes the command, and sends the "delete" command to the word processing program for execution of the command. If the user chooses to use the speech recognition program to enter text into a document, the user manually selects the dictation mode in the speech recognition program. The user then begins to speak the text to be input, such as "where do you want to go today". The speech recognition program processes the speech, and sends the processed speech to the word processing program to be input into the document. The user selection of a mode is necessary for the speech recognition software to correctly process the user's speech input. Manual selection of the speech recognition mode before the user speaks is cumbersome and time-consuming.
Occasionally, the user forgets to change the mode of the speech recognition program before speaking. For example, if the speech recognition program is in the command mode and the user says "copy machines make copies not coffee", the speech recognition program will process the speech input "copy machines make copies not coffee" as a command. The speech input "copy" will be executed by the application program, but the remaining speech may not be understood as a command, and the application program will not process the speech.
On other occasions, the speech recognition program will be in the dictation mode and the user will want the word processor to execute a command. If the user forgets to change the mode and says "copy", the speech recognition program will process the speech as dictation and the speech input will be entered as text into the application program.
Various solutions to the mode error problem have been attempted. The typical correction procedure involves the circumstance described above, when the user forgets to change the mode before speaking, resulting in a mode error. Sometimes, the mode error is compounded by the circumstance where the user does not realize he is in the wrong mode and the speech input is processed in the incorrect mode from the time the initial mode error was made. If the speech input has been incorrectly input as dictation, then the user can manually delete the dictation that has been input into the application program as text. The user continues the correction procedure by manually selecting the command mode before speaking again. If the speech input has been incorrectly input as a command, then the user can manually "undo" the executed command in the application program. The user continues the correction procedure by manually selecting the dictation mode before speaking again. The manual selection of the correct speech recognition mode and the manual correction using the "undo" or "delete" commands can be cumbersome and time-consuming.
Thus, there is a need in the art for a method that reduces user time in correcting speech recognition mode errors.
There is a further need in the art for a method that reduces the number of keystrokes or commands in correcting speech recognition mode errors.
The present invention meets the needs described above in a speech engine correction module for correcting speech recognition mode errors. The speech engine correction module can reduce user time in correcting speech recognition mode errors. Furthermore, the speech engine correction module can reduce the number of keystrokes and commands needed to correct a speech recognition mode error.
Generally described, the invention is a speech engine correction module having a speech recognition program and a speech engine. The speech recognition program is configured to receive speech for entry into a document for a program, such as a word processor. When the speech recognition program receives speech input, the program processes the speech input for recognition by the speech engine. The speech recognition program then sends the speech input to the speech engine.
The speech engine receives the speech input from the speech recognition program, and further processes the speech input. A command processor and a dictation processor each process the speech input as a command and as dictation, respectively. The results from each processor can be stored in a memory device, such as RAM, for later retrieval.
The speech engine determines a speech recognition mode for the speech input using a mode selection processor. The mode selection processor uses criteria such as the context and the content of the speech input to determine a speech recognition mode for the speech input. After the mode selection processor selects a mode, the speech input is sent by the speech engine to the program for entry into the document as dictation or as a command.
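By way of illustration only, the flow just described (processing the speech input in both modes, storing both results, and then selecting a mode) might be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the class and method names, the `KNOWN_COMMANDS` vocabulary, and the single-word selection criterion are all hypothetical, and the "processors" merely normalize text rather than perform recognition.

```python
from dataclasses import dataclass, field

# Hypothetical command vocabulary; the disclosure does not list one.
KNOWN_COMMANDS = {"copy", "delete", "paste", "undo"}


@dataclass
class RecognitionResult:
    mode: str  # "command" or "dictation"
    text: str


@dataclass
class SpeechEngine:
    # Stands in for the memory device (e.g. RAM) that holds both results.
    stored: dict = field(default_factory=dict)

    def process_as_command(self, speech: str) -> RecognitionResult:
        # Placeholder for the command processor.
        return RecognitionResult("command", speech.strip().lower())

    def process_as_dictation(self, speech: str) -> RecognitionResult:
        # Placeholder for the dictation processor.
        return RecognitionResult("dictation", speech.strip())

    def select_mode(self, speech: str) -> str:
        # Toy content-based criterion (an assumption): a single known
        # command word is treated as a command, anything else as dictation.
        words = speech.strip().lower().split()
        if len(words) == 1 and words[0] in KNOWN_COMMANDS:
            return "command"
        return "dictation"

    def handle(self, speech: str) -> RecognitionResult:
        # Process in both modes and keep both results for later correction.
        self.stored = {
            "command": self.process_as_command(speech),
            "dictation": self.process_as_dictation(speech),
        }
        # Return the result for the selected mode; it would be sent to the program.
        return self.stored[self.select_mode(speech)]
```

Because both results are retained, the unselected result remains available if the selected mode later proves to be wrong.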
A mode correction processor detects when a speech recognition mode error has been made. Typically, the user can send a command from the program to indicate that a speech recognition mode error has been made. When the mode correction processor receives a command indicating a speech recognition mode error, the mode correction processor initiates a correction routine corresponding to the type of speech recognition mode error.
When a command speech recognition error has been made, that is, when speech input has been incorrectly entered as a command, then a command to dictation routine is executed by the speech engine. The mode correction processor sends an "UNDO" command to the program to remove the entered command applied to the document. Next, the mode correction processor selects a candidate selection from the alternative dictation selections stored in RAM. The mode correction processor copies the alternative selections from RAM and sends the alternative selections to the program. The candidate selection is then entered in the program as a dictation into the document.
When a dictation error has been made, that is, when speech input has been incorrectly entered as dictation, then the dictation to command routine is executed by the speech engine. The mode correction processor sends a "DELETE" command to remove the dictation input entered into the document. Next, the mode correction processor selects a candidate selection from the alternative selections stored in RAM. The mode correction processor processes the candidate selection and enters the candidate selection in the program as a command in the document.
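The two correction routines described above can be sketched, purely for illustration, as a single function that reverses the erroneous entry and then applies the stored alternative. The `Document` class, its method names, and the `alternatives` dictionary are hypothetical stand-ins for the program, its "UNDO" and "DELETE" handling, and the results held in RAM; none of these names come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the document held by the program."""
    text: list = field(default_factory=list)      # dictation entries
    commands: list = field(default_factory=list)  # executed commands

    def enter_dictation(self, words: str) -> None:
        self.text.append(words)

    def delete_last_dictation(self) -> str:
        # Models the "DELETE" command removing the last dictation input.
        return self.text.pop()

    def execute(self, command: str) -> None:
        self.commands.append(command)

    def undo_last_command(self) -> str:
        # Models the "UNDO" command removing the last entered command.
        return self.commands.pop()


def correct_mode_error(doc: Document, entered_mode: str, alternatives: dict) -> str:
    """Reverse the erroneous entry and apply the stored alternative.

    `alternatives` maps each mode to the result stored for it.
    Returns the mode that was applied as the correction.
    """
    if entered_mode == "command":
        # Command to dictation routine.
        doc.undo_last_command()
        doc.enter_dictation(alternatives["dictation"])
        return "dictation"
    if entered_mode == "dictation":
        # Dictation to command routine.
        doc.delete_last_dictation()
        doc.execute(alternatives["command"])
        return "command"
    raise ValueError(f"unknown mode: {entered_mode!r}")
```

For example, if "copy" was entered as text, calling `correct_mode_error(doc, "dictation", {"command": "copy", "dictation": "copy"})` removes the text and records "copy" as an executed command in this toy model.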
According to an aspect of the invention, prior to processing the speech input with the dictation processor and the command processor, the mode selection processor can determine a speech recognition mode for the speech input. The speech input is processed by the selected mode processor, and the results are stored in the RAM. The mode selection processor stores the speech input in the RAM for later retrieval. When a speech recognition mode error is detected by the mode correction processor, the mode correction processor sends an "UNDO" command to remove a command, or executes a "DELETE" command to remove dictation from the document. Then, the speech input is retrieved from RAM and processed by the alternative mode processor to obtain results for correction of the mode error. The alternative results are then sent to the program to be entered into the document.
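This alternative aspect, in which only the selected mode processor runs at first and the stored speech input is reprocessed on demand, might be sketched as follows. The sketch is an assumption-laden illustration: `DeferredEngine`, its `processors` mapping, and the `select_mode` callable are hypothetical, and the "UNDO" or "DELETE" step that precedes reprocessing is omitted for brevity.

```python
class DeferredEngine:
    """Runs only the selected mode processor; keeps the input for reprocessing."""

    def __init__(self, processors, select_mode):
        self.processors = processors    # maps "command"/"dictation" to a callable
        self.select_mode = select_mode  # callable: speech -> mode name
        self.stored_input = None        # stands in for the input held in RAM
        self.last_mode = None

    def handle(self, speech):
        # Store the speech input, then process it in the selected mode only.
        self.stored_input = speech
        self.last_mode = self.select_mode(speech)
        return self.last_mode, self.processors[self.last_mode](speech)

    def reprocess(self):
        # On a mode error, run the stored input through the alternative processor.
        if self.stored_input is None:
            raise RuntimeError("no stored speech input to reprocess")
        other = "dictation" if self.last_mode == "command" else "command"
        self.last_mode = other
        return other, self.processors[other](self.stored_input)
```

Compared with processing in both modes up front, this arrangement defers the second processing pass until a mode error is actually reported.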
That the invention improves over the drawbacks of the prior art and accomplishes the advantages described above will become apparent from the following detailed description of the exemplary embodiments and the appended drawings and claims.