This invention relates generally to the field of multi-source data processing systems and, more particularly, to a background audio recovery system for speech recognition systems/software.
Since the advent of the personal computer, human interaction with the computer has been primarily through the keyboard. Typically, when a user wants to input information or to enter a command into a computer, he types the information or the command on the keyboard attached to the computer. Other input devices that have supplemented the keyboard include the mouse, touch-screen displays, integrated pointer devices, and scanners. Use of these other input devices has decreased the amount of user time spent entering data or commands into the computer.
Computer-based voice recognition and speech recognition systems have also been used for data or command input into personal computers. Speech recognition systems convert human speech into a format that can be understood by the computer. When a computer is equipped with a speech recognition system, data and command input can be performed by merely speaking the data or command to the computer. The speed at which the user can speak is typically faster than conventional data or command entry. Therefore, the inherent speed in disseminating data or commands through human speech is a sought-after advantage of incorporating voice recognition and speech recognition systems into personal computers.
The increased efficiency of users operating personal computers equipped with voice recognition and speech recognition systems has encouraged the use of such systems in the workplace. Many workers in a variety of industries now utilize voice recognition and speech recognition systems for numerous applications. For example, computer software programs utilizing voice recognition and speech recognition technologies have been created by DRAGON, IBM, and LERNOUT and HAUSPIE. When a user reads a document aloud or dictates to a voice recognition program, the program can enter the user's spoken words directly into a word processing program operating on a personal computer.
Generally, computer-based voice recognition and speech recognition programs convert human speech into a series of digitized frequencies. These frequencies are matched against a previously stored set of words, or phonemes. When the computer determines correct matches for the series of frequencies, computer recognition of that portion of human speech is accomplished. The frequency matches are compiled until sufficient information is collected for the computer to react. The computer can then react to certain spoken words by storing the human speech in a memory device, transcribing the human speech into a document for a word processing program, or executing a command in an application program.
However, voice recognition and speech recognition systems are not 100% accurate. Even with hardware and software modifications, the most efficient voice recognition and speech recognition systems can attain approximately 97-99% accuracy. Internal and external factors can affect the reliability of voice recognition and speech recognition systems. Internal factors dependent upon the recognition technology include the comparison between the finite set of words/phonemes and the vocabulary of words of a speaker. External factors, such as regional accents, external noise, and the type of microphone, can degrade the quality of the input, thus affecting the frequency of the user's words and introducing potential error into the word or phoneme matching.
Conventional speech recognition systems suffer from significant recognition error rates. Different solutions have been applied to increase the recognition rate and to decrease the number of recognition errors. One solution is to train the voice recognition or speech recognition program to recognize the frequencies for a specific human voice. In a speaker dependent speech recognition system, the system creates a voice profile that recognizes the pronunciation patterns unique to a specific human voice. Speech recognition systems that are not trained for a particular speaker are called speaker independent systems, and therefore are more prone to recognition errors due to regional accents or differences in pronunciation.
Another solution uses a method called discrete speech input. Discrete speech input requires the operator to speak relatively slowly, pausing after each word before speaking the next. The pausing of the operator gives the speech recognition system an opportunity to distinguish between the beginning and the end of each of the operator's words. Recognition systems relying upon discrete speech input are slow and cumbersome for users accustomed to speaking at a normal conversational speed.
An alternative solution involves a method based upon continuous speech input. Continuous speech input systems require the user to speak a limited set of words that have been previously stored in the system vocabulary. Therefore, the speech recognition system relies upon a limited vocabulary of words. Optimum use of these systems occurs when the system is utilized by users in an environment with a specific vocabulary. For example, continuous speech input systems have been implemented in the medical industry in specific fields such as radiology, orthopedics, internal medicine, emergency medicine, mental health, etc. However, continuous speech input systems are limited by their inherent deficiencies of vocabulary, which limits their ability to be used in other industries or work environments.
Natural speech input systems will ultimately reach the marketplace. These systems will not require the user to speak in any particular way for the computer to understand, but will be able to understand the difference between a user's command to the computer and information to be entered into the computer.
Throughout the remainder of this disclosure, the terms “voice recognition” and “speech recognition” may be used interchangeably. In some instances, a distinction is made between voice recognition and speech recognition. However, both voice recognition and speech recognition systems suffer from some of the same reliability problems described above, and the same solutions have been applied to both recognition technologies to resolve the shortcomings of the prior art.
Many multi-source data processing systems include voice recognition software. As described above, conventional voice and speech recognition software has many drawbacks. One major drawback is that an application program employing the voice or speech recognition software, such as a word processing program, frequently loses or does not properly capture dictation generated by a user.
There are two major reasons why dictation is not properly captured. The first is that users frequently forget to activate the speech recognition software because the microphone status indicators or icons are difficult to locate on a display device. The second is that users frequently assume that the microphone of the speech recognition software is turned on and start to dictate their thoughts. However, after a few minutes, the users discover that their voice commands and/or dictation were not recorded or properly processed by the speech recognition software. In such situations, users have to “turn on” or “wake up” the speech recognition software and re-dictate their thoughts.
Another cause of lost dictation is that the computers supporting the speech recognition software often have very slow processing speeds. Speech recognition software typically requires increased processing power relative to everyday applications, and many conventional computers do not sufficiently meet the needs of speech recognition software. In conventional computers, users may often utter a command and assume the command was properly captured by the computer. Then, the user proceeds directly to dictation. If the software did not capture the “turn on” command, then any of the utterances made by the user would not be captured. In such cases, users must re-dictate their utterances so that this information will be captured by the computer.
Some of the conventional speech recognition software has attempted to solve these problems by providing a more visible microphone status indicator or icon. However, this quick fix or simple solution does not completely solve the aforementioned problems. Although a more visible microphone indicator or icon may reduce the likelihood of users inadvertently dictating without the speech recognition software being activated, many users will still not notice or observe the microphone status indicator or icon.
For example, many users dictate while looking at written materials such as notes or books on their desk and thus, such users do not look at the display device. For these users, a more visible microphone status indicator or icon will not alleviate the problem of lost dictation. Also, even with the increased size of the microphone status icon or indicator, users of speech recognition software still must wait a significant amount of time for the speech recognition software to become activated or “turned on” because of the slow processing speeds of conventional computers.
Other problems of prior art voice recognition software include mistakes in the processing of speech where the speech recognition software inadvertently replaces spoken words with words that are phonetically similar. For example, the word “apparently” may be interpreted by voice recognition software as the phrase “a parent.”
Accordingly, there is a general need in the art for a background audio recovery system for use with a computer system that records and processes dictated speech that is generated while the speech recognition software is assigned to an inactive state. There is a further need in the art for a background audio recovery system that replays the actual background audio generated by a user in order to provide enhanced editing capabilities for the processed speech. There is a further need for a background audio recovery system that permits a user to edit background audio prior to entry of background audio into an open document of an application program.
The present invention is generally directed to a background audio recovery system having a speech recognition program module that can record audio and then apply speech recognition techniques to the recorded background speech or audio that was received from a microphone when the speech recognition program module was inadvertently assigned to an inactive mode. This continuous recording of all background audio or speech received from a microphone while the speech recognition program module is assigned to an inactive mode prevents loss of dictation from a user.
As stated above, the background audio recovery system of the present invention continuously saves background speech or audio when the speech recognition program module is assigned to an inactive state. When a user realizes that the microphone for the speech recognition program module was not “turned on” or was designated to be inactive, the user then properly “turns on” the microphone by either a spoken command word or keyed entry. The speech recognition program module prompts the application program if background speech or audio has been saved prior to the “turning on” or activation of the speech recognition program module.
If background audio or speech has been saved, the background audio recovery system informs the user that background speech prior to the activation of the microphone (or the activation of the speech recognition module) has been saved and is available to be converted and inserted into the current open document of the application program. The user is given at least one of the following options: (1) process and convert the background audio or speech to text, and display the text after applying spoken commands in a separate user interface; (2) process and convert the background audio or speech to text, and display the text with the spoken commands listed as text in a separate user interface; (3) process and convert the background audio or speech to text, and insert the text into the current open document without any editing; or (4) delete the background audio or speech.
If the user decides to process and convert the background speech to text, the background audio recovery system will convert the background speech to text with the speech recognition program module. The background audio recovery system will then display the converted background speech or text to the user via a user interface typically in the format of a separate dialog box or window before the text is inserted into the current open document of the application program or word processing system.
The background audio recovery system further prompts the user with additional editing options for the converted text. For one embodiment of the present invention, all the background speech is treated as text even if spoken commands were issued during the generation of the background speech. In another embodiment of the present invention, spoken or dictated commands are processed as commands and the user can determine whether each command is to be executed on corresponding background speech that is converted to text.
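The two embodiments described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the command vocabulary (`COMMANDS`), the function names, and the `approve` callback standing in for the user's per-command decision are all hypothetical, and the utterances are assumed to have already been converted to text by the speech recognition program module.

```python
from typing import Callable, Dict, List

# Hypothetical spoken-command vocabulary; the disclosure names no specific commands.
COMMANDS: Dict[str, str] = {
    "new paragraph": "\n\n",
    "new line": "\n",
}


def render_as_text(utterances: List[str]) -> str:
    """First embodiment: every utterance, commands included, is kept as text."""
    return " ".join(utterances)


def render_with_commands(utterances: List[str],
                         approve: Callable[[str], bool]) -> str:
    """Second embodiment: a recognized command is executed only if approved."""
    out = ""
    for utterance in utterances:
        if utterance in COMMANDS and approve(utterance):
            # Execute the command: replace it with its formatting effect.
            out = out.rstrip(" ") + COMMANDS[utterance]
        else:
            # Ordinary dictation (or a declined command) is appended as text.
            if out and not out.endswith("\n"):
                out += " "
            out += utterance
    return out
```

In this sketch, declining every command makes the second function produce the same output as the first, which mirrors the user's ability to decide command by command.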
In a further embodiment, actual audio or speech received by the microphone is also saved in a memory device in a low fidelity format so that a user can play back actual audio to enhance the editing process of the converted text.
The present invention gives the user more control over the retrieval of “lost” dictation to be inserted into an open document of a word processing system. Such control exists when commands and associated converted background speech are displayed in a separate dialog box before the converted background speech is inserted into the current open document of the word processing system. In other words, the present invention does not force the user to simply insert or “dump” the contents of the converted background speech into an open document.
The present invention permits a user to “turn on” the microphone “retroactively” and provides a display of options for how to insert the converted background speech into the open document. The present invention also allows the user to set the bounds for the background speech processing where the user designates the amount of time or memory that should be utilized to prevent dictation losses. Further, the present invention also permits the user to “turn on” the microphone “proactively” where the recorded background speech or audio is discarded and the word processing system is ready to receive the user's forthcoming speech.
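One way to picture user-designated bounds on time or memory is a buffer that keeps only the most recent audio. The following Python sketch is an assumption for illustration only: the class name `BackgroundAudioBuffer`, the fixed frame duration, and the oldest-first eviction policy are not specified in this disclosure.

```python
from collections import deque
from typing import Deque, Optional


class BackgroundAudioBuffer:
    """Holds recent audio frames, bounded by seconds and/or bytes (hypothetical)."""

    def __init__(self, frame_seconds: float,
                 max_seconds: Optional[float] = None,
                 max_bytes: Optional[int] = None) -> None:
        self.frame_seconds = frame_seconds  # duration of each frame
        self.max_seconds = max_seconds      # user-designated time bound
        self.max_bytes = max_bytes          # user-designated memory bound
        self._frames: Deque[bytes] = deque()
        self._bytes = 0

    def _over_limit(self) -> bool:
        too_long = (self.max_seconds is not None and
                    len(self._frames) * self.frame_seconds > self.max_seconds)
        too_big = self.max_bytes is not None and self._bytes > self.max_bytes
        return too_long or too_big

    def append(self, frame: bytes) -> None:
        """Store a frame, evicting the oldest frames once a bound is exceeded."""
        self._frames.append(frame)
        self._bytes += len(frame)
        while self._frames and self._over_limit():
            self._bytes -= len(self._frames.popleft())

    def drain(self) -> bytes:
        """'Retroactive' activation: hand the saved audio over for recognition."""
        data = b"".join(self._frames)
        self.discard()
        return data

    def discard(self) -> None:
        """'Proactive' activation: throw the saved audio away."""
        self._frames.clear()
        self._bytes = 0
```

Here `drain()` corresponds to turning the microphone on retroactively, and `discard()` to turning it on proactively.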
More specifically described, the present invention is a background audio recovery system that includes an application program, such as a word processing program. The background audio recovery system displays an inactive status indicator for a speech recognition program module in an application program on a display device. The background audio recovery system then determines whether an audio input device is receiving audio input, such as speech or voice from a user. If audio is being received by the audio input device (i.e., a microphone), the background audio recovery system stores the audio data in a memory device. Alternatively, the background audio recovery system can convert the speech into text prior to saving to a memory device.
The background audio recovery system determines whether a command for activating the speech recognition program module has been issued and, if so, the background audio recovery system initiates a background audio program module for manipulating the stored audio.
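The control flow described above (store audio while inactive, then initiate the background audio program module upon activation) can be sketched as a small state holder. This Python sketch is illustrative only: the class `Recognizer`, its method names, and the event strings are hypothetical placeholders, and the actual recognition and user interface steps are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recognizer:
    # While False, the inactive status indicator would be displayed.
    active: bool = False
    # Audio frames saved while the module is inactive.
    stored: List[bytes] = field(default_factory=list)
    # Log of what the system would do next (stand-in for real actions).
    events: List[str] = field(default_factory=list)

    def on_audio(self, frame: bytes) -> None:
        """Called whenever the audio input device delivers audio."""
        if self.active:
            self.events.append("dictate")   # normal speech recognition path
        else:
            self.stored.append(frame)       # save background audio instead

    def on_activate(self) -> None:
        """Called when an activation command (spoken or keyed) is detected."""
        self.active = True
        if self.stored:
            # Background audio exists, so hand it to the background audio module.
            self.events.append("open_background_audio_module")
```

If no audio was stored before activation, the sketch goes straight to normal dictation without opening the background audio module.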
According to an aspect of the present invention, the background audio recovery system stores background audio in cache memory of a central processing unit. According to another aspect of the present invention, the application program is a word processing program that is designed to manipulate stored data.
The background audio recovery system can determine whether a command for activating the application program has been issued by detecting the command from either a keyboard interface or an audio input device such as a microphone. According to yet another aspect of the present invention, the background audio recovery system can display a graphical user interface, such as a dialog box, on a display device. The background audio recovery system then can display a list of options for stored background audio within this graphical user interface.
The background audio recovery system can convert speech within the background audio to text data and then display the text data on a display device where the text data includes textual application program commands that have not been applied to the text data. In another aspect of the present invention, the background audio recovery system can apply the spoken commands to the other stored text data and then display the processed text data on a display device.
The background audio recovery system can also convert speech within the background audio into text data and insert the converted text data into an open file being accessed by the application program. The text data can include textual application program commands that have not been applied to the text data.
The background audio recovery system can also prompt a user to delete the stored audio data from a memory device. According to yet another aspect of the present invention, the background audio recovery system can store the background audio as a sound file in a memory device. The background audio recovery system then can convert speech within the background audio to text data and then display the converted text data on a display device while replaying the background audio from the sound file of the memory device. The background audio recovery system permits the user to denote at least one of a time increment, file size increment, and value in order to allocate a predefined size for the audio files containing recorded speech.
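The relationship between a time increment and a file size increment can be illustrated with simple arithmetic. The Python sketch below assumes uncompressed PCM audio, which this disclosure does not specify; the function name and the example sample rate, sample width, and channel count are likewise assumptions chosen only to resemble a low fidelity recording.

```python
def audio_bytes(seconds: int, sample_rate_hz: int,
                bytes_per_sample: int, channels: int) -> int:
    """Size of uncompressed PCM audio: duration x rate x sample width x channels."""
    return seconds * sample_rate_hz * bytes_per_sample * channels


# Assumed example: 60 seconds of 8 kHz, 8-bit (1 byte per sample), mono audio.
one_minute = audio_bytes(seconds=60, sample_rate_hz=8000,
                         bytes_per_sample=1, channels=1)
print(one_minute)  # 480000
```

Under these assumed parameters, a one-minute bound corresponds to roughly 480 kilobytes of storage, so a bound expressed in time can be converted to one expressed in file size and vice versa.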
That the present invention improves over the drawbacks of the prior speech recognition software and accomplishes the advantages described above will become apparent from the following detailed description of the exemplary embodiments and the appended drawings and claims.