The present invention relates to a system and method for generating a user interface for a speech recognition program module.
A speech recognition program module offers a computer user substantial savings in time by allowing the user to dictate text directly into a document created using a word processor program module loaded on the computer. Even with the increased processor speeds of modern personal computers, a speech recognition program module still requires processing time in order to recognize spoken words and to translate them into text for display in a word processor document. Because of the time delay introduced by the recognition and translation process, the user should receive confirmation that the speech was recorded by the speech recognition program module and is being translated.
Some existing speech recognition program modules fail to give the user any confirmation that the speech recognition program module has recorded the speech and is translating the speech. Such speech recognition program modules often do not display anything until translation of the speech is completed and the final text is displayed at the insertion point in the document. Such lack of feedback during the translation delay often causes users to dictate in short snippets in order to verify that the speech has been recorded and properly translated. Dictating in short snippets creates two additional problems. First, the time delay between the user's dictation and the user's verification increases the overall time required to dictate a document. Second, dictating in short snippets is not the best method of dictating using speech recognition program modules. A longer string of speech gives the speech recognition module the best chance of accurate translation because of the context offered by a longer string of speech as opposed to a short snippet.
Other speech recognition program modules provide feedback to the user during the translation delay by displaying the first approximation or first hypothesis of the translated text as soon as the hypothesis is available. The first hypothesis is displayed either directly in the document or in a floating window adjacent to the document. As translation continues and the hypothesis is replaced with updated translated text, the display must be updated to reflect those changes in the displayed text. This constant changing or flashing of text in the user interface can be distracting. The user is likely to be distracted by the hypothesis text, which may be wrong, and to stop dictating in order to address the perceived error. The floating window creates the additional problem of dividing the user's attention between the text of the document at the insertion point and the flashing text in the floating window.
In order to address the problem of speech level, some speech recognition program modules display a volume level meter similar to those found on some stereo receivers. The level meter allows the user to verify that the microphone is actually picking up his or her speech. Such a separate level meter requires the user to divide his or her attention between the text of the document and the separate level meter in the graphical user interface.
Consequently, prior art speech recognition program modules often provide either too little user feedback or overload the user with feedback that is distracting or requires the user to divide his or her attention during the course of dictating.
Accordingly, there is a need in the art for a system and method for generating a user interface for a speech recognition program module which offers feedback to the user with respect to the progress of the speech translation into text without distracting the user or dividing the user's attention.
The present invention addresses the above-identified problems by providing a system and method for generating a user interface that confirms to the user that the speech recognition program module has recorded the speech and that the speech is being translated. Particularly, the method of the present invention inserts a place mark or bar within the document text at the insertion point. The place mark or bar is proportional in length to the expected length of the text that the user has dictated. The expected length of the place mark is based on the elapsed time of the speech string dictated by the user. As the speech recognition program module completes its translation of the speech, the final version of the translated text replaces the place mark or bar, beginning at the insertion point. In one embodiment of the present invention, the place mark is a bar consisting of a string of predetermined characters.
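The place mark behavior described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the dictation rate `CHARS_PER_SECOND`, the character `PLACE_MARK_CHAR`, and the `Document` class with its method names are all hypothetical and chosen only for illustration, since the text states only that the length of the place mark is based on the elapsed time of the speech and that the final text replaces the place mark beginning at the insertion point.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the source does not specify them.
CHARS_PER_SECOND = 12.0  # hypothetical average number of dictated characters per second
PLACE_MARK_CHAR = "."    # hypothetical predetermined character for the bar


def place_mark_length(elapsed_seconds: float,
                      chars_per_second: float = CHARS_PER_SECOND) -> int:
    """Estimate the length of the pending text from the elapsed speech time."""
    if elapsed_seconds <= 0:
        return 0
    return max(1, round(elapsed_seconds * chars_per_second))


@dataclass
class Document:
    """Hypothetical stand-in for a word processor document."""
    text: str
    insertion_point: int
    mark_length: int = 0  # number of place mark characters currently shown

    def update_place_mark(self, elapsed_seconds: float) -> None:
        """Grow (or redraw) the bar at the insertion point as speech continues."""
        new_length = place_mark_length(elapsed_seconds)
        start = self.insertion_point
        end = start + self.mark_length
        self.text = self.text[:start] + PLACE_MARK_CHAR * new_length + self.text[end:]
        self.mark_length = new_length

    def commit_translation(self, final_text: str) -> None:
        """Replace the bar with the final translated text, starting at the insertion point."""
        start = self.insertion_point
        end = start + self.mark_length
        self.text = self.text[:start] + final_text + self.text[end:]
        self.insertion_point = start + len(final_text)
        self.mark_length = 0
```

In this sketch, a caller would invoke `update_place_mark` periodically while the user is speaking and `commit_translation` once the recognizer returns its final text, so the document never displays an intermediate hypothesis.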
In addition, the place mark (the character string) can be highlighted, and the color of the highlighting can be used to indicate to the user the volume level of the recorded speech. Alternatively, instead of highlighting, the characters in the character string themselves can be colored to indicate the volume level.
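One possible way to derive such a color from the recorded speech is sketched below. The thresholds, the color names, and the function names `normalized_rms` and `volume_color` are assumptions made for illustration; the text says only that the color indicates the volume level and does not specify how the level is measured or which colors are used.

```python
import math
from typing import Sequence

# Hypothetical thresholds on a normalized 0.0-1.0 volume scale.
TOO_QUIET_BELOW = 0.2
TOO_LOUD_ABOVE = 0.9


def normalized_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of audio samples assumed to lie in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def volume_color(volume_level: float) -> str:
    """Map a normalized volume level to an (assumed) highlight or character color."""
    if not 0.0 <= volume_level <= 1.0:
        raise ValueError("volume_level must be between 0.0 and 1.0")
    if volume_level < TOO_QUIET_BELOW:
        return "yellow"  # assumed color for speech that is too quiet
    if volume_level > TOO_LOUD_ABOVE:
        return "red"     # assumed color for speech that is too loud
    return "green"       # assumed color for an acceptable level
```

The returned color could be applied either to the highlighting behind the place mark or to the place mark characters themselves, which keeps the volume feedback at the insertion point rather than in a separate level meter.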
The various aspects of the present invention may be more clearly understood and appreciated from a review of the following detailed description of the disclosed embodiments and by reference to the drawings and claims.