1. Technical Field
This invention relates to the field of speech recognition, and more particularly, to a method for enhancing discrimination between and among user dictation, user voice commands, and text.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to text by a computer. The recognized text may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Speech dictation systems further offer users a hands-free method of operating computer systems.
In regard to electronic document preparation, presently available speech dictation systems provide user voice commands enabling a user to select a portion of text in an electronic document. Such user voice commands typically employ a syntax such as “SELECT <text>”, where the user voice command “SELECT” signals that the text following the command should be selected or highlighted. After a portion of text has been selected, the user can perform any of a series of subsequent operations upon the selected text.
Thus, if a user says, “SELECT how are you”, the speech dictation system will search for the text phrase “how are you” within a body of text in the electronic document. Once located in the body of text, the phrase can be selected or highlighted. Subsequently, the user can perform an operation on the selected text such as a delete operation, a bold/italic/underline operation, or a correction operation. In further illustration, once the text “how are you” is highlighted, that user selected portion of text can be replaced with different text derived from a subsequent user utterance. In this manner, users can perform hands-free correction of an electronic document.
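The patent describes the "SELECT" behavior but no implementation. A minimal sketch in Python of locating a dictated phrase in a body of text and returning its selection span (the function name, case-insensitive matching, and character-offset representation are all illustrative assumptions):

```python
def select_phrase(body, phrase):
    """Locate the first occurrence of a dictated phrase in a body of
    text.  Returns the (start, end) character span to highlight, or
    None if the phrase does not occur.  Matching is case-insensitive,
    since dictated speech carries no capitalization."""
    index = body.lower().find(phrase.lower())
    if index < 0:
        return None
    return (index, index + len(phrase))
```

For example, `select_phrase("Hello, how are you today?", "how are you")` returns the span `(7, 18)`, which a dictation front end could then highlight and replace with a subsequent utterance.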
Presently, known implementations of the “SELECT” command, or other similar user voice commands for selecting text, suffer from several disadvantages. One such disadvantage is that there may be multiple occurrences of the phrase or word that the user would like to select within a body of text. For example, within a body of text, there are likely to be many occurrences of the word “the”. Thus, if the user says “SELECT the”, the speech dictation system may not be able to determine which occurrence of the word “the” the user would like to select.
In addressing this problem, conventional speech dictation systems rely upon a system of rules for determining which occurrence of the user desired word or phrase the user would like to select. For example, a speech dictation system can begin at the top of the active window and select the first occurrence of the word or phrase. However, if the user did not want to select the first occurrence of the word or phrase, a conventional speech dictation system can provide the user with the ability to select another occurrence of the word. In particular, some conventional speech dictation systems provide navigational voice commands such as “NEXT” or “PREVIOUS”.
By uttering the voice command “NEXT” the user instructs the speech dictation system to locate and select the next occurrence of the desired word or phrase. Similarly, the command “PREVIOUS” instructs the speech dictation system to locate and select the previous occurrence of the desired word or phrase. Although such conventional systems allow the user to navigate to the desired occurrence of a particular word or phrase, users must develop strategies for navigating to the desired occurrence. This can result in wasted time and user frustration, especially in cases where the user perceives the speech dictation system to be inaccurate or inefficient.
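The conventional NEXT/PREVIOUS navigation described above can be sketched as a small class that tracks all occurrences of the selected phrase and steps among them. The class name and the wrap-around cycling behavior are assumptions; the patent only describes moving to the next or previous occurrence:

```python
class SelectionNavigator:
    """Track every occurrence of a phrase in a body of text and step
    among them in response to NEXT / PREVIOUS voice commands."""

    def __init__(self, body, phrase):
        self.spans = []
        lowered, target = body.lower(), phrase.lower()
        start = 0
        # Collect the (start, end) span of every occurrence.
        while (i := lowered.find(target, start)) >= 0:
            self.spans.append((i, i + len(phrase)))
            start = i + 1
        self.pos = 0  # the first occurrence is selected by default

    def current(self):
        return self.spans[self.pos] if self.spans else None

    def next(self):
        """Select the next occurrence, wrapping past the last one."""
        if self.spans:
            self.pos = (self.pos + 1) % len(self.spans)
        return self.current()

    def previous(self):
        """Select the previous occurrence, wrapping before the first."""
        if self.spans:
            self.pos = (self.pos - 1) % len(self.spans)
        return self.current()
```

The strategy burden the patent complains about is visible here: to reach a particular "the" among many, the user must count occurrences and issue NEXT or PREVIOUS repeatedly.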
Another disadvantage of conventional text selection methods within conventional speech dictation systems is that when searching for the user specified word or phrase, such speech dictation systems typically search the entire portion of a body of text appearing on the user’s screen. Each word appearing on the user’s screen is activated within the speech dictation system grammar and appears to the speech dictation system as an equally likely candidate. Because the user desires only a single word or phrase, enabling and searching the entire portion of the body of text appearing on the user’s screen can be inefficient. Moreover, the technique can increase the likelihood that a misrecognition will occur.
Yet another disadvantage of conventional text selection methods within conventional speech dictation systems is that often it is not readily apparent to the speech dictation system whether a user has uttered a word during speech dictation or a voice command, for example a voice command that activates a drop-down menu. For instance, if a user utters the word “File”, depending upon the circumstance, the user could either intend to activate the File menu in the menu bar or insert the word “file” in the electronic document. Accordingly, it is not always apparent to the conventional speech dictation system whether a user utterance is a voice command or speech dictation.
Consequently, although presently available speech dictation systems offer methods of interacting with a computer to audibly command an application, to provide speech dictation in an electronic document, and to select text within the electronic document, there remains a need for an improved method of discriminating between user voice commands, user dictation, text, and combinations thereof.
The invention disclosed herein provides a method and apparatus for discriminating between different occurrences of text in an electronic document and between an instance of a voice command and an instance of speech dictation through the utilization of an eye-tracking system in conjunction with a speech dictation system. The method and apparatus of the invention advantageously can include an eye-tracking system (ETS) for cooperative use with a speech dictation system in order to determine the focus point of a user’s gaze during a speech dictation session. In particular, the cooperative use of the ETS with the speech dictation system can improve accuracy of the “SELECT” user voice command functionality, or any other user voice command for selecting a portion of text within a body of text in a speech dictation system. The use of the ETS in the invention also can improve system performance by facilitating discrimination between user dictation and a voice command.
In accordance with the inventive arrangements, a method for searching for matching text in an electronic document can include identifying a focus point in a user interface and defining a surrounding region about the focus point. Notably, the surrounding region can include a body of text within a user interface object configured to receive speech dictated text. Additionally, the method can include receiving a voice command for selecting specified text within the electronic document and searching the body of text included in the surrounding region for a match to the specified text. Significantly, the search can be limited to the body of text in the surrounding region.
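The gaze-bounded search described above can be sketched as follows. For simplicity the focus point is modeled as a character offset and the surrounding region as a character radius; a real system would map screen coordinates from the eye tracker to text positions. All names and parameters are illustrative:

```python
def search_near_gaze(body, phrase, focus, radius):
    """Search for `phrase` only within `radius` characters of the gaze
    focus point, rather than across the whole visible body of text.
    Returns the absolute (start, end) span of the match, or None if no
    occurrence lies wholly inside the surrounding region."""
    lo = max(0, focus - radius)
    hi = min(len(body), focus + radius)
    # str.find's slice arguments restrict the search to the region.
    i = body.lower().find(phrase.lower(), lo, hi)
    return None if i < 0 else (i, i + len(phrase))
```

Restricting the search this way resolves the ambiguity of repeated words: among several occurrences of "the", only the one near the user's gaze is a candidate.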
A method for searching for matching text in an electronic document can further include expanding the surrounding region to include an additional area of the user interface if a match to the specified text is not found in the body of text in the searching step. Notably, the additional area included by the expansion can include additional text. Accordingly, the additional text can be searched for a match to the specified text. Finally, as before, the search can be limited to the body of text and the additional text.
In a representative embodiment of the present invention, the expanding step can include expanding the surrounding region outwardly from the focus point by a fixed increment. Alternatively, the expanding step can include expanding the surrounding region by a fixed quantity of text adjacent to the body of text. Finally, the expanding step can include expanding the surrounding region outwardly from the focus point by a variable increment.
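The expanding-region variant can be sketched by repeatedly widening the region around the focus point until the phrase is found or the whole body has been covered. The fixed-increment growth shown here is one of the three alternatives named above (fixed increment, fixed quantity of adjacent text, variable increment); the radii are illustrative assumptions:

```python
def expanding_search(body, phrase, focus, initial_radius=40, increment=40):
    """Search an expanding region around the gaze focus point.  The
    region grows outward by a fixed increment each pass until the
    phrase is found or the region covers the entire body of text."""
    radius = initial_radius
    while True:
        lo = max(0, focus - radius)
        hi = min(len(body), focus + radius)
        i = body.lower().find(phrase.lower(), lo, hi)
        if i >= 0:
            return (i, i + len(phrase))
        if lo == 0 and hi == len(body):
            return None  # the whole body has been searched
        radius += increment
```

A variable-increment embodiment could, for instance, double the radius each pass instead of adding a constant, trading search precision for fewer iterations.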
A method for discriminating between an instance of a voice command and an instance of speech dictation can include identifying a focus point in a user interface; defining a surrounding region about the focus point; identifying user interface objects in the surrounding region; further identifying among the identified user interface objects those user interface objects which are configured to accept speech dictated text and those user interface objects which are not configured to accept speech dictated text; computing a probability based upon those user interface objects which have been further identified as being configured to accept speech dictated text and those user interface objects which have been further identified as not being configured to accept speech dictated text; receiving speech input; and, biasing a determination of whether the speech input is a voice command or speech dictation based upon the computed probability. Additionally, the method can include identifying a focus point outside of the user interface; and, biasing a determination of whether the speech input is a voice command or speech dictation based upon a default probability.
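The command/dictation discrimination method above can be sketched as follows. The patent leaves the probability computation unspecified; the simple ratio of dictation-accepting objects used here, the 0.5 decision threshold, and all names are assumptions for illustration:

```python
def bias_probability(objects_in_region, default_p_dictation=0.5):
    """Compute the probability that the next utterance is dictation.
    `objects_in_region` maps each user-interface object in the region
    around the gaze point to whether it accepts dictated text.  An
    empty mapping models a focus point outside the user interface, in
    which case the default probability applies."""
    if not objects_in_region:
        return default_p_dictation
    accepting = sum(1 for accepts in objects_in_region.values() if accepts)
    return accepting / len(objects_in_region)

def interpret(utterance_is_ambiguous, p_dictation):
    """Bias the resolution of an ambiguous utterance (such as "File",
    which is both a menu name and an ordinary word) toward dictation
    or command according to the computed probability."""
    if not utterance_is_ambiguous:
        return "as-recognized"
    return "dictation" if p_dictation >= 0.5 else "command"
```

With the gaze resting near a menu bar (few dictation-accepting objects), an ambiguous "File" is biased toward the command interpretation; with the gaze inside a text-entry field, the same utterance is biased toward dictation.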