Speech is perhaps the oldest form of human communication, and many scientists now believe that the ability to communicate through speech is inherently provided in the biology of the human brain. Thus, it has been a long-sought goal to allow users to communicate with computers using a Natural User Interface (NUI), such as speech. In fact, great strides have recently been made toward achieving this goal. For example, some computers now include speech recognition applications that allow a user to verbally input both commands for operating the computer and dictation to be converted into text. These applications typically operate by periodically recording sound samples taken through a microphone, analyzing the samples to recognize the phonemes being spoken by the user, and identifying the words made up by the spoken phonemes.
While speech recognition is becoming more commonplace, there are still some disadvantages to using conventional speech recognition applications that tend to frustrate the experienced user and alienate the novice or inexperienced user. One such disadvantage involves the interaction between the speaker and the computer. For example, with human interaction, people tend to control their speech based upon the reaction that they perceive in a listener. As such, during a conversation, a listener may provide feedback by nodding or making vocal responses, such as “yes” or “uh-huh”, to indicate that he or she understands what is being said. Additionally, if the listener does not understand what is being said, the listener may take on a quizzical expression, lean forward, or give other vocal or non-vocal cues. In response to this feedback, the speaker will typically change the way he or she is speaking and, in some cases, may speak more slowly, more loudly, pause more frequently, or even repeat a statement, usually without the listener even realizing that the speaker is changing the way he or she is interacting with the listener. Thus, feedback during a conversation is a very important element that informs the speaker as to whether he or she is being understood by the listener. Unfortunately, however, conventional voice recognition applications are not yet able to provide this type of “Natural User Interface (NUI)” feedback response to speech inputs/commands facilitated by the man-machine interface.
Currently, voice recognition applications have achieved an accuracy rate of approximately 90% to 98%. This means that when a user dictates into a document using a typical voice recognition application, the user's speech will be accurately recognized by the voice recognition application approximately 90% to 98% of the time. As such, out of every one hundred (100) letters recorded by the voice recognition application, approximately two (2) to ten (10) letters will have to be corrected. Thus, because the accuracy rate of speech recognition can vary so much, it is essential that the voice recognition application provide feedback to the user.
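The arithmetic behind the figures above can be illustrated with a minimal sketch. The function name `expected_corrections` and the choice to express accuracy as a whole-number percentage are assumptions made here for illustration only; they are not part of any particular voice recognition application.

```python
def expected_corrections(units_recorded: int, accuracy_percent: int) -> float:
    """Return the expected number of recorded units that need correction.

    Hypothetical helper: accuracy is given as a whole-number percentage
    (e.g. 90 for 90%) so the arithmetic stays exact for round inputs.
    """
    if units_recorded < 0:
        raise ValueError("units_recorded must be non-negative")
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    # The error rate is simply the complement of the accuracy rate.
    error_percent = 100 - accuracy_percent
    return units_recorded * error_percent / 100


# The range quoted above: 100 recorded letters at 98% and at 90% accuracy.
best_case = expected_corrections(100, 98)
worst_case = expected_corrections(100, 90)
print(f"Corrections per 100 letters: {best_case:g} to {worst_case:g}")
```

Under these assumptions, the same calculation scales linearly, so a longer dictation at the lower end of the accuracy range would require proportionally more corrections.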
One method to provide this feedback involves utilizing a Graphical User Interface (GUI) to provide feedback in the form of a dialog box during active dictation by the user. Referring to FIG. 1, a screen of a display device 100 is shown with an active Microphone User Interface (UI) dialog box 102 along with the UI window 104 for the application being dictated to, wherein in this case the user is entering a “path” into the address box 106 of the “Run” module 108 of a Windows® operating environment. As shown, as the user dictates to the voice recognition application, what the user vocalizes is displayed within the Microphone UI dialog box 102. If the voice recognition application does not recognize what the user vocalized, the voice recognition application may use the Microphone UI dialog box 102 to display a request to repeat the vocalized word or letter. The Microphone UI dialog box 102 therefore provides a vehicle for the voice recognition application to communicate important feedback to the user during the data/command entry phase.
However, one disadvantage with this method of providing feedback involves the novice or inexperienced user. Typically, the novice user of desktop speech recognition applications has many very basic requirements as he or she learns to use speech recognition as a new way to interact with a Personal Computer (PC). Common requirements include determining whether the microphone is turned on so that the computer can “hear” the user dictating and, if the computer did “hear” the user dictating, what the computer “heard” the user say. Additionally, the user needs to know how to turn the microphone on and/or off. All this is further complicated by the fact that speech recognition applications can be used to control any application which can be run on the computer (e.g. Microsoft® Word, Adobe Acrobat®, Microsoft® WordPad and Microsoft® Excel). Unfortunately, however, when a user has multiple active dialog windows open simultaneously, it is easy for one window to block other open windows and, as such, there is no good “one place fits all” location where the dialog box may be placed.
To address this issue, existing speech recognition applications have placed their feedback dialog components on the extremities of the screen (as a bar along the top of the screen, over the title bar, or in the taskbar). Although this works acceptably well for experienced users, this approach does not work very well for the novice user. One reason for this is that if the feedback dialog component is located in the top left hand side of the display device and the user is working at the bottom right hand side of the display device, the user will miss a significant amount of the feedback information being provided by the feedback dialog component. This is because the user will be focusing on the wrong part of the display device, i.e. a part away from the feedback dialog component, and the only way to get this information is to deliberately change where the eyes are focusing. Thus, the only way for a user to see feedback information is to dart his or her eyes back and forth from one end of the display device to the other. As a result, novice users can be surprised to find themselves in states where the computer is not listening (i.e. the microphone is turned off) or where their commands were not recognized, leading to frustration and a feeling of being out of control.
Although attempts were made to address this problem by placing the feedback dialog box in proximity to where the user is looking, other problems were introduced. Specifically, when the feedback dialog component was placed in proximity to where the user was looking, it was noticed that the feedback dialog box could cover the area that the user wants to see or click, thereby forcing the user to move the feedback dialog box out of the way before proceeding. For example, referring to FIG. 2 showing a display device 200, consider the situation in which the user has the options dialog window 202 open and is focused on the “Startup Task Pane” checkbox 204. If the feedback dialog component 206, shown as the red outline, is placed in proximity to where the user is looking, i.e. the “Startup Task Pane” checkbox 204 as shown, the feedback dialog component 206 will likely obstruct adjacent controls such as the “Smart tag” checkbox, the “Animated text” checkbox, the “Windows in Taskbar” checkbox and the “Field codes” checkbox. As such, this type of “floating component” is undesirable because it tends to obstruct existing controls and may actually exacerbate an already existing problem.
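The obstruction problem described above can be illustrated with a simple rectangle-overlap check. The following is a minimal sketch only: the `Rect` type, the helper names, and every coordinate are hypothetical values chosen for illustration and are not taken from FIG. 2 or from any actual application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (hypothetical layout model)."""
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True when the two rectangles share any interior area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def obstructed_controls(feedback: Rect, controls: dict[str, Rect]) -> list[str]:
    """List the names of controls at least partly covered by the feedback box."""
    return [name for name, rect in controls.items() if overlaps(feedback, rect)]


# Assumed positions for the checkboxes named above, plus one distant button.
CONTROLS = {
    "Startup Task Pane": Rect(20, 40, 140, 18),
    "Smart tag": Rect(20, 62, 140, 18),
    "Animated text": Rect(20, 84, 140, 18),
    "Windows in Taskbar": Rect(180, 40, 140, 18),
    "Field codes": Rect(180, 62, 140, 18),
    "OK button": Rect(260, 300, 60, 24),
}

# Assumed floating feedback box placed near the "Startup Task Pane" checkbox.
FEEDBACK = Rect(100, 50, 160, 50)

covered = obstructed_controls(FEEDBACK, CONTROLS)
print("Obstructed controls:", ", ".join(covered))
```

With these assumed coordinates, a feedback box placed near the focused checkbox covers all five checkboxes while leaving the distant button clear, which mirrors the trade-off noted above: placing the component closer to the user's focus makes it more likely to cover controls the user needs.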