Speech is perhaps the oldest form of human communication, and many scientists now believe that the ability to communicate through speech is inherent in the biology of the human brain. It has therefore long been a goal to allow users to communicate with computers by speech, and great strides have recently been made toward achieving this goal. For example, some computers now include speech recognition applications that allow users to speak aloud both commands for operating the computer and dictation to be converted into text. These applications periodically record sound samples taken through a microphone, analyze the samples to recognize the phonemes being spoken by the user, and identify the words made up of those phonemes.
While speech recognition is becoming commonplace, conventional speech recognition applications still have some disadvantages. In human interaction, people adjust their speech based on the reactions they perceive in a listener. For example, a listener may nod or make vocal responses, such as “yes” or “uh-huh,” to indicate that he or she understands what is being said. On the other hand, a listener who does not understand may take on a quizzical expression, lean forward, or give other vocal or non-vocal cues. Based on these responses, a speaker will speak more slowly or more loudly, pause more frequently, or repeat a statement, usually without even realizing that he or she is changing the way he or she speaks.
Unfortunately, conventional voice recognition applications do not provide these responses to speech. Some voice recognition applications display indicators that show a user when the application is recording. For example, some voice recognition applications display a “microphone on” indicator when the application is recording sound samples, and a “microphone off” indicator when the application has stopped recording. Some voice recognition applications also employ a volume indicator to show the user graphically the level at which sound samples are being recorded. Further, some voice recognition applications provide an indicator after a phrase of speech has been processed, to inform the user whether the recognition process succeeded. Such an application may, for example, display the phrase “please repeat that” if a phrase has not been properly recognized, or display the recognized phrase when it has been. None of these indicators, however, gives the user any sign of whether the voice recognition application is recognizing a phrase while the user is still speaking that phrase.
This is a particularly significant disadvantage for conventional voice recognition applications, because they generally experience a substantial lag between the moment the user speaks a phrase and the moment the application recognizes it. To recognize spoken commands, for example, a voice recognition application usually employs a grammar library. This grammar library contains the sequences of words (each word itself expressed as a sequence of phonemes) that make up each command that can be given through the voice recognition application. Before the voice recognition application begins the recognition process for a phrase, it must first match a recorded sound with the initial phoneme of a command in the grammar library. Only after the application determines that the user has begun to speak the first phoneme of an actual command does it start the recognition process for the subsequent sounds spoken by the user.
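The gating behavior described above can be illustrated with a minimal sketch. It assumes that a front end has already converted recorded sound samples into a stream of phoneme labels; the grammar contents, the ARPAbet-style phoneme symbols, and the names `WORDS`, `COMMANDS`, and `match_command` are all hypothetical and chosen only for illustration. Phonemes heard before any command's initial phoneme are simply discarded, and the returned `skipped` count stands in for the start-up delay.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical grammar library: each word is a sequence of phoneme labels,
# and each command is a sequence of words.
WORDS: Dict[str, List[str]] = {
    "open": ["OW", "P", "AH", "N"],
    "close": ["K", "L", "OW", "Z"],
    "file": ["F", "AY", "L"],
}
COMMANDS: List[List[str]] = [["open", "file"], ["close", "file"]]


def command_phonemes(command: List[str]) -> List[str]:
    """Flatten a command's words into a single phoneme sequence."""
    return [p for word in command for p in WORDS[word]]


def match_command(phonemes: Iterable[str]) -> Tuple[Optional[str], int]:
    """Return (matched command or None, phonemes skipped before matching began).

    Recognition starts only when an input phoneme equals the initial phoneme
    of some command; everything heard before that point is discarded.
    """
    sequences = {" ".join(c): command_phonemes(c) for c in COMMANDS}
    initials = {seq[0] for seq in sequences.values()}
    skipped = 0
    candidates: Dict[str, List[str]] = {}
    position = 0
    for ph in phonemes:
        if not candidates:
            if ph not in initials:
                skipped += 1  # no command can begin here; keep waiting
                continue
            # First phoneme of an actual command: begin recognition.
            candidates = {n: s for n, s in sequences.items() if s[0] == ph}
            position = 1
        else:
            # Keep only commands whose next phoneme agrees with the input.
            candidates = {n: s for n, s in candidates.items()
                          if position < len(s) and s[position] == ph}
            position += 1
            if not candidates:
                return None, skipped  # input diverged from every command
        for name, seq in candidates.items():
            if position == len(seq):
                return name, skipped
    return None, skipped
```

For example, `match_command(["HH", "AH", "OW", "P", "AH", "N", "F", "AY", "L"])` returns `("open file", 2)`: the two leading phonemes are ignored because no command begins with them. Narrowing the candidate set phoneme by phoneme also hints at why later sounds in a phrase with few word alternatives can be recognized more quickly.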
As the recognition process continues, the voice recognition application will typically catch up with the words of the phrase being spoken by the user. For example, with voice recognition applications that employ a cache memory, the recognition process becomes quicker as more speech data is loaded into the cache. Also, subsequent sounds in a command phrase may be recognized more quickly if the phrase has few word alternatives. Each time the user completes a phrase and begins speaking a new one, however, a new delay arises in the recognition process.
These delays in the speech recognition process create a significant problem for most users. While the speech recognition application lags in recognizing input sounds, the typical user becomes uncertain whether the application is working at all. In response, the user will often begin to speak more slowly, more loudly, or both. This reduces the accuracy of the recognition process, which is calibrated to recognize speech at conversational volume and normal speed. Worse, the user may repeat a phrase, causing it to be recognized twice. As recognition accuracy decreases, the typical user speaks still more slowly or loudly, or becomes more repetitive, making the accuracy even worse. This cycle continues until the user becomes too frustrated to keep using the voice recognition application.