Two common types of speech recognition systems are continuous and discrete. Continuous speech recognition systems detect and discern useful information from continuous speech patterns. In use, an operator may speak phrases and sentences without pausing and the continuous speech recognition system will determine the words being spoken. Continuous speech recognition systems are used, for example, in voice-input word processors that enable operators to dictate letters directly to the computer.
Discrete speech recognition systems are designed to detect individual words and phrases that are interrupted by intentional pauses, resulting in an absence of speech between the words and phrases. Discrete speech recognition systems are often used in "command and control" applications in which an operator speaks individual commands to initiate corresponding predefined control functions. In a typical use, the operator speaks a command, pauses while the system processes and responds to the command, and then speaks another command. The system detects each command and performs the associated function.
A discrete speech recognition system employs a complete list of recognized words or phrases, referred to as the "vocabulary." A subset of the vocabulary that the recognition system is attempting to detect at any one time is known as the "active grammar." In general, the smaller the active grammar, the more reliable the recognition because the system is only focusing on a few words or phrases. Conversely, the larger the active grammar, the less reliable the recognition because the system is attempting to discern a word or phrase from many words or phrases.
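The relationship between the vocabulary and the active grammar can be pictured as a set and one of its subsets. The minimal Python sketch below is illustrative only: the names VOCABULARY, make_active_grammar, and recognize are hypothetical, and it assumes each utterance has already been transcribed to text, so "recognition" is reduced to a membership test against the active grammar rather than acoustic matching.

```python
NUMBER_WORDS = ("One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten")

# Hypothetical vocabulary: every word or phrase the system can recognize.
VOCABULARY = frozenset(("AM", "FM", "Seek", "Scan", "Preset") + NUMBER_WORDS)


def make_active_grammar(words):
    """Build an active grammar, enforcing that it is a subset of the vocabulary."""
    grammar = frozenset(words)
    unknown = grammar - VOCABULARY
    if unknown:
        raise ValueError(f"not in vocabulary: {sorted(unknown)}")
    return grammar


def recognize(utterance, active_grammar):
    """Simulated recognition step.

    Assumption: the utterance is already transcribed text. A real recognizer
    scores audio against the active grammar; this sketch only checks membership.
    """
    return utterance if utterance in active_grammar else None


active = make_active_grammar({"AM", "FM", "Seek", "Scan"})
print(recognize("FM", active))     # FM
print(recognize("Three", active))  # None: in the vocabulary but not active
```

In this sketch, a word that belongs to the vocabulary but not to the active grammar is simply ignored, which mirrors why a smaller active grammar leaves the system fewer candidates to distinguish.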
One conventional approach is to construct a large grammar that encompasses each command option. FIG. 1 shows how this conventional approach might be applied to control an automobile radio. In this example, suppose the system is designed to allow the user to control the radio and access his/her favorite radio stations using voice commands. Using a large active grammar, a default radio grammar 20 might include the radio control words "AM," "FM," "Seek," and "Scan" and all of the preset radio stations. A corresponding command function is associated with each grammar word, as represented in Table 1.
TABLE 1
Default Grammar

Word/Phrase    Command Function
AM             Sets the radio to AM band.
FM             Sets the radio to FM band.
Seek           Directs the radio to seek to a new station.
Scan           Directs the radio to scan for a new station.
One            Sets the radio to preset station 1.
Two            Sets the radio to preset station 2.
Three          Sets the radio to preset station 3.
Four           Sets the radio to preset station 4.
Five           Sets the radio to preset station 5.
Six            Sets the radio to preset station 6.
Seven          Sets the radio to preset station 7.
Eight          Sets the radio to preset station 8.
Nine           Sets the radio to preset station 9.
Ten            Sets the radio to preset station 10.
The speech recognition system actively tries to recognize one of these words when the operator speaks. When a grammar word is detected, the speech recognition system performs the appropriate function. Suppose the operator says the word "AM." The discrete speech recognition system detects the active word 22 and performs the corresponding function 24 to set the radio to the AM band.
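The single large grammar of Table 1 amounts to a lookup table from each grammar word to its command function. The Python sketch below is a hypothetical illustration, not the system of FIG. 1: the Radio class, build_default_grammar, and on_speech are invented names, the radio only records what it was told to do, and the spoken word is assumed to arrive as already-transcribed text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

NUMBER_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]


@dataclass
class Radio:
    """Hypothetical radio model; it only records the commands it receives."""
    band: str = "FM"
    preset: Optional[int] = None
    actions: List[str] = field(default_factory=list)

    def set_band(self, band: str) -> None:
        self.band = band
        self.actions.append(f"band={band}")

    def seek(self) -> None:
        self.actions.append("seek")

    def scan(self) -> None:
        self.actions.append("scan")

    def set_preset(self, n: int) -> None:
        self.preset = n
        self.actions.append(f"preset={n}")


def build_default_grammar(radio: Radio) -> Dict[str, Callable[[], None]]:
    """One large active grammar (Table 1): each word maps to its command function."""
    grammar: Dict[str, Callable[[], None]] = {
        "AM": lambda: radio.set_band("AM"),
        "FM": lambda: radio.set_band("FM"),
        "Seek": radio.seek,
        "Scan": radio.scan,
    }
    for n, word in enumerate(NUMBER_WORDS, start=1):
        # Bind n as a default argument so each entry keeps its own station number.
        grammar[word] = lambda n=n: radio.set_preset(n)
    return grammar


def on_speech(word: str, grammar: Dict[str, Callable[[], None]]) -> bool:
    """If the detected word is in the active grammar, perform its command function."""
    command = grammar.get(word)
    if command is None:
        return False
    command()
    return True


radio = Radio()
grammar = build_default_grammar(radio)
on_speech("AM", grammar)
print(radio.band)    # AM
on_speech("Seven", grammar)
print(radio.preset)  # 7
```

All fourteen words are active at once in this sketch, which is exactly the situation in which similar-sounding entries such as "FM" and "Seven" compete with each other.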
As noted above, a drawback of presenting a large, all-encompassing grammar is that the speech system is more likely to produce a false recognition. For instance, the system may have trouble distinguishing between the words "FM" and "Seven" when both are spoken rapidly and/or not clearly enunciated. Another problem is that the system may recognize extraneous sounds that are not intended as commands. For instance, the system may pick up words from a radio or other background source and carry out actions the user did not intend.
To avoid the problems associated with large grammars, another conventional approach is to construct sets of smaller grammars and navigate between them so that only one grammar is active at a time. FIG. 2 shows an example involving an automobile radio, in which the system begins with a small default grammar and switches to a new grammar upon detection of one or more keywords. With this approach, a default radio grammar 30 might include only the radio control words "AM," "FM," "Seek," "Scan," and "Preset." A corresponding command function is associated with each grammar word, as represented in Table 2.
TABLE 2
Default Grammar

Word/Phrase    Command Function
AM             Sets the radio to AM band.
FM             Sets the radio to FM band.
Seek           Directs the radio to seek to a new station.
Scan           Directs the radio to scan for a new station.
Preset         Keyword to bring up preset station grammar.
Upon recognition of the keyword "preset," the speech recognition system changes to a new grammar 32 for detecting the preset station numbers. Table 3 lists the new preset station grammar.
TABLE 3
Preset Station Grammar

Word/Phrase    Command Function
One            Sets the radio to preset station 1.
Two            Sets the radio to preset station 2.
Three          Sets the radio to preset station 3.
Four           Sets the radio to preset station 4.
Five           Sets the radio to preset station 5.
Six            Sets the radio to preset station 6.
Seven          Sets the radio to preset station 7.
Eight          Sets the radio to preset station 8.
Nine           Sets the radio to preset station 9.
Ten            Sets the radio to preset station 10.
The speech recognition system actively tries to recognize one of these words from the preset station grammar. Suppose the operator says the word "One." The discrete speech recognition system detects the active word 34 and performs the corresponding function 36 to set the radio to preset station 1.
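The keyword-driven navigation between Table 2 and Table 3 behaves like a small state machine in which only one grammar is active at a time. The Python sketch below is a hypothetical illustration: GrammarNavigator, hear, and select_preset are invented names, utterances are assumed to be already-transcribed text, and returning to the default grammar after a station is chosen is an assumption, since the description above does not say what happens after function 36 is performed.

```python
from typing import Callable, Dict, List

NUMBER_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]


class GrammarNavigator:
    """Hypothetical sketch of FIG. 2: one small grammar is active at a time."""

    def __init__(self) -> None:
        self.actions: List[str] = []
        # Table 2: default grammar; "Preset" is a keyword that switches grammars.
        self.default_grammar: Dict[str, Callable[[], None]] = {
            "AM": lambda: self.actions.append("band=AM"),
            "FM": lambda: self.actions.append("band=FM"),
            "Seek": lambda: self.actions.append("seek"),
            "Scan": lambda: self.actions.append("scan"),
            "Preset": lambda: self.switch_to(self.preset_grammar),
        }
        # Table 3: preset station grammar, one entry per station number.
        self.preset_grammar: Dict[str, Callable[[], None]] = {
            word: (lambda n=n: self.select_preset(n))
            for n, word in enumerate(NUMBER_WORDS, start=1)
        }
        self.active = self.default_grammar

    def switch_to(self, grammar: Dict[str, Callable[[], None]]) -> None:
        self.active = grammar

    def select_preset(self, n: int) -> None:
        self.actions.append(f"preset={n}")
        # Assumption: fall back to the default grammar once a station is chosen.
        self.active = self.default_grammar

    def hear(self, word: str) -> bool:
        """Perform the word's function if it is in the active grammar."""
        command = self.active.get(word)
        if command is None:
            return False  # not in the active grammar, so it is ignored
        command()
        return True


nav = GrammarNavigator()
nav.hear("Preset")  # keyword: the preset station grammar becomes active
nav.hear("One")     # sets the radio to preset station 1
print(nav.actions)  # ['preset=1']
```

Each grammar in this sketch holds at most ten entries, so the recognizer never has to separate "FM" from "Seven" in the same step.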
A drawback of this system is that it forces users to remember the structure and availability of the grammars. This is particularly difficult in situations where the grammars are new or changing. An example of such a situation is when the user is concentrating on another task and uses speech to input commands because their attention, hands, and eyes are otherwise occupied. The user may call out a keyword in one grammar, causing the system to switch to a different grammar, and then become distracted by their primary task (e.g., driving in traffic) and forget which grammar is currently active. For instance, suppose the operator had called out "preset" to get the preset station grammar of Table 3 and was subsequently interrupted. The system is awaiting words or phrases from the preset station grammar of Table 3. Unfortunately, due to the interruption, the operator may have forgotten that the preset station grammar is active and may attempt to speak commands from the default grammar of Table 2, such as "seek" or "scan." Since these commands are not supported by the currently active grammar, the system will not recognize them. This is confusing and frustrating for the operator.
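The interruption scenario can be traced with a stripped-down model in which each grammar is just a set of words. In the hypothetical Python sketch below, run_session is an invented name, utterances are assumed to be already-transcribed text, and the return to the default grammar after a preset number is an assumption rather than something stated above.

```python
# Hypothetical: the grammars of Tables 2 and 3, reduced to word sets.
DEFAULT_GRAMMAR = {"AM", "FM", "Seek", "Scan", "Preset"}
PRESET_GRAMMAR = {"One", "Two", "Three", "Four", "Five",
                  "Six", "Seven", "Eight", "Nine", "Ten"}


def run_session(utterances):
    """Return (word, active grammar name, outcome) for each utterance."""
    active_name, active = "default", DEFAULT_GRAMMAR
    results = []
    for word in utterances:
        if word not in active:
            results.append((word, active_name, "not recognized"))
            continue
        results.append((word, active_name, "recognized"))
        if word == "Preset":
            active_name, active = "preset", PRESET_GRAMMAR
        elif active is PRESET_GRAMMAR:
            # Assumption: a preset number returns the system to the default grammar.
            active_name, active = "default", DEFAULT_GRAMMAR
    return results


# The operator says "Preset", is interrupted, then tries "Seek" and "Scan".
for entry in run_session(["Preset", "Seek", "Scan"]):
    print(entry)
# ('Preset', 'default', 'recognized')
# ('Seek', 'preset', 'not recognized')
# ('Scan', 'preset', 'not recognized')
```

Nothing in this sketch tells the operator which grammar is active, so "Seek" and "Scan" are silently dropped, which is the confusion described above.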
Accordingly, there is a need for improving user interaction with speech recognition systems to assist a user in navigating new or changing grammars.