Two common types of speech recognition systems are continuous and discrete. Continuous speech recognition systems detect and discern useful information from continuous speech patterns. In use, an operator may speak phrases and sentences without pausing and the continuous speech recognition system will determine the words being spoken. Continuous speech recognition systems are used, for example, in voice-input word processors that enable operators to dictate letters directly to the computer.
In contrast, discrete speech recognition systems are designed to detect individual words and phrases that are interrupted by intentional pauses, resulting in an absence of speech between the words and phrases. Discrete speech recognition systems are often used in "command and control" applications in which an operator speaks individual commands to initiate corresponding predefined control functions. In a typical use, the operator speaks a command, pauses while the system processes and responds to the command, and then speaks another command. The system detects each command and performs the associated function.
This invention is directed to the discrete class of speech recognition systems.
A discrete speech recognition system employs a complete list of recognized words or phrases, referred to as the "vocabulary." The subset of the vocabulary that the recognition system is attempting to detect at any one time is known as the "grammar." In general, the smaller the active grammar, the more reliable the recognition, because the system is focusing on only a few words or phrases. Conversely, the larger the active grammar, the less reliable the recognition, because the system is attempting to discern a word or phrase from among many candidates.
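The relationship between vocabulary and grammar can be sketched as follows. This is a minimal illustration only: the names `VOCABULARY`, `activate`, and `recognize` are hypothetical, and exact string matching stands in for the acoustic matching a real recognizer would perform.

```python
# Hypothetical names throughout; the source does not define an API.

# The vocabulary is the complete list of words the system can recognize.
VOCABULARY = frozenset(
    {"AM", "FM", "Seek", "Scan", "Preset", "One", "Two", "Three"}
)


def activate(grammar_words):
    """Return an active grammar, checking it is a subset of the vocabulary."""
    grammar = frozenset(grammar_words)
    unknown = grammar - VOCABULARY
    if unknown:
        raise ValueError(f"not in vocabulary: {sorted(unknown)}")
    return grammar


def recognize(utterance, grammar):
    """Return the utterance if it is in the active grammar, else None.

    Assumption: a real system would score audio against acoustic models;
    a plain membership test stands in for that step here.
    """
    return utterance if utterance in grammar else None


# Only the words in the active grammar are candidates for recognition.
control_grammar = activate({"AM", "FM", "Seek", "Scan"})
```

With `control_grammar` active, `recognize("AM", control_grammar)` yields a match, while `recognize("One", control_grammar)` does not, even though "One" is in the vocabulary.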
Accordingly, one design consideration for discrete speech recognition systems is to devise grammars that present useful command options, while being reliably detectable.
One conventional approach is to construct a large grammar that encompasses each command option. FIG. 1 shows how this conventional approach might be applied to control an automobile radio. In this example, suppose the system is designed to allow the user to control the radio and access his or her favorite radio stations using voice commands. With a large active grammar, a default radio grammar 20 might include the radio control words ("AM", "FM", "Seek", and "Scan") and all of the preset radio stations. A corresponding command function is associated with each grammar word, as represented in Table 1.
TABLE 1: Default Grammar

Word/Phrase    Command Function
AM             Sets the radio to AM band.
FM             Sets the radio to FM band.
Seek           Directs the radio to seek to a new station.
Scan           Directs the radio to scan for a new station.
One            Sets the radio to preset station 1.
Two            Sets the radio to preset station 2.
Three          Sets the radio to preset station 3.
Four           Sets the radio to preset station 4.
Five           Sets the radio to preset station 5.
Six            Sets the radio to preset station 6.
Seven          Sets the radio to preset station 7.
Eight          Sets the radio to preset station 8.
Nine           Sets the radio to preset station 9.
Ten            Sets the radio to preset station 10.
The speech recognition system actively tries to recognize one of these words when the operator speaks. When a grammar word is detected, the speech recognition system performs the appropriate function. Suppose the operator says the word "AM". The discrete speech recognition system detects the active word 22 and performs the corresponding function 24 to set the radio to the AM band.
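The single large grammar of Table 1 can be pictured as a lookup from each grammar word to its command function. The sketch below assumes the word has already been recognized and arrives as text; the `Radio` class, `build_default_grammar`, and `handle` are hypothetical names, not part of any described system.

```python
class Radio:
    """Hypothetical stand-in for the radio being controlled."""

    def __init__(self):
        self.band = "FM"
        self.preset = None
        self.last_tuning_action = None

    def set_band(self, band):
        self.band = band

    def tune(self, action):
        # Records "seek" or "scan"; a real radio would retune here.
        self.last_tuning_action = action

    def set_preset(self, number):
        self.preset = number


PRESET_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]


def build_default_grammar(radio):
    """Map every word of Table 1 to its command function."""
    grammar = {
        "AM": lambda: radio.set_band("AM"),
        "FM": lambda: radio.set_band("FM"),
        "Seek": lambda: radio.tune("seek"),
        "Scan": lambda: radio.tune("scan"),
    }
    for number, word in enumerate(PRESET_WORDS, start=1):
        # Bind the preset number at definition time via a default argument.
        grammar[word] = lambda n=number: radio.set_preset(n)
    return grammar


def handle(word, grammar):
    """Run the command for a detected word; return False if not in grammar."""
    action = grammar.get(word)
    if action is None:
        return False
    action()
    return True
```

All fourteen entries are active at once, so every utterance must be discriminated against the full set.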
As noted above, a drawback with presenting a large all-encompassing grammar is that there is a greater likelihood of false recognition by the speech system. For instance, the system may experience trouble distinguishing between the words "FM" and "Seven" when both are spoken rapidly and/or not clearly enunciated.
Another conventional approach is to construct a small default grammar and to switch to a new grammar upon detection of one or more keywords. FIG. 2 shows how this conventional approach might be applied to control an automobile radio. With this approach, a default radio grammar 30 might include only the radio control words "AM", "FM", "Seek", "Scan", and "Preset". A corresponding command function is associated with each grammar word, as represented in Table 2.
TABLE 2: Default Grammar

Word/Phrase    Command Function
AM             Sets the radio to AM band.
FM             Sets the radio to FM band.
Seek           Directs the radio to seek to a new station.
Scan           Directs the radio to scan for a new station.
Preset         Keyword to bring up preset station grammar.
Upon recognition of the keyword "Preset", the speech recognition system changes to a new grammar 32 for detecting the preset station numbers. Table 3 lists the new preset station grammar.
TABLE 3: Preset Station Grammar

Word/Phrase    Command Function
One            Sets the radio to preset station 1.
Two            Sets the radio to preset station 2.
Three          Sets the radio to preset station 3.
Four           Sets the radio to preset station 4.
Five           Sets the radio to preset station 5.
Six            Sets the radio to preset station 6.
Seven          Sets the radio to preset station 7.
Eight          Sets the radio to preset station 8.
Nine           Sets the radio to preset station 9.
Ten            Sets the radio to preset station 10.
The speech recognition system actively tries to recognize one of these words from the preset station grammar. Suppose the operator says the word "One". The discrete speech recognition system detects the active word 34 and performs the corresponding function 36 to set the radio to the preset station 1.
A drawback with this system is navigation of the grammars. An operator may call out a keyword in one grammar, causing the system to switch to a different grammar, then be interrupted (e.g., by driving in traffic) and, upon returning his or her attention to the radio, forget which grammar is currently active. For instance, suppose the operator had called out "Preset" to get the preset station grammar of Table 3 and was then interrupted. The operator may then wish to seek or scan, but the system will not recognize these commands because the active grammar contains only the preset station numbers.
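The grammar switching of FIG. 2 and the navigation drawback just described can be sketched as a small state machine. This is an illustration under stated assumptions: the class and attribute names are hypothetical, the input is already-recognized text, and because the source does not say when the system returns to the default grammar, this sketch simply stays in the preset grammar.

```python
PRESET_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]


class GrammarSwitchingRadio:
    """Hypothetical two-grammar controller following Tables 2 and 3."""

    def __init__(self):
        self.active = "default"   # which grammar is currently active
        self.band = "FM"
        self.preset = None
        self.last_tuning_action = None

    def hear(self, word):
        """Process one detected word; return True if it was recognized."""
        if self.active == "default":
            if word in ("AM", "FM"):
                self.band = word
            elif word in ("Seek", "Scan"):
                self.last_tuning_action = word.lower()
            elif word == "Preset":
                # Keyword: switch to the preset station grammar (Table 3).
                self.active = "preset"
            else:
                return False
            return True

        # Preset grammar is active: only station numbers are candidates.
        # Assumption: the grammar stays active after a station is chosen.
        if word in PRESET_WORDS:
            self.preset = PRESET_WORDS.index(word) + 1
            return True
        return False
```

After `hear("Preset")`, a call to `hear("Seek")` returns `False` and leaves the radio unchanged, which is the situation an interrupted operator encounters.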
Accordingly, there is a need to improve techniques for presenting grammars in discrete speech recognition systems for such applications as operating a vehicle radio.