1. Technical Field
This invention relates to the field of speech recognition and more particularly to enabling speech recognition grammars.
2. Description of the Related Art
To recognize the spoken word, a speech recognition system can convert analog acoustical information into computer readable digital signals, identify those signals as core components of speech, and in turn recognize those components as discrete words. Still, to accurately recognize the spoken word, a speech recognition system relies not only on acoustical information, but also on the context in which the word is spoken. More particularly, speech recognition grammars can indicate the context in which speech sounds are recognized.
To determine the context in which a word is spoken, speech recognition systems can include speech recognition grammars that predict the words which can be spoken at any point in a spoken command phrase. Essentially, from a speech recognition grammar, a speech recognition system can identify the words which can appear next in a spoken phrase. For example, given the speech recognition grammar,
<root> = call <namelist> | display <itemlist>.
<namelist> = Bill | John.
<itemlist> = names | messages.
if a speaker recites, “Call John”, once the speech recognition system determines that the word “call” has been spoken, the speech recognition system can conclude that the only possible words that can be spoken next in the command phrase are the words “Bill” and “John”. Hence, the use of a speech recognition grammar can result in more accurate speech recognition since the list of possible words which can be spoken at any point in a spoken phrase is limited based upon the previously spoken words.
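The narrowing effect described above can be illustrated with a minimal Python sketch. It is not the method of any particular speech recognition system: the dictionary encoding of the grammar and the helper names `expand` and `next_words` are assumptions made only for illustration, and the approach of enumerating every phrase works only for small, non-recursive grammars such as the example.

```python
from itertools import product

# The example grammar, encoded as rule -> list of alternatives.
# Each alternative is a sequence of literal words or <rule> references.
# (This encoding is an illustrative assumption, not a standard format.)
GRAMMAR = {
    "<root>": [["call", "<namelist>"], ["display", "<itemlist>"]],
    "<namelist>": [["Bill"], ["John"]],
    "<itemlist>": [["names"], ["messages"]],
}


def expand(symbol, grammar):
    """Return every word sequence a symbol can produce.

    Only suitable for finite, non-recursive grammars.
    """
    if symbol not in grammar:
        # A literal word produces just itself.
        return [[symbol]]
    phrases = []
    for alternative in grammar[symbol]:
        parts = [expand(s, grammar) for s in alternative]
        # Combine one expansion of each element of the alternative.
        for combo in product(*parts):
            phrases.append([word for part in combo for word in part])
    return phrases


def next_words(spoken, grammar, root="<root>"):
    """Return the set of words the grammar allows after the words spoken so far."""
    spoken = [w.lower() for w in spoken]
    allowed = set()
    for phrase in expand(root, grammar):
        lowered = [w.lower() for w in phrase]
        if lowered[:len(spoken)] == spoken and len(phrase) > len(spoken):
            allowed.add(phrase[len(spoken)])
    return allowed


if __name__ == "__main__":
    print(sorted(next_words(["Call"], GRAMMAR)))
```

Once “call” has been recognized, `next_words` returns only “Bill” and “John”, so a recognizer would need to match the next sound against two candidates rather than its full vocabulary.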
Notwithstanding the assistance of a speech recognition grammar, the use of a speech recognition system in a networked client device can pose significant problems. In particular, unlike stand-alone desktop computers performing speech recognition, networked client devices often lack comparable processing power. Whereas desktop computers can include high processing power CPUs and vast fixed storage, networked client devices, often in view of power consumption and conservation concerns, include low processing power CPUs and limited fixed storage. Thus, performing complex computer processes in a networked client device can be problematic at best. In the worst case, storing larger, more complex speech recognition grammars may not be possible in a networked client device.
Presently, two methods are employed in performing speech recognition in a networked client device. First, speech recognition can be performed entirely within the confines of the networked client device. Still, processing complex speech recognition grammars in a networked client having low processing power, such as a handheld client, can prove problematic due to the processing constraints of the networked client. In particular, because of their processing power limitations, such networked clients cannot provide the realtime feedback often required by speech recognition applications.
In a second known method for performing speech recognition in a networked client device, speech recognition is performed entirely in a server communicatively linked to the networked client. Processing speech recognition grammars entirely in a server communicatively linked to the networked client can surmount the processing limitations posed by networked clients having low processing power. Still, processing speech recognition grammars entirely in a server can prove problematic inasmuch as the processing of the speech recognition grammar can be limited by available network resources.
Specifically, congested networks or those networks having constrained bandwidth can prevent realtime processing of speech audio in the server as can be required by some speech recognition applications. Notably, realtime processing of speech audio entirely in a server can prove problematic, even where the speech grammar used to process the speech audio, in itself, is not a complex speech recognition grammar. In this case, though the processing power of a server is not required, realtime speech recognition is inhibited by the limitations of the network.
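The network limitation can be made concrete with a rough calculation, sketched below in Python. The audio format (uncompressed, single-channel PCM) and the link speeds are illustrative assumptions and do not come from the text; the sketch also ignores compression and protocol overhead. The function name `can_stream_in_real_time` is hypothetical.

```python
def can_stream_in_real_time(sample_rate_hz, bits_per_sample, link_kbps):
    """Compare the bit rate of uncompressed mono audio against a link's bandwidth.

    Returns (fits, audio_kbps). Assumes no compression and no protocol
    overhead, so this is an optimistic, back-of-the-envelope estimate.
    """
    audio_kbps = sample_rate_hz * bits_per_sample / 1000
    return audio_kbps <= link_kbps, audio_kbps


if __name__ == "__main__":
    # Assumed values: 8 kHz, 16-bit audio sent over a 56 kbit/s link.
    fits, audio_kbps = can_stream_in_real_time(8000, 16, 56)
    print(f"audio needs {audio_kbps} kbit/s; fits on link: {fits}")
```

Under these assumed values the audio requires 128 kbit/s, more than the link can carry, so the server cannot receive the audio as fast as it is spoken regardless of how simple the grammar is or how much processing power the server has.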