1. Field of the Invention
The invention relates generally to speech recognition and, more specifically, to memory allocation in speech recognition systems to facilitate the use of dynamically alterable grammars.
2. Description of the Related Art
Many different speech recognition products have become commercially available recently. These products range from powerful dictation software that runs on personal computers, to much simpler systems that can recognize only a few words or commands. Most of these products use well-known speech recognition techniques and algorithms in which a speech signal is first sampled, and certain features or characteristics of the sampled signal are measured.
The English language is usually modeled as consisting of about 40 different sounds called phonemes, or phones. After a speech signal has been sampled and measured, a decoder (such as a Viterbi decoder or a Stack decoder) is typically used to match the measurements with the most likely phonemes. A “dictionary” is then used to combine the phonemes into words.
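By way of illustration only, the matching step performed by a Viterbi decoder can be sketched as follows. This is a minimal sketch under stated assumptions, not the decoder of any particular product: the two phoneme states, the discrete measurement labels "low" and "high", and all probability values are hypothetical and chosen only to keep the example small.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a list of observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], r)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: two phonemes and two discrete measurement labels.
states = ["m", "ah"]
start_p = {"m": 1.0, "ah": 0.0}
trans_p = {"m": {"m": 0.5, "ah": 0.5}, "ah": {"m": 0.0, "ah": 1.0}}
emit_p = {"m": {"low": 0.9, "high": 0.1}, "ah": {"low": 0.2, "high": 0.8}}

decoded = viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p)
print(decoded)  # ['m', 'm', 'ah', 'ah']
```

A practical decoder would typically operate on continuous feature vectors and use log probabilities to avoid numerical underflow; both are omitted here for brevity.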
The words included in a speech recognition system's dictionary may be derived from data structures called “grammars” and “subgrammars.” For example, a “days of the week” grammar might include the words Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. Each word in a grammar is in turn commonly represented as the sequence of phonemes corresponding to the word's dictionary pronunciation. For example, one pronunciation of the word “Monday” might be represented by the five phonemes /m/, /ah/, /n/, /d/, and /ey/. Each phoneme is in turn typically represented by a three-state Hidden Markov Model (HMM).
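The layering of grammar, word, phoneme, and HMM state described above can be sketched as follows. This is an illustrative sketch only; the names DAYS_OF_WEEK, DICTIONARY, and hmm_states are hypothetical, and only the "Monday" pronunciation is taken from the example above.

```python
# All names here are hypothetical, chosen only for illustration.
STATES_PER_PHONEME = 3  # each phoneme is modeled by a three-state HMM

# A grammar lists the words it accepts.
DAYS_OF_WEEK = ["Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday"]

# The dictionary maps a word to one pronunciation, given as a phoneme sequence.
# Only the "Monday" entry comes from the text; other entries are omitted.
DICTIONARY = {"Monday": ["m", "ah", "n", "d", "ey"]}


def hmm_states(word):
    """Expand a word into its HMM state labels, three per phoneme."""
    return [f"{phoneme}_{i}"
            for phoneme in DICTIONARY[word]
            for i in range(STATES_PER_PHONEME)]


monday_states = hmm_states("Monday")
print(len(monday_states))  # 15 states: 5 phonemes x 3 states each
```

Under these assumptions, a single five-phoneme word already expands to fifteen states, which suggests how quickly storage requirements may grow as words are added.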
The quality of speech recognition systems has improved dramatically over the past several years; however, these systems usually require a significant amount of computer memory and processing power. Although this may not be a problem where powerful personal computers are used for speech recognition, it does limit the capabilities of speech recognition systems used in portable devices, which are currently only able to recognize a few words or commands.
Speech recognition systems require so much memory in part because of the way that the various grammars and subgrammars (the words, phones, and states) are stored and searched during operation of the system. These systems typically compile, expand, flatten, and optimize all of the grammars used by the speech recognition system into a single, large, single-level data structure that must be stored in memory before the speech recognition system can operate. Generating this single-level data structure before run-time may allow certain types of speech recognition systems (such as systems used for dictation) to operate more quickly; however, because every grammar is already expanded into the structure, this technique prevents grammars and subgrammars from being added to a speech recognition system at run-time.
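The effect of flattening can be sketched with a toy example. This is a simplified illustration under stated assumptions, not the compilation procedure of any actual system: the grammar names "<command>" and "<day>", the words "schedule", "cancel", "today", and "tomorrow", and the function flatten are all hypothetical.

```python
from itertools import product

# Hypothetical grammars. Each grammar is a list of alternatives, and each
# alternative is a list of tokens; a token that is itself a key of GRAMMARS
# is a reference to a subgrammar.
GRAMMARS = {
    "<day>": [["Monday"], ["Tuesday"], ["Wednesday"], ["Thursday"],
              ["Friday"], ["Saturday"], ["Sunday"]],
    "<command>": [["schedule", "<day>"], ["cancel", "<day>"]],
}


def flatten(grammars, name):
    """Expand every subgrammar reference under `name` into a single-level
    list of word sequences."""
    phrases = []
    for alternative in grammars[name]:
        # A plain word expands to itself; a reference expands recursively.
        choices = [flatten(grammars, tok) if tok in grammars else [[tok]]
                   for tok in alternative]
        for combo in product(*choices):
            phrases.append([word for part in combo for word in part])
    return phrases


# Flattened once, before run-time: 2 commands x 7 days = 14 phrases.
flat_before = flatten(GRAMMARS, "<command>")

# Replacing a subgrammar later does not update the flattened structure;
# the whole structure has to be generated again.
GRAMMARS["<day>"] = [["today"], ["tomorrow"]]
flat_after = flatten(GRAMMARS, "<command>")

print(len(flat_before), len(flat_after))  # 14 4
```

In this sketch the flattened list grows multiplicatively with each referenced subgrammar, and a change to "<day>" is visible only after the entire structure is regenerated.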
Accordingly, there remains a need in the art for a speech recognition system that uses memory efficiently, and that allows grammars and subgrammars to be dynamically alterable, i.e., added or replaced while the system is operating.