In speech recognition, a speech signal is converted into a text string and, in some systems, into a set of semantic tags that capture the meaning of portions of that text string. To improve recognition accuracy, it is common to limit the recognizer to finding text strings that have been defined in a grammar.
An example of such a grammar is a context-free grammar (CFG). In a context-free grammar, possible text sequences are defined as rules, where one rule may reference another rule. For example, the rule for setting up a meeting could be defined as “set up a meeting on <ruleref name="date">”, where <ruleref name="date"> is a reference to a date rule that defines text strings that can represent a date.
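The rule-reference mechanism described above can be illustrated with a minimal sketch. The grammar contents, the `<date>` shorthand for <ruleref name="date">, and the helper names `expand` and `matches` are assumptions made for illustration only; they do not correspond to any particular grammar format or recognizer API.

```python
import re

# Hypothetical toy grammar: each rule name maps to a list of alternative phrases.
# "<date>" is an assumed shorthand for a reference to the "date" rule.
GRAMMAR = {
    "meeting": ["set up a meeting on <date>"],
    "date": ["monday", "tuesday", "the first of may"],
}

RULEREF = re.compile(r"<(\w+)>")


def expand(rule, grammar):
    """Return every word sequence that the named rule can produce."""
    results = []
    for alternative in grammar[rule]:
        partials = [""]
        pos = 0
        for ref in RULEREF.finditer(alternative):
            literal = alternative[pos:ref.start()]
            # Recursively expand the referenced rule and splice in each result.
            subs = expand(ref.group(1), grammar)
            partials = [p + literal + s for p in partials for s in subs]
            pos = ref.end()
        tail = alternative[pos:]
        results.extend(p + tail for p in partials)
    return results


def matches(utterance, rule, grammar):
    """True only if the utterance is exactly one of the rule's word sequences."""
    return utterance.lower().strip() in expand(rule, grammar)
```

In this sketch, `matches("set up a meeting on Tuesday", "meeting", GRAMMAR)` succeeds, while an utterance such as "a meeting on tuesday please" fails because it is not one of the sequences the rule author wrote down.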
While a CFG provides a straightforward structure for a grammar, it is difficult to construct an efficient CFG that provides high recognition accuracy. One reason for this is that, in order to match a rule, the user must speak at least one of the word sequences anticipated by the author of the rule. To overcome this limitation, grammars have been developed that convert a context-free grammar into an N-gram language model. The N-gram language model is constructed by identifying sequences of N words and/or rule references in the CFG and computing a probability for each N-gram. Backoff probabilities for bigrams and unigrams can then be determined. Such grammars are often referred to as unified grammars and have the advantage that the user does not need to provide speech that exactly matches the sentences anticipated by the developer. Even if the user's speech includes words in a different order or omits certain words, the unified grammar can still provide a match.
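As a rough illustration of how N-grams over words and rule references might be counted and backed off, consider the following sketch. It uses bigrams with a single fixed backoff weight rather than any specific backoff scheme, so its scores are not normalized probabilities. The rule bodies, the `BACKOFF` value, and the function names are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical rule bodies; "<date>" (a rule reference) is kept as one token.
RULE_BODIES = [
    "set up a meeting on <date>",
    "schedule a meeting on <date>",
]

BACKOFF = 0.4  # assumed fixed backoff weight, chosen only for illustration


def count_ngrams(bodies):
    """Count unigrams and bigrams over words and rule-reference tokens."""
    unigrams, bigrams = Counter(), Counter()
    for body in bodies:
        tokens = ["<s>"] + body.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(prev, word, unigrams, bigrams):
    """Relative-frequency bigram score, backing off to a weighted unigram."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    # Unseen bigram: fall back to the unigram estimate, scaled by BACKOFF.
    return BACKOFF * unigrams[word] / sum(unigrams.values())


unigrams, bigrams = count_ngrams(RULE_BODIES)
```

Under these assumptions, a pair that appears in a rule body, such as ("on", "<date>"), receives a high score, while an unanticipated pair such as ("set", "a") still receives a small nonzero score through the unigram fallback. This is the property that lets such a model tolerate omitted or reordered words.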
Unified grammars are difficult for application developers to design, because constructing them well requires a large amount of knowledge. In particular, the application developer must learn the scripting tags of the grammar format, understand how the construction of the CFG affects the ability of the unified grammar to identify speech, know how to compute the backoff weights, and understand how the grammar will interact with existing library grammars. This extra required knowledge represents a barrier to implementing speech recognition in everyday applications.