1. Field of the Invention
The present invention relates to the field of speech recognition, and, more particularly, to automatic grammar tuning using statistical language model generation.
2. Description of the Related Art
Speech recognition systems often use one or more language models to improve speech recognition accuracy. Language models provide information about the likelihood that various words or phrases will be used in combination with each other. The two basic types of language models are statistical language models and grammar-based language models.
A statistical language model is a probabilistic description of the constraints on word order found in a given language. Most current statistical language models are based on the N-gram principle, in which the probability of the current word is calculated from the identities of the immediately preceding (N-1) words. A statistical language model grammar is not written manually; instead, it is trained from a set of examples that model expected speech, where the set of examples can be referred to as a speech corpus. One significant drawback of statistical language model grammars is that the speech corpus needed to generate one can be very large. A reasonably sized speech corpus can, for example, contain over twenty thousand utterances or five thousand complete sentences. The cost of obtaining such a corpus can be prohibitively high.
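As an illustration of the N-gram principle and of why corpus size matters, the following Python sketch trains a bigram model (N = 2) by maximum-likelihood counting. The function names, the `<s>`/`</s>` sentence-boundary tokens, and the three-sentence toy corpus are illustrative assumptions. No smoothing is applied, so any word pair absent from the training corpus receives probability zero.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Estimate P(word | previous word) by maximum likelihood (N = 2)."""
    pair_counts = defaultdict(Counter)
    for sentence in corpus:
        # Pad each sentence with hypothetical start/end boundary tokens.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[prev][cur] += 1
    # Normalize each row of counts into a conditional distribution.
    model = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        model[prev] = {cur: n / total for cur, n in followers.items()}
    return model

def sentence_probability(model, sentence):
    """Multiply the bigram probabilities along the sentence (0.0 if unseen)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(cur, 0.0)
    return prob

corpus = ["check my balance", "check my account", "transfer funds"]
model = train_bigram(corpus)
print(sentence_probability(model, "check my balance"))     # approximately 0.333
print(sentence_probability(model, "transfer my balance"))  # 0.0
```

With this toy corpus, "check my balance" scores 2/3 × 1 × 1/2 × 1 = 1/3, while "transfer my balance" scores zero because the pair "transfer my" never occurs in training. Gaps of this kind are one reason a statistical model needs a large corpus to cover expected speech.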
A grammar-based language model uses a manually specified set of rules written in a grammar specification language, such as the NUANCE Grammar Specification Language (GSL), a Speech Recognition Grammar Specification (SRGS) compliant language, a JAVA Speech Grammar Format (JSGF) compliant language, and the like. Using the grammar specification language, a set of rules is constructed that together define what may be spoken.
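To make the contrast with a statistical model concrete, the Python sketch below represents a small rule grammar as a dictionary and expands it into the finite set of phrases it accepts. The rule format, the rule names, and the helper functions are hypothetical stand-ins; this is not GSL, SRGS, or JSGF syntax. The expansion also assumes the grammar contains no recursive rules.

```python
from itertools import product

# Hypothetical rule format: each rule name maps to a list of alternatives,
# and each alternative is a sequence of literal words or <rule> references.
GRAMMAR = {
    "<command>": [["<action>", "my", "<object>"]],
    "<action>": [["check"], ["show"]],
    "<object>": [["balance"], ["account"]],
}

def expand(symbol, grammar):
    """Return every word sequence a symbol can produce (non-recursive grammars only)."""
    if symbol not in grammar:
        return [[symbol]]  # A literal word expands to itself.
    phrases = []
    for alternative in grammar[symbol]:
        parts = [expand(item, grammar) for item in alternative]
        # Take one expansion from each part and concatenate them.
        for combo in product(*parts):
            phrases.append([word for part in combo for word in part])
    return phrases

def in_grammar(utterance, grammar, root="<command>"):
    """An utterance is accepted only if the rules can generate it exactly."""
    allowed = {" ".join(p) for p in expand(root, grammar)}
    return utterance.lower() in allowed

print(in_grammar("Check my balance", GRAMMAR))  # True
print(in_grammar("check balance", GRAMMAR))     # False
```

Unlike the statistical approach, nothing here is learned from data: an utterance outside the written rules is rejected no matter how natural it is, which is the gap that grammar tuning aims to close.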
The performance of grammar-based language models can be significantly improved by grammar tuning, which is the process of improving speech recognition accuracy by modifying a speech grammar based on an analysis of its performance. Grammar tuning is often performed during an iterative period of usability testing and application improvement. It often involves amending an existing grammar with commonly spoken phrases, removing highly confusable words, and adding alternative ways in which a speaker may pronounce a word. For example, cross-wording tuning can fix utterances that contain words which run together, and adding representative probabilities to confusion pairs can correct substitution errors.
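One of the tuning steps named above, amending an existing grammar with commonly spoken phrases, can be sketched as follows. This is a simplified illustration under stated assumptions only: the grammar is assumed to be a flat set of accepted, already-normalized phrases, the transcripts are assumed to be human transcriptions of logged utterances, and `tune_grammar` and its `min_count` threshold are hypothetical.

```python
from collections import Counter

def tune_grammar(grammar_phrases, transcripts, min_count=2):
    """Hypothetical tuning step: add frequently spoken out-of-grammar phrases.

    grammar_phrases: set of accepted phrases, assumed lowercase and
    single-spaced. transcripts: transcribed utterances from usage logs.
    Returns (tuned_grammar, added_phrases); the input set is not modified.
    """
    # Normalize case and whitespace so trivial variants are counted together.
    normalized = [" ".join(t.lower().split()) for t in transcripts]
    # Count only utterances that the current grammar would reject.
    misses = Counter(t for t in normalized if t not in grammar_phrases)
    # Keep phrases spoken at least min_count times; rarer ones may be noise.
    added = {phrase for phrase, n in misses.items() if n >= min_count}
    return set(grammar_phrases) | added, added

grammar = {"check my balance", "show my balance"}
logs = ["what's my balance", "What's my  balance", "check my balance", "balance please"]
tuned, added = tune_grammar(grammar, logs)
print(sorted(added))  # ["what's my balance"]
```

Even this small step depends on collecting and transcribing usage data and on editing the grammar itself, which illustrates the manual effort and source-code access that conventional tuning requires.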
Conventional grammar tuning typically involves manual tuning efforts, which can require specialized skills. Manual tuning can be extremely time consuming, taking longer than is practical for a development effort. Further, conventional grammar tuning requires access to the grammar source code, which may not be available.