Although speech recognition accuracy has improved greatly over the last decade, building a robust speech recognition application is still usually expensive because of the relatively long development cycle required to bring an application to an acceptable accuracy level. One of the difficulties in developing speech recognition applications is the development of grammars that recognize a user's input.
Consider a relatively simple example of developing a speech recognition system for purchasing movie tickets. A developer of such a system may use a prompt such as “Welcome to the movie line. How many tickets do you want to purchase?” and then build a simple digits context-free grammar (CFG) that includes, for example, the numbers 1 through 10. However, rather than simply uttering a number (saying “two,” for example), some users of this system may respond by saying “I want to buy two tickets.” Such a response is not covered by the grammar and thus leads to higher error rates or increased rejection. This problem can sometimes be ameliorated by carefully wording the prompt so that it instructs the user to stay within the grammar (for example, “Please say a number between 1 and 10”). Another approach to addressing this problem is to build grammars with increased coverage. However, in general, it is relatively difficult to manually construct a CFG when there are numerous different ways of asking for the same item(s).
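The coverage problem described above can be illustrated with a minimal sketch. It assumes the recognizer output is already available as text, and the names `DIGITS` and `parse_with_digits_grammar` are hypothetical, introduced only for this illustration. It is a toy stand-in for a digits grammar, not an actual speech recognition CFG.

```python
# Toy stand-in for a digits-only grammar covering the numbers 1 through 10.
# Assumption: the utterance is given as text rather than audio.
DIGITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_with_digits_grammar(utterance):
    """Return the ticket count if the whole utterance is one number word.

    Any utterance that is not exactly one in-grammar word is rejected
    by returning None.
    """
    tokens = utterance.lower().strip(" .,!?").split()
    if len(tokens) == 1 and tokens[0] in DIGITS:
        return DIGITS[tokens[0]]
    return None


if __name__ == "__main__":
    for text in ("two", "I want to buy two tickets"):
        print(repr(text), "->", parse_with_digits_grammar(text))
```

In this sketch the in-grammar response “two” is accepted, while “I want to buy two tickets” is rejected even though it carries the same information.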
An alternative approach to achieving the same goal is to use semantic (or keyword) spotting, in which models for garbage words, referred to as filler models (FMs), are used. Of the existing FMs, n-gram-based FMs have been shown to offer superior accuracy. However, existing n-gram-based FMs require a custom FM trained on domain data.
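The general idea of keyword spotting with a filler can be sketched as follows. This is a simplified illustration that assumes text input and uses a trivial filler that accepts any non-keyword word. It is not an n-gram-based FM and involves no acoustic modeling, and the names `KEYWORDS` and `spot_number` are hypothetical.

```python
import re

# Keywords of interest: number words for 1 through 10.
KEYWORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def spot_number(utterance):
    """Return the single spotted number, or None if zero or several are found.

    Assumption: every token that is not a keyword is absorbed by a
    trivial filler, which here simply means it is skipped.
    """
    tokens = re.findall(r"[a-z]+", utterance.lower())
    hits = [KEYWORDS[token] for token in tokens if token in KEYWORDS]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    for text in ("two", "I want to buy two tickets", "hello there"):
        print(repr(text), "->", spot_number(text))
```

Here the surrounding words in “I want to buy two tickets” are treated as filler, so the keyword is still recovered. A practical filler model must also score the garbage words against the audio, which this sketch does not attempt.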
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.