The present invention relates to speech recognition. More particularly, the present invention relates to automatically creating a speech recognition grammar for alphanumeric concepts.
Speech recognition systems are increasingly being used by companies and organizations to reduce cost, improve customer service and/or automate tasks completely or in part. Such systems have been used on a wide variety of computing devices, ranging from stand-alone desktop machines to network devices and mobile handheld computing devices. Speech recognition provides a natural user interface for application developers. For instance, for computing devices such as handheld mobile devices, complete alphanumeric keyboards are impractical without significantly increasing the size of the computing device. Speech recognition thus provides a convenient input methodology for small devices and also allows the user to access a computer remotely, such as through a simple telephone.
With speech recognition being more widely accepted if not required, there is a need to create flexible, accurate, speech-enabled applications quickly and efficiently. Research directed to spoken language understanding models has achieved flexibility because such systems allow mixed-initiative dialogs between the system and the user. While such systems and research have achieved accuracy in modeling commands that contain multiple phrasal semantic units (slots), for example, a “ShowFlight” command like “List the flights from Seattle to Boston on Tuesday that cost no more than $400” in the domain of the Air Travel Information System, they have seldom studied the acquisition of the phrasal model for low-level concepts like date, time, credit card number, flight number, etc. Instead, they have resorted to grammar libraries and database entries (e.g., city names from an application database) for solutions.
Nevertheless, a majority of the spoken language systems deployed so far are system-initiative, directed dialog systems. In such systems, most of the grammar development effort is devoted to the low-level concepts. While grammar libraries and database entries are viable solutions, they do not solve the problem completely. For instance, grammar library developers cannot foresee all possible domain-specific concepts and pre-build grammars for them. In addition, the orthographic form of the database entries is often not sufficient to serve as the speech recognition grammar. For example, a proper speech recognition grammar needs to model a variety of alternative spoken expressions for an alphanumeric string. Suppose an application needs to recognize parts numbers and that “ABB123” is one of the parts numbers. The speech-enabled system should be able to recognize this parts number even if it is spoken in different ways such as “A B B one two three” or “A double B one twenty three.”
Accordingly, it is well recognized that grammar development for alphanumeric concepts like parts numbers and driver license numbers is one of the most challenging tasks. One attempt has been to employ a simple grammar based on a single-state finite state model. Such a model has a loop for each character (A-Z) and each digit (0-9). However, the model generally does not work well, for reasons including that the grammar does not capture the specificity of the target sub-languages. Therefore, the perplexity of the model is much higher than it should be. For example, if it is known that the parts number always starts with the letter “B”, the grammar should explicitly model the constraint so that recognition errors that confuse “B” with “D”, “E”, “G”, and “P” will never occur.
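The contrast between the single-state loop model and a grammar that encodes a known constraint can be sketched as follows. This is a minimal illustration only, not the grammar of any particular system: the six-character format (the letter “B”, two further letters, then three digits) and all function names are assumptions introduced for the example.

```python
import string

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)

# Single-state loop model: every position may be any letter or any digit.
LOOP_SYMBOLS = LETTERS | DIGITS


def loop_accepts(s):
    """Accept any non-empty sequence of A-Z / 0-9 characters."""
    return len(s) > 0 and all(ch in LOOP_SYMBOLS for ch in s)


# Hypothetical constrained format (an assumption for illustration only):
# the letter "B", two further letters, then three digits.
CONSTRAINED_SLOTS = [{"B"}, LETTERS, LETTERS, DIGITS, DIGITS, DIGITS]


def constrained_accepts(s):
    """Accept only strings that match the assumed format slot by slot."""
    return len(s) == len(CONSTRAINED_SLOTS) and all(
        ch in allowed for ch, allowed in zip(s, CONSTRAINED_SLOTS)
    )


def choices_per_position(slots):
    """Number of symbols the recognizer must discriminate at each position."""
    return [len(allowed) for allowed in slots]


loop_choices = choices_per_position([LOOP_SYMBOLS] * 6)
constrained_choices = choices_per_position(CONSTRAINED_SLOTS)
```

Under these assumptions the loop model leaves 36 candidates open at every position, whereas the constrained grammar leaves a single candidate at the first position, so a misrecognition such as “D” for “B” cannot be produced there.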
In addition, the simple grammar does not model the diversity of linguistic expressions for many types of strings. In the example above, both portion “ABB” and portion “123” of “ABB123” can be provided in different yet very common ways, many of which are not modeled by the simple grammar.
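The diversity of spoken forms can be illustrated with a short sketch that enumerates a few readings of a letter portion and a digit portion. It covers only two conventions (“double X” for a repeated letter, and reading the last two of three digits as a number); the function names and this choice of conventions are assumptions made for illustration, and real speakers use many more variants.

```python
from itertools import product

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def two_digit_words(pair):
    """Read a two-digit string such as '23' as a number ('twenty three')."""
    tens, ones = int(pair[0]), int(pair[1])
    if tens == 0:
        return None  # e.g. '05' gets no two-digit reading in this sketch
    if tens == 1:
        return TEENS[ones]
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def letter_readings(letters):
    """Spell letters one by one; also offer 'double X' for adjacent repeats."""
    readings = [" ".join(letters)]
    collapsed, i, changed = [], 0, False
    while i < len(letters):
        if i + 1 < len(letters) and letters[i] == letters[i + 1]:
            collapsed.append(f"double {letters[i]}")
            i += 2
            changed = True
        else:
            collapsed.append(letters[i])
            i += 1
    if changed:
        readings.append(" ".join(collapsed))
    return readings


def digit_readings(digits):
    """Read digits one by one; for three digits, also 'one twenty three' style."""
    readings = [" ".join(ONES[int(d)] for d in digits)]
    if len(digits) == 3:
        tail = two_digit_words(digits[1:])
        if tail:
            readings.append(f"{ONES[int(digits[0])]} {tail}")
    return readings


def spoken_variants(letters, digits):
    """Combine every letter reading with every digit reading."""
    return [f"{l} {d}" for l, d in
            product(letter_readings(letters), digit_readings(digits))]


variants = spoken_variants("ABB", "123")
```

Even with only these two conventions, the single string “ABB123” yields four distinct spoken forms, including “A B B one two three” and “A double B one twenty three”. A loop over individual characters and digits would cover only the first of these.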
Furthermore, special characters like “-”, “*”, etc. often appear in the alphanumeric sequences like parts numbers. This would require that the general alphanumeric grammar be customized in such cases.
In view of the foregoing problems, developers are often forced to write their own grammars for specific alphanumeric concepts. The process is tedious and error-prone. Unlike the grammars in a grammar library, grammars authored by less experienced developers are often not optimized and thus perform poorly when used by a decoder.
A system or method for generating an alphanumeric grammar that addresses one, some or all of the foregoing needs would thus be beneficial.