The present invention generally pertains to speech recognition applications and systems. More specifically, the present invention pertains to methods and apparatus for automatically generating grammars for use by speech recognition applications.
Speech recognition applications often need to deal with large lists of proper names, symbols, numbers, IDs, or other items. As an example, speech recognition is increasingly being used to recognize names spoken by a user or caller. For instance, in voice dialing and other systems, a caller or user is typically asked to speak the name of the person who is to be contacted or identified for some purpose. The system then uses a speech recognition engine to recognize the spoken name from a large list of names, often in combination with prompts that help the caller or user navigate through any name collisions or other difficulties in the identification process. Speech recognition of spoken names is also used for many purposes other than voice dialing.
One of the biggest challenges in using speech recognition to recognize names or other items relates to the process of building context-free grammars (CFGs) to be used by the speech recognition engine. This is particularly true if the items to be recognized are from a large data list. In some speech recognition systems or applications, the number of items on the data list increases frequently, sometimes even daily, by significant numbers. In certain applications, it is possible for the number of items on the data list to increase by tens of thousands of items every day. Creating or updating CFGs to deal with these large and sometimes fast-growing data lists can be very challenging, time consuming and cumbersome. In short, a challenge faced by many in speech recognition applications is to generate efficient grammars from those large lists both correctly and in a timely manner.
A number of factors that affect speech recognition engine performance need to be considered when generating the CFG to be used by the speech recognition engine during the speech recognition process. To increase the ability of a speech recognition engine to accurately identify a spoken name or item, prefixing of the CFG is useful. For example, given the names “David Ollason” and “David Smith”, a prefixed CFG does not require the speech recognition engine to process the common word “David” as two competing phrases. Instead, the grammar recognizes “David” as a single shared speech unit and then branches to the possible next speech units “Ollason” and “Smith” for continued speech recognition. In other words, prefixing of a CFG allows the speech recognition engine to reduce resource consumption, which typically improves accuracy of the recognition process. Other factors that must be considered when generating a CFG include weighting of branches of the tree structure represented in the CFG, dealing with name collisions (names sharing identical spellings or pronunciations), optimizing the size (storage and processing requirements) of the CFG, etc.
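The prefixing idea described above can be illustrated with a minimal sketch. It assumes that each list item is a whitespace-separated name and that a nested dictionary is an acceptable stand-in for the tree structure of a CFG. The function names `build_prefix_tree` and `render`, the `END` marker, and the bracketed output notation are hypothetical and chosen only for illustration; they are not the grammar format of any particular speech recognition engine, nor the method of the present invention.

```python
from typing import Dict, List

END = "<end>"  # hypothetical marker: a complete name stops at this node


def build_prefix_tree(names: List[str]) -> Dict[str, dict]:
    """Merge names word by word so a shared leading word becomes one branch."""
    root: Dict[str, dict] = {}
    for name in names:
        node = root
        for word in name.split():
            # Reuse the existing branch for this word, or create a new one.
            node = node.setdefault(word, {})
        node[END] = {}
    return root


def render(tree: Dict[str, dict]) -> str:
    """Render the tree in an illustrative notation:
    (a | b) marks alternatives, [x] marks an optional continuation."""
    alternatives = []
    for word, child in tree.items():
        if word == END:
            continue
        tail = render(child)
        alternatives.append(f"{word} {tail}" if tail else word)
    if not alternatives:
        return ""
    body = " | ".join(alternatives)
    if END in tree:
        # A complete name ends here, so anything further is optional.
        return f"[{body}]"
    if len(alternatives) > 1:
        return f"({body})"
    return body


names = ["David Ollason", "David Smith", "Mary Jones"]
tree = build_prefix_tree(names)
print(render(tree))
```

In this sketch, “David” appears once in the tree and branches to “Ollason” and “Smith”, rather than appearing as the first word of two separate, competing phrases. Weighting of branches and handling of name collisions are deliberately left out.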
Due to the size of the task of creating or updating grammars for large lists, it is important to do so as efficiently as possible. However, accuracy is also very important. Any technique for speeding up the grammar generation or updating process that results in a lower quality grammar will render the speech recognition system using the CFG less accurate. This in turn will increase the time required for users of the system to achieve a desired result, for example being connected to a particular individual in a voice-dialing system. Many users will find the decreased accuracy and increased time required to be unacceptable.
The present invention provides solutions to one or more of the above-described problems and/or provides other advantages over the prior art.