The present invention deals with speech recognition systems. In particular, the present invention relates to a context free grammar engine for use in speech recognition systems.
In speech recognition systems, a computer system attempts to identify a sequence of words from a speech signal. One way to improve the accuracy of the recognition is to limit the recognition to a set of selected phrases. This is typically done by limiting valid recognition hypotheses to phrases that are found in a context-free grammar (CFG).
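By way of illustration only, the restriction of recognition hypotheses to phrases found in a CFG can be sketched as follows. The toy grammar, the rule names `COMMAND` and `MEDIA`, and the helper functions `derives` and `filter_hypotheses` are hypothetical and are not taken from any particular engine. The sketch assumes a grammar with no left-recursive rules.

```python
from typing import Dict, List, Tuple

# A grammar maps each rule name to its alternatives; each alternative is a
# sequence of symbols. A symbol that is not a rule name is treated as a word.
Grammar = Dict[str, List[List[str]]]

# Hypothetical toy grammar, for illustration only.
GRAMMAR: Grammar = {
    "COMMAND": [["play", "MEDIA"], ["stop"]],
    "MEDIA": [["music"], ["the", "radio"]],
}


def derives(grammar: Grammar, symbols: Tuple[str, ...], words: Tuple[str, ...]) -> bool:
    """Return True if the symbol sequence can derive exactly the given words.

    Assumes the grammar has no left-recursive rules; otherwise this simple
    top-down expansion would not terminate.
    """
    if not symbols:
        return not words
    head, rest = symbols[0], symbols[1:]
    if head in grammar:
        # Non-terminal: try each alternative in place of the rule name.
        return any(derives(grammar, tuple(alt) + rest, words) for alt in grammar[head])
    # Terminal: it must match the next word of the hypothesis.
    return bool(words) and words[0] == head and derives(grammar, rest, words[1:])


def filter_hypotheses(grammar: Grammar, start: str, hypotheses: List[str]) -> List[str]:
    """Keep only the hypotheses that are phrases of the grammar."""
    return [h for h in hypotheses if derives(grammar, (start,), tuple(h.split()))]


if __name__ == "__main__":
    candidates = ["play music", "lay music", "play the radio", "stop"]
    print(filter_hypotheses(GRAMMAR, "COMMAND", candidates))
```

In this sketch, a hypothesis such as "lay music" is discarded because no rule of the toy grammar derives it, which is how limiting recognition to grammar phrases can remove acoustically plausible but invalid candidates.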
In the past, applications which invoked speech recognition engines communicated directly with the engines. Because the engines from each vendor interacted with applications directly, the behavior of that interaction was unpredictable and inconsistent. This made it virtually impossible to change recognition engines without inducing errors in the application. It is believed that, because of these difficulties, speech recognition technology has not quickly gained wide acceptance.
In an effort to make such technology more readily available, an interface between engines and applications was specified by a set of application programming interfaces (APIs) referred to as the Microsoft Speech API version 4.0 (SAPI4). The set of APIs in SAPI4 specified direct interaction between applications and engines. Although this was a significant step forward in making speech recognition and speech synthesis technology more widely available, some of these APIs were cumbersome to use, required the application to be apartment threaded, and did not support all languages.
The process of making speech recognition more widely available has encountered other obstacles as well. For example, many of the interactions between the application programs and the engines can be complex. Such complexities include cross-process data marshalling, event notification, parameter validation, default configuration, and many others. Conventional operating systems provide essentially no assistance to either application vendors or speech engine vendors beyond basic access to audio devices. Therefore, application vendors and engine vendors have been required to write a great deal of code to interface with one another.
In one particular example, where one or more applications desire to use one or more grammars with a speech recognition engine, the speech recognition engine is required to keep track of individual grammar loads and to request that additional grammars be loaded for imported rules. Further, the speech recognition engine is often required to parse recognition results to provide the application with a desired parse tree structure. Consequently, the speech recognition engine is required to perform a great many tasks other than simply recognizing speech from an audio input signal (or speech signal).
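By way of illustration only, the bookkeeping involved in tracking grammar loads and following imported rules can be sketched as follows. The in-memory store `AVAILABLE`, its grammar names, and the function `load_with_imports` are hypothetical. The sketch does not represent the interface of any actual engine or of SAPI4; it merely suggests the kind of non-recognition work described above.

```python
from typing import Dict, List, Set

# Hypothetical store of grammars: each entry lists the rules a grammar
# defines and the names of other grammars whose rules it imports.
AVAILABLE: Dict[str, Dict[str, List[str]]] = {
    "main": {"rules": ["COMMAND"], "imports": ["media"]},
    "media": {"rules": ["MEDIA"], "imports": ["numbers"]},
    "numbers": {"rules": ["DIGIT"], "imports": []},
}


def load_with_imports(name: str, available: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the grammars that must be loaded for `name`, in load order.

    Each grammar is loaded at most once, so circular imports do not cause
    repeated loads or an endless loop.
    """
    loaded: List[str] = []
    seen: Set[str] = set()
    pending: List[str] = [name]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        loaded.append(current)
        # Request every grammar that the current grammar imports rules from.
        pending.extend(available[current]["imports"])
    return loaded


if __name__ == "__main__":
    print(load_with_imports("main", AVAILABLE))
```

Even in this simplified form, loading a single grammar requires tracking which grammars are already loaded and following each import in turn, none of which involves recognizing speech.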