The present invention deals with services for enabling speech recognition and speech synthesis technology. In particular, the present invention relates to an event notification system in a middleware layer which lies between speech related applications and speech related engines.
Speech synthesis engines typically include a decoder which receives textual information and converts it to audio information which can be synthesized into speech on an audio device. Speech recognition engines typically include a decoder which receives audio information in the form of a speech signal and identifies a sequence of words from the speech signal.
In the past, applications which invoked these engines communicated directly with the engines. Because the engines from each vendor interacted with applications directly, the behavior of that interaction was unpredictable and inconsistent. This made it virtually impossible to change synthesis or recognition engines without inducing errors in the application. It is believed that, because of these difficulties, speech recognition technology and speech synthesis technology have not quickly gained wide acceptance.
In an effort to make such technology more readily available, an interface between engines and applications was specified by a set of application programming interfaces (APIs) referred to as the Microsoft Speech API version 4.0 (SAPI4). The set of APIs in SAPI4 specified direct interaction between applications and engines. Although this was a significant step forward in making speech recognition and speech synthesis technology more widely available, some of these APIs were cumbersome to use, required the application to be apartment threaded, and did not support all languages.
The process of making speech recognition and speech synthesis more widely available has encountered other obstacles as well. For example, the vendors of applications and engines have been required to write an enormous amount of code simply to implement the different interfaces for the different applications and engines that can be used together. In such systems, event notification is very cumbersome. The engines are required to notify the applications directly of events, such as word boundaries, visemes, and bookmarks. This has required engines to know exactly how each application wished to be notified of such events. Similarly, output devices (such as audio devices in a text-to-speech system) have also been required to know when events are occurring and how an application wishes to be notified of the events. Since applications traditionally can be notified of events in one of a number of different ways, this has required specific code to be written to interface to specific applications.
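The coupling described above can be illustrated with a minimal sketch. It is not taken from SAPI4 or from any actual engine. All class and method names (SpeechEvent, CallbackApp, QueueApp, DirectNotifyEngine, raise_event) are hypothetical, and the two notification styles shown are assumed examples of the "number of different ways" an application may wish to be notified.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import List


@dataclass
class SpeechEvent:
    kind: str    # e.g. "word_boundary", "viseme", "bookmark"
    offset: int  # position in the audio stream (illustrative units)


@dataclass
class CallbackApp:
    """Hypothetical application that wants a direct function call per event."""
    received: List[SpeechEvent] = field(default_factory=list)

    def on_event(self, event: SpeechEvent) -> None:
        self.received.append(event)


@dataclass
class QueueApp:
    """Hypothetical application that reads events from a message queue."""
    inbox: Queue = field(default_factory=Queue)


class DirectNotifyEngine:
    """Hypothetical engine that notifies its application directly.

    The engine must contain one branch per notification style, so
    supporting a new kind of application means changing engine code.
    """

    def __init__(self, app) -> None:
        self.app = app

    def raise_event(self, event: SpeechEvent) -> None:
        if isinstance(self.app, CallbackApp):
            self.app.on_event(event)       # callback-style notification
        elif isinstance(self.app, QueueApp):
            self.app.inbox.put(event)      # queue-style notification
        else:
            raise TypeError("engine has no code for this application's notification style")
```

In this sketch, every engine repeats the same per-application branching, which is the application-specific interface code the passage above refers to.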