1. Field of the Invention
The present invention relates to the field of telecommunications and, more particularly, to speech utterance detection within a voice server.
2. Description of the Related Art
Telephone systems can utilize voice servers to add a multitude of speech services to telephone calls. Speech services can include automatic speech recognition (ASR) services, synthetic speech generation services, transcription services, language and idiom translation services, and the like. To perform these functions, voice servers must implement some form of speech detection to detect when a telephone caller is providing speech input upon which program actions are to be taken. The detection of speech input is typically followed by an allocation of an ASR engine to convert the detected utterances into a form that the voice server can interpret.
Conventional componentized voice servers, such as the Websphere Application Server (WAS) from International Business Machines Corporation (IBM) of Armonk, N.Y., utilize internal software-based speech detection routines. Speech detection operations can be entirely dependent upon these routines. For example, as currently implemented, the voice server component of the WAS, which is a Websphere Voice Server (WVS), performs all speech detection through internal software-based speech detection routines and does not permit speech utterances to be detected through external means.
The conventional approach for detecting speech utterances in a voice server possesses numerous shortcomings. One such shortcoming relates to inefficient use of scarce resources. That is, software-based speech detection routines can be very processor-intensive and memory-intensive and can consume vast quantities of expensive computing resources. This is especially true when the detection routines are set for high sensitivity levels and adjusted to optimize speech detection accuracy. These processor-intensive routines, however, can exceed the detection needs of many customers. For example, a voice server customer may require only modest voice detection capabilities.
Further, many telephone gateways, hubs, and other telephony equipment possess integrated hardware-based speech detection capabilities. Unlike software-based detection techniques, hardware-based techniques need not consume extensive scarce resources. Instead, hardware-based techniques can monitor signal energy levels within telephony channels and differentiate speech utterances from silence and/or noise based upon differences in the signal energy levels. Many conventional voice servers fail to take advantage of these external hardware-based speech detection devices. It would be highly advantageous if a voice server having internal software speech detection capabilities were able to selectively utilize externally available speech detection mechanisms in place of and/or in conjunction with internal software-based speech detection mechanisms.
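By way of illustration only, the energy-based differentiation described above can be sketched in software as follows. This sketch is not taken from any particular gateway or voice server; the function names, the 160-sample frame size (20 ms at an assumed 8 kHz sampling rate), and the -30 dB threshold are all hypothetical values chosen for the example.

```python
import math


def frame_energy_db(samples):
    """Return the mean-square energy of one frame in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 10.0 * math.log10(mean_square)


def detect_speech_frames(samples, frame_size=160, threshold_db=-30.0):
    """Label each full frame True (speech-like energy) or False (silence/noise).

    frame_size and threshold_db are illustrative assumptions, not values
    prescribed by any telephony standard or product.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags


# Synthetic channel: 20 ms of silence followed by 20 ms of a 440 Hz tone
# standing in for a speech utterance.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
flags = detect_speech_frames(silence + tone)
print(flags)  # [False, True]
```

A fixed threshold of this kind is the simplest possible decision rule. Practical detectors typically also track the background noise level and require several consecutive frames above the threshold before an utterance is reported.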