The present invention relates, in general, to automatic speech recognition systems, and more particularly, to an apparatus, method and system for controlling caller input rates for automatic speech recognition systems utilized in interactive information and telecommunication systems.
Automatic speech recognition systems are being utilized increasingly for a wide variety of interactive telecommunication and other services. For example, automatic speech recognition systems are utilized to recognize verbal responses to audible prompts in interactive systems such as airline flight information systems, voice dialing telecommunication systems, and electronic mail retrieval (with speech synthesis).
Automatic speech recognition systems, however, have a predetermined and fixed number of input channels, typically 32 to 64. Current speech recognition systems allocate one such input channel per caller (or other user). As a consequence, once all input channels are in use, additional callers are typically put on hold until an input channel becomes available. Some systems play music or announcements to callers on hold, and others inform callers of their place in a holding queue. Under conditions of greater congestion, consumers may be asked to call again at another time, and calls may also be dropped or lost altogether.
These holding or dropped call responses to overload conditions, unfortunately, typically create consumer dissatisfaction and irritation. Callers typically dislike and may even resent being put in a holding queue. Correspondingly, service providers would prefer to meet consumer service demands in a timely, user friendly, efficient and cost-effective manner.
As a consequence, a need remains for an apparatus, method and system that increase the capacity of automatic speech recognition systems without requiring a corresponding increase in the fixed number of caller input channels. Such an increased capacity should be user transparent, user friendly, and effectively imperceptible to consumers under congestion or overload conditions. Such an increased capacity should also be capable of a cost-effective implementation in existing automatic speech recognition systems, providing increased capacity for any given fixed number of caller input channels.
The various embodiments of the present invention increase the capacity of an automatic speech recognition (ASR) system primarily by performing two types of functions: a concentrator function and a delay function. The concentrator function allows a greater number of caller input/output (I/O) channels to be in use by switching only active caller input channels to the ASR input channels for speech recognition (i.e., ASR functions are neither provided to nor reserved for channels that are active only in output (play) mode, in which a caller is listening to a message from the service provider). The delay function, invoked under potential overload conditions, increases the duration of the output (play) mode, thereby increasing the callers' listening time and correspondingly leaving the ASR input channels available longer to recognize speech input on other calls (and, presumably, thereby to handle a greater call volume). The delay function is preferably implemented using one or both of two types of delay: increased message duration for prompts played to callers, and the insertion of additional periods of silence at the beginning of, or during, the various messages played to callers.
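As a rough, illustrative calculation of why the two functions can raise capacity, the sketch below assumes that each caller alternates between an input (speaking) phase and an output (play) phase, and that, with the concentrator, an ASR input channel is held only during the input phase. Under that steady-state averaging assumption, which ignores random peaks in demand, the number of concurrent callers a fixed pool of ASR channels can carry scales with (input time + play time) / input time, so lengthening the play phase with a delay mode raises the figure. The function names and all timing values are hypothetical.

```python
def input_fraction(input_s: float, play_s: float) -> float:
    """Share of each caller's input/play cycle spent in input (speaking) mode."""
    return input_s / (input_s + play_s)


def supportable_callers(asr_channels: int, input_s: float, play_s: float) -> float:
    """Average-case number of concurrent callers before the ASR pool is full.

    Assumes an ASR channel is held only while a caller is in input mode
    (the concentrator assumption) and ignores random peaks in demand.
    Equivalent to asr_channels / input_fraction(input_s, play_s).
    """
    return asr_channels * (input_s + play_s) / input_s


# Hypothetical figures: 32 ASR channels, 4 s of caller speech per 12 s prompt.
baseline = supportable_callers(32, input_s=4.0, play_s=12.0)
# Delay mode adds 4 s (longer prompt and/or inserted silence) to each play phase.
delayed = supportable_callers(32, input_s=4.0, play_s=16.0)
print(baseline, delayed)  # 128.0 160.0
```

The trade-off is that each call lasts longer, so the gain is in concurrent callers served without holding, not necessarily in calls completed per hour.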
The preferred interactive communication system having caller input rate control for automatic speech recognition includes: first, a network interface having input channels and output channels; second, an output module for message output on the output channels; third, an automatic speech recognition module having ASR input channels; and fourth, a caller input rate control module coupled to the input channels, to the output module, and to the ASR input channels. The caller input rate control module includes instructions to determine a usage level of the ASR input channels and, when that usage level is greater than a first predetermined threshold, to direct the output module to provide an associated delay mode for a message output on the output channels.
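A minimal sketch of the threshold test follows, assuming the usage level is the fraction of ASR input channels currently busy and reducing the delay mode to an on/off flag. `OutputModule`, `CallerInputRateControl`, `play_prompt`, and the 0.75 threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OutputModule:
    """Hypothetical stand-in for the output module: records what it was asked to play."""
    played: List[Tuple[int, str, bool]] = field(default_factory=list)

    def play(self, channel: int, message: str, delay_mode: bool) -> None:
        self.played.append((channel, message, delay_mode))


@dataclass
class CallerInputRateControl:
    """Hypothetical control module: decides per prompt whether to request a delay mode."""
    asr_channels_total: int
    first_threshold: float  # fraction of ASR channels busy, e.g. 0.75
    output: OutputModule
    asr_channels_busy: int = 0

    def usage_level(self) -> float:
        return self.asr_channels_busy / self.asr_channels_total

    def play_prompt(self, channel: int, message: str) -> None:
        # Request the delay mode only when usage is strictly greater than the threshold.
        delay = self.usage_level() > self.first_threshold
        self.output.play(channel, message, delay)


out = OutputModule()
ctrl = CallerInputRateControl(asr_channels_total=32, first_threshold=0.75, output=out)
ctrl.asr_channels_busy = 16
ctrl.play_prompt(5, "main-menu")  # 50% busy: normal prompt
ctrl.asr_channels_busy = 30
ctrl.play_prompt(6, "main-menu")  # about 94% busy: delay mode requested
print(out.played)  # [(5, 'main-menu', False), (6, 'main-menu', True)]
```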
The caller input rate control module provides concentrator functionality by monitoring energy levels of the input channels and storing in a memory buffer all received input from the input channels for a preceding period of time, to form buffered information. When the monitored energy level of a given input channel is greater than a predetermined energy level, the caller input rate control module transmits the buffered information to an ASR input channel and connects that input channel to the ASR input channel for speech recognition.
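The concentrator behaviour can be sketched per input channel. The sketch assumes audio arrives as frames of float samples, measures energy as mean squared amplitude, uses a fixed-length `collections.deque` as the memory buffer holding the preceding period of input, and draws ASR channels from a shared list of free channel numbers. The class and method names, the frame format, and the energy measure are all assumptions for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


class InputChannelConcentrator:
    """Hypothetical per-caller gate between a network input channel and the ASR pool."""

    def __init__(self, buffer_frames: int, energy_threshold: float, free_asr_channels: List[int]):
        # Ring buffer: keeps only the most recent `buffer_frames` frames.
        self.buffer: Deque[List[float]] = deque(maxlen=buffer_frames)
        self.energy_threshold = energy_threshold
        self.free_asr_channels = free_asr_channels  # shared by all concentrators
        self.asr_channel: Optional[int] = None
        self.sent_to_asr: List[List[float]] = []  # stands in for the ASR input stream

    def on_frame(self, samples: List[float]) -> None:
        if self.asr_channel is not None:
            # Already connected: pass audio straight through to the ASR channel.
            self.sent_to_asr.append(samples)
            return
        self.buffer.append(samples)
        # If every ASR channel is busy, keep buffering; the oldest frames age out.
        if frame_energy(samples) > self.energy_threshold and self.free_asr_channels:
            self.asr_channel = self.free_asr_channels.pop()
            # Flush the buffered lead-in so the start of the utterance is not lost.
            self.sent_to_asr.extend(self.buffer)
            self.buffer.clear()

    def release(self) -> None:
        """Return the ASR channel to the shared pool, e.g. when the caller's input ends."""
        if self.asr_channel is not None:
            self.free_asr_channels.append(self.asr_channel)
            self.asr_channel = None


pool = [0]
gate = InputChannelConcentrator(buffer_frames=3, energy_threshold=0.1, free_asr_channels=pool)
for _ in range(4):
    gate.on_frame([0.0, 0.0])  # silence: buffered, no ASR channel used
gate.on_frame([1.0, -1.0])     # speech onset: energy 1.0 exceeds 0.1
print(gate.asr_channel, len(gate.sent_to_asr), pool)  # 0 3 []
```

Buffering before the energy trigger matters because the first syllable of an utterance is often quieter than the threshold; flushing the buffer gives the recognizer that lead-in.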
The caller input rate control module determines the associated delay mode by selecting, individually or in combination, a message duration from a plurality of message durations, and a silent period duration from a plurality of silent period durations, for the message output on the output channel. The associated delay mode may be proportional to the usage level of the ASR input channels, or may correspond to a range or increment of usage levels of the ASR input channels.
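Both variants of delay-mode selection can be sketched as small functions. The duration menus and threshold values are hypothetical, since the source specifies only "a plurality of" message and silent-period durations. The stepped variant maps each usage band to a (message duration, silent period) pair; the proportional variant scales an inserted silence linearly with usage above the first threshold.

```python
from bisect import bisect_left
from typing import Sequence, Tuple

# Hypothetical menus, ordered from no delay to maximum delay.
MESSAGE_DURATIONS_S = (4.0, 5.0, 6.0, 7.0)  # same prompt at progressively slower pacing
SILENT_PERIODS_S = (0.0, 0.5, 1.0, 1.5)     # silence inserted before or within the prompt


def stepped_delay_mode(usage: float, thresholds: Sequence[float]) -> Tuple[float, float]:
    """Range-based selection: one (message duration, silence) pair per usage band.

    `thresholds` must be sorted ascending with len(MESSAGE_DURATIONS_S) - 1
    entries. bisect_left counts the thresholds that `usage` strictly exceeds.
    """
    level = bisect_left(thresholds, usage)
    return MESSAGE_DURATIONS_S[level], SILENT_PERIODS_S[level]


def proportional_silence(usage: float, first_threshold: float, max_silence_s: float) -> float:
    """Proportional selection: silence grows linearly from 0 s at the first
    threshold to max_silence_s at full (100%) ASR channel usage."""
    if usage <= first_threshold:
        return 0.0
    return max_silence_s * (usage - first_threshold) / (1.0 - first_threshold)


bands = (0.7, 0.8, 0.9)
print(stepped_delay_mode(0.5, bands))        # (4.0, 0.0)
print(stepped_delay_mode(0.85, bands))       # (6.0, 1.0)
print(proportional_silence(0.75, 0.5, 2.0))  # 1.0
```

A system that offers only a fixed set of recorded silences might round the proportional value to the nearest available silent period.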
The present invention provides several advantages. First, a caller is actually served by the interactive system, without being placed on hold and without loss of the call, even while all ASR input channels are busy serving other callers. Second, the various embodiments of the present invention increase ASR system capacity and make more efficient use of existing ASR input channels, without requiring a corresponding increase in the given or fixed number of ASR input channels. Third, by utilizing ASR input channels more efficiently, actual caller waiting time is decreased, and by continuing to serve callers through prompts or messages of longer duration, apparent waiting time is also decreased. Lastly, the increased capacity provided by the present invention is user transparent, user friendly, effectively imperceptible to consumers under congestion or overload conditions, and capable of cost-effective implementation in existing automatic speech recognition systems.