Speech recognition based telephony systems are used by businesses to answer phone calls with a system that engages users in natural language dialog. These systems use interactive voice response (IVR) telephony applications to provide a spoken language interface to a telephony system. IVR applications may enable users to interrupt the system output at any time, for example, if the output is based on an erroneous understanding of a user's input or if it contains superfluous information that a user does not want to hear. Barge-in allows a user to interrupt a prompt being played using voice input. Enabling barge-in may significantly enhance the user's experience by allowing the user to interrupt the system prompt, whenever desired, in order to save time. Without barge-in, a user may react only after the system prompt completes; otherwise, the user's input is ignored by the system. This may be very inconvenient to the user, particularly when the prompt is long and the user already knows the prompt message.
In today's touch tone based IVR systems, barge-in is widely adopted. However, for speech recognition based IVR systems, barge-in poses a much greater challenge due to background noise and echo from a prompt that may be transmitted to a voice recognition system.
One method of barge-in, referred to as key barge-in, is to stop playing a prompt and be ready to process a user's speech after the user presses a special key, such as the “#” or “*” key. One problem with such a method is that the user must be informed of how to use it. As such, another prompt may need to be added to the system, thereby undesirably increasing the amount of user interaction time with the system.
Another method of barge-in, referred to as voice barge-in, enables a user to speak directly to the system to interrupt the prompt. FIG. 1 illustrates how barge-in occurs during prompt play in a voice barge-in system. Such a method uses speech detection to detect a user's speech while the prompt is playing. Once the user's speech is detected in the incoming data, the system stops playing the prompt and immediately begins a record phase in which the incoming data is made available to a speech recognition engine. The speech recognition engine then processes the user's speech.
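By way of illustration only, the voice barge-in flow described above may be sketched as a small state machine. The sketch is not taken from any particular IVR system. The class name `BargeInSession`, the frame-energy speech detector, and the values of `energy_threshold` and `min_speech_frames` are all assumptions introduced for this example. A simple energy threshold is used here in place of whatever speech detection a real system may employ, and it does not address prompt echo or background noise.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Return the mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


@dataclass
class BargeInSession:
    """Hypothetical model of one prompt-play interval with voice barge-in."""

    energy_threshold: float = 0.01  # assumed value, would need tuning
    min_speech_frames: int = 3      # assumed: consecutive loud frames required
    prompt_playing: bool = True
    recorded: List[Sequence[float]] = field(default_factory=list)
    pending: List[Sequence[float]] = field(default_factory=list)

    def on_incoming_frame(self, frame: Sequence[float]) -> None:
        """Process one frame of incoming audio."""
        if not self.prompt_playing:
            # Record phase: every frame is made available to the recognizer.
            self.recorded.append(frame)
            return

        if frame_energy(frame) >= self.energy_threshold:
            # Candidate speech: hold the frame until the run is long enough.
            self.pending.append(frame)
            if len(self.pending) >= self.min_speech_frames:
                # Speech detected: stop the prompt and begin recording.
                # Keeping the frames that triggered detection is a design
                # assumption of this sketch, so the utterance onset is kept.
                self.prompt_playing = False
                self.recorded.extend(self.pending)
                self.pending.clear()
        else:
            # A quiet frame breaks the run; treat earlier frames as noise.
            self.pending.clear()


if __name__ == "__main__":
    silence = [0.0] * 160
    speech = [0.5] * 160
    session = BargeInSession()
    for f in [silence, speech, silence, speech, speech, speech, speech]:
        session.on_incoming_frame(f)
    print(session.prompt_playing, len(session.recorded))
```

In this sketch, a single loud frame followed by silence does not interrupt the prompt. The prompt is stopped only after `min_speech_frames` consecutive frames exceed the threshold, after which all further incoming frames are passed to the record phase.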
Although such a method may provide a better solution than key barge-in, the voice barge-in function of current IVR systems has several problems. One problem with current IVR systems is that the computer-telephone cards used in these systems may not support full-duplex data transfer. Another problem with current IVR systems is that they may not be able to robustly distinguish speech from background noise, non-speech sounds, irrelevant speech and/or prompt echo. For example, the prompt echo that resides in these systems may significantly degrade speech quality. Using traditional adaptive filtering methods to remove near-end prompt echo may significantly degrade the performance of automatic speech recognition engines used in these systems.