1. Field of the Invention
The invention relates generally to speech dialogue systems and, in particular, to detecting barge-in in a speech dialogue system.
2. Related Art
Speech dialogue systems are used in different applications in order to allow a user to receive desired information or perform an action in a more efficient manner. The speech dialogue system may be provided as part of a telephone system. In such a system, a user may call a server in order to receive information, for example, flight information, via a speech dialogue with the server. Alternatively, the speech dialogue system may be implemented in a vehicular cabin where the user is enabled to control devices via speech. For example, a hands-free telephony system or a multimedia device in a car may be controlled with the help of a speech dialogue between the user and the system.
During the speech dialogue, a user is prompted by the speech dialogue system via speech prompts to input his or her wishes and any required input information. In most prior art speech dialogue systems, a user may utter his or her input or command only upon completion of a speech prompt output. Any speech activity detector and/or speech recognizer is activated only after the output of the speech prompt is finished. In order to recognize speech, a speech recognizer has to determine whether speech activity is present. To do this, a segmentation may be performed to determine the beginning and the end of a speech input.
Some speech dialogue systems allow a so-called “barge-in.” In other words, a user does not have to wait for the end of a speech prompt but may respond with a speech input during output of the speech prompt. In this case, the speech recognizer, particularly the speech activity detecting or segmentation unit, has to be active during the outputting of the speech prompt. Allowing barge-in generally shortens a user's speech dialogue with the speech dialogue system.
Different methods have been proposed to prevent the speech prompt output itself from being erroneously classified as a speech input while the prompt is being output. U.S. Pat. No. 5,978,763 discloses voice activity detection using echo return loss to adapt a detection threshold. According to this method, the echo return loss is a measure of the attenuation, i.e., the difference (in decibels) between the outgoing signal and the reflected signal. A threshold is determined as the difference between the maximum possible power (on a telephone line) and the determined echo return loss.
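The threshold arithmetic described above can be illustrated with a minimal sketch. The function names, the example power levels, and the final comparison of the input power against the threshold are assumptions made for illustration only; they are not taken from U.S. Pat. No. 5,978,763.

```python
# Illustrative sketch only: all names and numeric values are hypothetical.

def echo_return_loss_db(outgoing_power_db: float, reflected_power_db: float) -> float:
    """Attenuation (in dB) between the outgoing and the reflected signal."""
    return outgoing_power_db - reflected_power_db


def detection_threshold_db(max_power_db: float, erl_db: float) -> float:
    """Threshold = maximum possible power minus the echo return loss."""
    return max_power_db - erl_db


def exceeds_threshold(input_power_db: float, threshold_db: float) -> bool:
    """Assumed decision rule: input louder than the threshold counts as speech."""
    return input_power_db > threshold_db


# Example: a prompt sent at -10 dB whose echo returns at -30 dB.
erl = echo_return_loss_db(-10.0, -30.0)        # 20 dB of attenuation
threshold = detection_threshold_db(0.0, erl)   # assumed maximum power of 0 dB
```

Under these assumed values the threshold is -20 dB, so an input at -15 dB would be treated as speech and an input at -25 dB would not.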
U.S. Pat. No. 7,062,440 discloses monitoring text-to-speech output to effect control of barge-in. According to this disclosure, the barge-in control is arranged to permit barge-in at any time but only takes notice of barge-in during output by the speech system on the basis of a speech input being recognized in the input channel.
A method for barge-in acknowledgement is disclosed in U.S. Pat. No. 7,162,421. A prompt is attenuated upon detection of a speech input. The speech input is accepted and the prompt is terminated if the speech corresponds to an allowable response. U.S. Pat. No. 7,212,969 discloses dynamic generation of a voice interface structure and voice content based upon either or both user-specific controlling function and environmental information.
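The barge-in acknowledgement flow summarized above for U.S. Pat. No. 7,162,421 (attenuate the prompt on detection, terminate it if the input is an allowable response) might be sketched as follows. The `Prompt` class, the gain values, and the step that restores the prompt volume after a rejected input are assumptions added for illustration and are not taken from that patent.

```python
# Illustrative sketch only: class, function, and parameter names are hypothetical.
from dataclasses import dataclass


@dataclass
class Prompt:
    gain: float = 1.0      # playback gain of the speech prompt
    playing: bool = True   # whether the prompt is still being output


def handle_detected_speech(prompt: Prompt, recognized_text: str,
                           allowable_responses: set,
                           attenuated_gain: float = 0.3) -> bool:
    """Return True if the speech input is accepted as a barge-in."""
    # Step 1: attenuate the prompt as soon as a speech input is detected.
    prompt.gain = attenuated_gain
    # Step 2: accept the input and terminate the prompt only if the
    # recognized speech corresponds to an allowable response.
    if recognized_text in allowable_responses:
        prompt.playing = False
        return True
    # Assumption (not stated in the source): otherwise restore the prompt.
    prompt.gain = 1.0
    return False
```

For example, with the allowable responses `{"yes", "no"}`, an input recognized as "yes" stops the prompt, whereas an input recognized as "maybe" leaves the prompt playing.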
A further possibility is described in A. Ittycheriah et al., "Detecting User Speech in Barge-in over Prompts Using Speaker Identification Methods," in ESCA, EUROSPEECH 99, ISSN 1018-4074, pages 327-330. Here, speaker-independent statistical models are provided as Vector Quantization Classifiers for the input signal after echo cancellation, and standard algorithms are applied for speaker verification. The task is to separate the speech of the user from background noise under the condition of robust suppression of the prompt signal.
Accordingly, there is a need to provide a method and an apparatus for detecting barge-in in a speech dialogue system more accurately and more reliably.