In recent decades, the user-friendliness of voice-controlled systems has been continuously improved. The aim is to enable the user to handle complex systems intuitively using speech. The design of the dialogue between man and machine should be adapted to the user to ensure easy handling. Particularly in automotive applications, such as a voice-controlled hands-free kit, handling has to be easy and must not distract the driver from observing the traffic.
Most systems use an inflexible dialogue schema in which man and machine take turns and no temporal overlap is possible. The system is activated by pressing a handle, after which the user can make his or her speech input. While the system is playing a message (the “prompt”) to the user via a loudspeaker or is inviting his or her input, no interruption (barge-in) by the user is possible.
Such a schema slows down the dialogue, particularly for experienced users, which impairs user-friendliness and thereby the acceptance of voice control. In particular, experienced users commonly wish to go through frequently repeated steps more quickly by interrupting the prompt.
Therefore, a speech dialogue system should offer the user the possibility of a barge-in, i.e. of interrupting the speech prompt by issuing a speech command. A block diagram of a conventional barge-in system is shown in FIG. 4. To communicate with the user, the system issues a prompt signal to which, in general, some kind of verbal response is expected from the user.
The prompt signal is provided by the prompt unit 460 and is emitted via a loudspeaker 400 as sound into the environment of the system. A microphone 410 is also part of the system; it provides an electrical input signal corresponding to the sum of all the sound that it receives from its environment.
First, the input signal usually passes through a noise reduction unit 420, which removes noise from the input signal according to some standard procedure. In the resulting noise-reduced signal, the segmentation unit 430 identifies the presence of speech and determines logical units of speech. The segmentation unit 430 also receives information from the prompt unit 460 about which speech prompt is being issued, so that it can take into account parts of the prompt which are fed back from the loudspeaker 400 to the microphone 410 and are thus present in the input signal as well. The segmentation unit 430 passes information concerning the units of speech to the speech recognition unit 440, which then tries to understand the meaning of the detected units. If the speech recognition unit 440 recognizes a command of the user in the input signal, the command is forwarded to the dialog manager 450, which then decides on further action of the system. This may include issuing another prompt, which the dialog manager 450 may accomplish by triggering the prompt unit 460.
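The signal path described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular system: every function and class name, the crude noise gate, the energy-based segmentation, the doubling of the threshold while a prompt is playing, and the `recognize` callback are assumptions introduced here purely for illustration. The reference numerals in the comments point to the corresponding units of FIG. 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Frame = List[float]  # one short block of microphone samples (assumed format)


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def reduce_noise(frame: Frame, floor: float = 0.01) -> Frame:
    """Stand-in for unit 420: a crude noise gate, not a real noise suppressor."""
    return [s if abs(s) > floor else 0.0 for s in frame]


def segment(frames: List[Frame], prompt_active: bool,
            threshold: float = 0.05) -> List[List[Frame]]:
    """Stand-in for unit 430: group consecutive loud frames into speech units.

    While a prompt is playing, the threshold is doubled (an assumed
    heuristic) to allow for prompt sound fed back into the microphone.
    """
    limit = threshold * 2 if prompt_active else threshold
    units: List[List[Frame]] = []
    current: List[Frame] = []
    for frame in frames:
        if frame_energy(frame) > limit:
            current.append(frame)
        elif current:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


@dataclass
class PromptUnit:
    """Stand-in for unit 460: remembers which prompt is being played."""
    active_prompt: Optional[str] = None

    def play(self, text: str) -> None:
        self.active_prompt = text


@dataclass
class DialogManager:
    """Stand-in for unit 450: maps a recognized command to the next prompt."""
    prompt_unit: PromptUnit
    replies: Dict[str, str]

    def handle(self, command: str) -> None:
        if command in self.replies:
            self.prompt_unit.play(self.replies[command])


def process(frames: List[Frame],
            recognize: Callable[[List[Frame]], Optional[str]],
            manager: DialogManager) -> List[str]:
    """Run frames through 420 -> 430 -> 440 -> 450 and return the commands."""
    prompt_active = manager.prompt_unit.active_prompt is not None
    cleaned = [reduce_noise(f) for f in frames]
    recognized: List[str] = []
    for unit in segment(cleaned, prompt_active):
        command = recognize(unit)  # stand-in for unit 440
        if command is not None:
            recognized.append(command)
            manager.handle(command)
    return recognized
```

In this sketch the only prior knowledge used by the segmentation step is whether a prompt is active, which mirrors the information flow from the prompt unit 460 to the segmentation unit 430 described above.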
However, the prompt signal and the user's speech signal, which is the signal wanted by the system, are present simultaneously in the environment of the microphone, at least at the beginning of the user's speech. A speech dialogue system therefore has to differentiate between the wanted speech signal issued by the user and the prompt signal. In addition, the microphone may receive noise which also interferes with the speech command.
Most barge-in enabled voice systems are based on an evaluation of the received microphone signals and of the issued prompt signal, which is available to the system. However, apart from the selection of threshold values, no prior knowledge is applied to differentiate, in barge-in events, between the user's speech and noise in the environment of the microphone. Hence, current systems cannot clearly differentiate between utterances of the current speaker on the one hand and the fed-back prompt and noise in the environment of the microphone on the other.
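The limitation of a purely threshold-based evaluation can be illustrated with a small Python sketch. It is a hypothetical example, not a description of any existing system: the function names, the assumed energy coupling factor between loudspeaker and microphone, and the threshold value are all chosen here for illustration only.

```python
from typing import Sequence


def energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def barge_in_detected(mic: Sequence[float], prompt: Sequence[float],
                      coupling: float = 0.25, threshold: float = 4.0) -> bool:
    """Flag a barge-in when the microphone is much louder than the expected echo.

    `coupling` is an assumed fraction of the prompt energy that is fed back
    from the loudspeaker to the microphone; `threshold` is the only tunable
    piece of prior knowledge in this sketch.
    """
    expected_echo = coupling * energy(prompt)
    # A tiny constant keeps the comparison meaningful when no prompt is playing.
    return energy(mic) > threshold * (expected_echo + 1e-12)


prompt = [0.2] * 8                 # prompt signal known to the system
echo_only = [0.1] * 8              # microphone picks up only the fed-back prompt
user_speech = [0.5] * 8            # user talks over the prompt
loud_noise = [0.5, -0.5] * 4       # e.g. a slamming door, equally energetic

print(barge_in_detected(echo_only, prompt))    # False
print(barge_in_detected(user_speech, prompt))  # True
print(barge_in_detected(loud_noise, prompt))   # True
```

The last two calls return the same result: a detector that relies only on energy and a threshold reacts to loud noise exactly as it does to the user's speech.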