1. Field of the Invention
The present invention relates to a voice interactive system for transmitting information to a user by using a voice output or a combination of a voice output and another information transmission unit in accordance with the contents of a user's voice input. In particular, the present invention relates to a voice interactive system having a barge-in function of processing a user's interrupt voice input by suspending the transmission of information, in the case where there is an interrupt by a user voice in the course of transmission of information to the user by using a voice output or a combination of a voice output and another information transmission unit.
2. Description of the Related Art
With the rapid advancement of computer technology, techniques for processing a voice signal are also advancing rapidly. Along with this, in a voice interactive system such as a voice portal, which is rapidly spreading through the Internet or the like, a user and a system perform a pseudo interaction, whereby information desired by the user is provided through a voice output such as a synthetic voice. Furthermore, next-generation mobile telephones and mobile terminals (PDA, etc.) can deal with image information in addition to a voice signal. Therefore, further advancement of voice interactive systems that provide multimedia information combining a voice and an image is also expected.
Recently, voice portals spreading on the market often have a barge-in function that allows a user to interrupt by speaking even in the course of voice guidance from a portal site (voice interactive system), for the purpose of enhancing usability. When a user voice is input to the system via a communication line or the like, the barge-in function detects the commencement of the user's voice input, suspends the guidance or the like being provided through a voice output, and prompts the user to input a voice. Herein, when the input to the system is only a user voice, the commencement of the user's voice input can be detected accurately, for example, by monitoring the fluctuation of the input power, and the guidance and the like through a voice output can be suspended. In that case, the barge-in function operates normally.
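The detection of the commencement of a voice input by monitoring the input power can be illustrated by the following minimal sketch. It is not taken from any particular system: the function names, the frame length of 160 samples, the threshold of -30 dB, and the requirement of three consecutive loud frames are all assumptions chosen for illustration.

```python
import math


def frame_power_db(frame):
    """Mean power of one frame in dB, relative to a full-scale amplitude of 1.0."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)  # small offset avoids log10(0)


def detect_barge_in(samples, frame_len=160, threshold_db=-30.0, hangover=3):
    """Return the index of the frame at which a voice input is judged to begin.

    A voice input is assumed to begin at the first of `hangover` consecutive
    frames whose power exceeds `threshold_db`. Returns None if no such run
    of frames exists. All parameter values are illustrative assumptions.
    """
    run = 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if frame_power_db(frame) > threshold_db:
            run += 1
            if run >= hangover:
                return i - hangover + 1  # first frame of the loud run
        else:
            run = 0
    return None
```

In such a sketch, the system would suspend the voice guidance as soon as `detect_barge_in` returns a frame index. The detector looks only at power, so it behaves correctly only while the user voice is the sole input to the system.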
In practice, however, a user voice is input to the system superimposed with the following: a line echo generated when the guidance and the like through a voice output from the system is reflected from a communication line system and returns; an acoustic echo generated when a guidance voice of the system by a hands-free telephone or the like enters a receiver from a transmitter; stationary or non-stationary environment noise from the user's peripheral environment; noise of the communication line system; and the like. In order to solve such a problem, echo suppression processing and noise suppression processing based on acoustic processing are generally performed.
For example, JP 9(1997)-252268 A discloses a voice interactive system capable of eliminating an echo caused by the return of a voice by analyzing the spectrum of an input voice.
As described above, an echo is generally suppressed by an echo canceller using various methods. However, depending upon the communication line system, an echo cannot always be suppressed completely, and a residual echo may remain in some cases.
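As one illustration of an echo canceller and of how an echo may remain, the following sketch uses a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a widely used technique chosen here as an assumption; it is not the spectrum-based method of JP 9(1997)-252268 A. The helper names, the filter length of 8 taps, the step size, and the simulated echo path (a pure delay with a gain) are all hypothetical.

```python
import random


def white_noise(n, seed=0):
    """Hypothetical stand-in for the guidance voice: uniform white noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def simulate_echo(reference, delay, gain):
    """Simulated echo path: the reference delayed by `delay` samples and scaled."""
    return [gain * reference[n - delay] if n >= delay else 0.0
            for n in range(len(reference))]


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns the residual signal that would be passed on to the barge-in
    detector and the voice recognizer.
    """
    w = [0.0] * taps        # estimated echo-path coefficients
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # estimated echo
        e = d - y                                       # residual after cancellation
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)
```

Under these assumptions, an echo delayed by 2 samples lies within the 8-tap filter and is cancelled almost entirely once the filter has adapted. An echo delayed by 12 samples lies outside the filter and remains in the residual, which corresponds to the case where the characteristics of the communication line system prevent complete suppression.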
Furthermore, noise is generally suppressed by a noise canceller. However, although stationary noise can be suppressed effectively, non-stationary noise is difficult to suppress.
Furthermore, parameters of an echo canceller or a noise canceller are often adjusted so as to enhance the suppression effect. However, the adjustment of parameters may distort a user's voice input, resulting in a decrease in the voice recognition rate.
In the case where the levels of a residual echo and non-stationary noise generated for the above reasons are high, the barge-in function of a conventional voice interactive system erroneously determines the residual echo and the non-stationary noise to be a user's voice input. As a result, the guidance and the like through a voice output are suspended, and the residual echo and the non-stationary noise are erroneously recognized by voice recognition, which is one of the factors causing the malfunction of the voice interactive system.
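The erroneous determination described above can be reproduced with the following sketch, in which a detector tracks a slowly varying noise floor and flags any frame whose power greatly exceeds it. The detector, the Gaussian noise model, and all parameter values (the ratio of 4.0, the smoothing factor of 0.9, the warm-up of 5 frames) are assumptions made for illustration only.

```python
import random


def make_noise(n_frames, frame_len=160, sigma=0.05,
               burst_frame=None, burst_sigma=0.5, seed=1):
    """Stationary Gaussian noise, optionally with one loud non-stationary burst."""
    rng = random.Random(seed)
    out = []
    for f in range(n_frames):
        s = burst_sigma if f == burst_frame else sigma
        out.extend(rng.gauss(0.0, s) for _ in range(frame_len))
    return out


def frame_powers(samples, frame_len=160):
    """Mean power of each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def adaptive_onsets(powers, ratio=4.0, alpha=0.9, warmup=5):
    """Return the indices of frames judged to contain a user's voice input.

    The noise floor is initialized from the first `warmup` frames and then
    updated only on frames that are not flagged.
    """
    floor = sum(powers[:warmup]) / warmup
    flagged = []
    for i in range(warmup, len(powers)):
        if powers[i] > ratio * floor:
            flagged.append(i)  # treated as the commencement of a voice input
        else:
            floor = alpha * floor + (1.0 - alpha) * powers[i]
    return flagged
```

With purely stationary noise, the tracked floor absorbs the noise and no frame is flagged. When a single loud burst is placed in frame 12, that frame is flagged even though no user has spoken, so a system relying on this determination would suspend the guidance and pass the burst to voice recognition.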