Non-patent document 1 (Sadaoki Furui, September 1992, Acoustics and Speech Processing, First Edition, Kindai Kagaku sha Co., pp. 84-85) discloses that an apparatus including a loud speaker and a microphone, such as a television conference system, a hands-free communication device, or a car navigation system with sound output and sound recognition functions, is provided with an echo canceller function that removes, from the sound received by the microphone, the echo of the sound output by the loud speaker.
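An echo canceller of this kind is commonly built around an adaptive filter that estimates the echo path from the loud speaker signal and subtracts the estimated echo from the microphone signal. The following is a minimal sketch, assuming a normalized LMS (NLMS) update; NLMS is an illustrative choice and not necessarily the method of the cited document, and the function names, the three-tap echo path, and all parameter values are hypothetical.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of far_end from mic.

    far_end: samples sent to the loud speaker (reference signal)
    mic:     samples received by the microphone
    Returns the residual signal after echo cancellation.
    """
    w = [0.0] * taps  # current estimate of the echo-path coefficients
    x = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_estimate  # what remains after removing the estimate
        # NLMS step: scale the update by the reference signal energy.
        gain = mu * e / (sum(xi * xi for xi in x) + eps)
        w = [wi + gain * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


def power(signal):
    """Mean squared value of a list of samples."""
    return sum(s * s for s in signal) / len(signal)


# Hypothetical test signals: white noise played through a short echo path,
# with no near-end voice present.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.1]  # assumed room response, for illustration only
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual = nlms_echo_cancel(far_end, mic)
```

In this idealized setting (white noise reference, short noise-free echo path, no near-end voice) the filter can model the echo path exactly, so comparing `power(residual[-500:])` with `power(mic[-500:])` shows how much echo is left after convergence. The sketch does not reproduce harder conditions such as music playback or a speaker talking at the same time.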
However, the conventional method such as the one disclosed in Non-patent document 1 performs poorly in an environment where, for example, music is playing. Specifically, when the echo canceling process is performed while music is being output from the loud speaker, the music is likely to remain as a residual echo because of the performance limit of the adaptive filter used in the echo canceller. If voice recognition is performed while the music remains as a residual echo, a zone containing the residual echo may be determined to be a voice zone, resulting in a recognition error. Moreover, if the residual echo is present before or after the voice uttered by a speaker, the speech may be incorrectly recognized as different words.
To solve these problems of false recognition, it is necessary to reduce the power of the residual echo as quickly as possible in a state called single talk, in which no speaker is talking, and to suppress only the echo while preserving the speaker's voice in a state called double talk, in which a speaker is talking.