Recently, voice recognition has increasingly been used as a means for operating apparatuses owing to improvements in recognition performance. In the field of automobiles in particular, hands-free operation by voice recognition while driving has drawn attention from the viewpoint of safety.
In a car navigation system, for example, various kinds of information such as map information, accompanying information thereof, and entertainment information are collected, and operations of apparatuses based on such information are executed by voice recognition. That is, the car navigation system controls the related apparatuses while conversing with the users, including the driver. In such a case, the navigation system takes charge of the main function of controlling communications with the users.
In this case, the navigation system is required to accurately recognize the words uttered by the user in order to converse with the user correctly and give control instructions without mistakes.
To that end, the navigation system is provided with an echo canceller device which, when a guidance speech uttered by the system itself returns as an echo signal, eliminates that echo signal from the voice signals of the user (see Patent Literature 1, for example).
FIG. 4 is a block diagram showing an echo canceller device of Related Technique 1. Hereinafter, explanations will be provided based on that drawing.
An echo canceller device 80 of Related Technique 1 is mounted in a car navigation system 90. The navigation system 90 includes a speaker 91, a microphone 92, a voice recognition unit 93, a main control unit 94, a guidance speech generation unit 95, and the like.
The echo canceller device 80 eliminates an echo signal y′(k) that is generated when the voice reproduced from the speaker 91 is captured by the microphone 92. Note here that the signal reproduced from the speaker 91 is defined as an input signal x(k), the signal including the echo signal y′(k) is defined as an objective signal d(k), and “k” is a variable expressing discrete time.
In this case, the echo canceller device 80 includes: a filter unit 81 that generates an output signal y(k) through filtering the input signal x(k) with a filter coefficient h(k); a subtraction unit 82 that inputs the objective signal d(k) and the output signal y(k) and outputs an error signal e(k) that is the difference therebetween; and an adaptation unit 83 that modifies the filter coefficient h(k) so as to reduce the error signal e(k) based on an adaptation algorithm.
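The cooperation of the filter unit 81, the subtraction unit 82, and the adaptation unit 83 can be illustrated with a minimal sketch. The text above does not name the adaptation algorithm, so the normalized LMS (NLMS) update used here is an assumption chosen only for illustration. The function name `nlms_echo_canceller` and the parameters `num_taps`, `mu`, and `eps` are likewise hypothetical.

```python
import numpy as np

def nlms_echo_canceller(x, d, num_taps=32, mu=0.5, eps=1e-8):
    """Return (e, h): the error signal e(k) and the final filter coefficients.

    NLMS is an assumed adaptation algorithm; the source only says
    "an adaptation algorithm".
    """
    x = np.asarray(x, dtype=float)   # input signal x(k) sent to the speaker
    d = np.asarray(d, dtype=float)   # objective signal d(k) from the microphone
    h = np.zeros(num_taps)           # filter coefficient h(k), not yet converged
    x_buf = np.zeros(num_taps)       # recent input samples, newest first
    e = np.zeros(len(x))
    for k in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        y = h @ x_buf                # filter unit 81: output signal y(k)
        e[k] = d[k] - y              # subtraction unit 82: error signal e(k)
        # adaptation unit 83: step h(k) in the direction that reduces e(k)
        h = h + mu * e[k] * x_buf / (x_buf @ x_buf + eps)
    return e, h
```

With a noise-free synthetic echo path, `h` approaches the true impulse response and `e` decays toward zero as the number of processed samples grows; the first samples still carry a residual echo because `h` starts from zero.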
The voice uttered by the user 100 is captured by the microphone 92 and transformed into a transmission signal Sin, which is A/D-converted by an A/D converter, not shown, transformed into a transmission signal Sout via the echo canceller device 80, and outputted to the voice recognition unit 93. The voice recognition unit 93 decodes the transmission signal Sout with the voice recognition algorithm, and conveys the decoded information to the main control unit 94. The main control unit 94 determines a proper guidance speech based on the decoded information, and conveys it to the guidance speech generation unit 95. The guidance speech generation unit 95 performs voice synthesis of the guidance speech, and outputs it as a reception signal Rin. The reception signal Rin is D/A-converted by a D/A converter, not shown, to be a reception signal Rout, and reproduced from the speaker 91.
As shown in the drawing, the echo canceller device 80 suppresses the acoustic echo by signal processing using an adaptive filter. The principle of the echo canceller is as follows. The relation between the input signal x(k) reproduced by the speaker 91 and the echo signal y′(k) received by the microphone 92 can be expressed as y′(k)=x(k)*h′(k) (“*” denotes the convolution operation) by using the impulse response h′(k) of the car interior. Thus, the echo canceller device 80 acquires a filter coefficient h(k) that is an estimated value of the impulse response h′(k), generates from it the output signal y(k) that is an estimated echo signal, and subtracts the output signal y(k) from the objective signal d(k) captured by the microphone 92 to prevent the acoustic echo. Because the impulse response h′(k) of the car interior fluctuates over time due to movement of persons, opening and closing of the doors and windows, and the like, the adaptation unit 83 is used for its estimation. The adaptation unit 83 successively modifies the filter coefficient h(k) so as to minimize the power of the error signal e(k).
Note that, in addition to the echo signal y′(k), the objective signal d(k) includes a speaker signal s(k) from the user 100, noise n(k) such as ambient noise, and the like. Further, the echo canceller device 80 also includes a delay adjustment unit 85 that adjusts the time difference between the input signal x(k) and the objective signal d(k), a noise suppression unit 86 that suppresses the noise n(k) and the like included in the error signal e(k), and the like.
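The signal relations described above can be written out as follows. This is an illustrative restatement under stated assumptions, not a formula given in the source: the filter is assumed to be a finite impulse response filter of length N, the subscript i is an assumed tap index, h_i(k) denotes the i-th tap of the filter coefficient h(k) at time k, and h′ is treated as fixed over the interval considered.

```latex
\begin{align}
y'(k) &= \sum_{i=0}^{N-1} h'_i \, x(k-i), \\
y(k)  &= \sum_{i=0}^{N-1} h_i(k) \, x(k-i), \\
d(k)  &= y'(k) + s(k) + n(k), \\
e(k)  &= d(k) - y(k)
       = s(k) + n(k) + \sum_{i=0}^{N-1} \bigl( h'_i - h_i(k) \bigr) \, x(k-i).
\end{align}
```

When the filter coefficient matches the impulse response, the last sum vanishes and the error signal e(k) contains only the speaker signal s(k) and the noise n(k). Any mismatch between h(k) and h′(k) remains in e(k) as a residual echo.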
Patent Literature 1: JP No. 5373473 B
Patent Literature 2: Japanese Unexamined Patent Publication 2002-204175
However, the echo canceller device 80 of Related Technique 1 has the following issues (1), (2), and (3).
(1) The adaptation unit 83 requires a certain learning time to determine the optimum filter coefficient h(k). Thus, the filter coefficient h(k) has not yet converged immediately after a guidance speech is uttered from the speaker 91, so that the echo signal y′(k) cannot be eliminated completely and a residual echo may be generated. For example, such a phenomenon tends to occur in the case of the first guidance speech after the echo canceller device 80 starts operating, in a case where the impulse response h′(k) changes greatly between the end of one guidance speech and the start of the next guidance speech, etc.
(2) The larger the noise n(k) becomes, the more difficult it becomes to acquire the filter coefficient h(k) that is the estimated value of the impulse response h′(k). In particular, when the noise n(k) becomes larger than the echo signal y′(k) and the echo signal y′(k) is buried in the noise n(k), it is almost impossible to estimate the impulse response h′(k), i.e., the transfer characteristic with which the echo signal y′(k) reaches the microphone 92.
(3) When the user 100 performs a mute operation or greatly turns down the volume of the speaker 91, the reception signal Rin, even when outputted from the guidance speech generation unit 95, is not reproduced from the speaker 91 at all or is hardly reproduced. This amounts to a large change in the impulse response h′(k), which requires a large modification of the filter coefficient h(k), and the convergence thereof takes time. Therefore, the echo cancel performance is greatly deteriorated during that time.
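Issues (1) and (2) can be reproduced in a small simulation. The sketch below is illustrative only: it assumes an NLMS update (the source does not name the adaptation algorithm), a synthetic decaying impulse response, and white-noise signals. The function name `nlms_misalignment_db` and all of its parameters are hypothetical. It tracks the normalized misalignment between the estimated and the true impulse response in dB; issue (3) is not simulated.

```python
import numpy as np

def nlms_misalignment_db(noise_std, num_samples, num_taps=16, mu=0.5, seed=0):
    """Run an assumed NLMS canceller on synthetic data.

    Returns the normalized misalignment ||h' - h||^2 / ||h'||^2 in dB
    after each processed sample.
    """
    rng = np.random.default_rng(seed)
    # Synthetic, exponentially decaying impulse response h' (an assumption).
    h_true = rng.standard_normal(num_taps) * np.exp(-0.3 * np.arange(num_taps))
    x = rng.standard_normal(num_samples)              # input signal x(k)
    echo = np.convolve(x, h_true)[:num_samples]       # echo signal y'(k)
    noise = noise_std * rng.standard_normal(num_samples)
    d = echo + noise                                  # objective signal d(k)
    h = np.zeros(num_taps)                            # unconverged at start
    x_buf = np.zeros(num_taps)
    mis = np.empty(num_samples)
    for k in range(num_samples):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        e = d[k] - h @ x_buf                          # error signal e(k)
        h = h + mu * e * x_buf / (x_buf @ x_buf + 1e-8)
        ratio = np.sum((h_true - h) ** 2) / np.sum(h_true ** 2)
        mis[k] = 10 * np.log10(ratio + 1e-30)         # floor avoids log(0)
    return mis

quiet = nlms_misalignment_db(noise_std=0.0, num_samples=2000)
noisy = nlms_misalignment_db(noise_std=3.0, num_samples=2000)
```

In the noise-free run, `quiet` is high during the first samples and falls as learning proceeds, which corresponds to the residual echo of issue (1). In the run where the noise is larger than the echo, `noisy` stays far above the level reached by `quiet`, which corresponds to issue (2).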
It is therefore an object of the present invention to provide an echo canceller device that can exhibit stable echo cancel performance even in cases (1) immediately after a voice is uttered from a speaker, (2) where the noise is large, and (3) where a reception signal is outputted but is not reproduced from the speaker at all or is hardly reproduced.