1. Field of the Invention
The invention relates to an echo canceller, and to a speech processing apparatus using the same, capable of solving problems such as echo and howling that occur during communication in hands-free communication systems such as hands-free telephone systems and video conference systems.
2. Description of the Related Art
In the related art, in hands-free communication systems such as video conference systems, voice collected by a microphone of a far-end device is transmitted to a near-end device and output from a speaker of the near-end device. The near-end device is also provided with a microphone, and the voice of a near-end speaker is transmitted to the far-end device. Consequently, the voice output from the speaker on each side is picked up by the microphone on that side. If no processing is performed, this voice is transmitted back to the correspondent device, causing the phenomenon of “echo”, in which one's own speech is heard from the speaker after a slight delay. When the echo (feedback component) becomes large, it is input to the microphone again and loops through the system, causing “howling”.
An echo canceller is known as an apparatus for preventing such echo and howling. Generally, an adaptive filter is used to measure the impulse response of the feedback path (echo path) formed by the acoustic coupling between the speaker and the microphone. This impulse response is convolved with the received signal (reference signal) output from the speaker to generate an echo replica, and the echo replica is then subtracted from the voice signal collected by the microphone to remove the echo.
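The replica-and-subtract operation described above can be illustrated with a minimal sketch. It assumes the echo path has already been estimated as a short FIR impulse response; the function names, tap values, and sample values are hypothetical and chosen only for illustration.

```python
def convolve(h, x):
    """Causal FIR convolution: y[n] = sum_i h[i] * x[n - i], same length as x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, hi in enumerate(h):
            if n - i >= 0:
                acc += hi * x[n - i]
        y.append(acc)
    return y


def cancel_echo(mic, reference, h_est):
    """Subtract the echo replica (h_est convolved with reference) from the mic signal."""
    replica = convolve(h_est, reference)
    return [m - r for m, r in zip(mic, replica)]


# Hypothetical example values (not taken from the source).
echo_path = [0.0, 0.5, 0.25]             # assumed impulse response of the echo path
reference = [1.0, 0.0, -1.0, 2.0, 0.0]   # received signal sent to the speaker
near_end = [0.1, 0.2, 0.0, -0.1, 0.3]    # near-end voice picked up by the microphone

# Microphone signal = echo of the reference + near-end voice.
mic = [e + s for e, s in zip(convolve(echo_path, reference), near_end)]

# With a perfect estimate of the echo path, only the near-end voice remains.
residual = cancel_echo(mic, reference, echo_path)
```

In this idealized case the estimated impulse response equals the true one, so the residual matches the near-end voice up to floating-point rounding. A real system would obtain the estimate from an adaptive filter.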
The adaptive filter is well known in the related art. It includes a processor having variable coefficients and an adaptive algorithm that determines the coefficients on an ongoing basis. It estimates the echo component of the feedback path (the feedback component of the received signal through the feedback path) by adaptively updating the variable filter coefficients with an algorithm that minimizes the mean square value of the output signal from a subtractor. The subtractor then subtracts the echo component estimated by the adaptive filter from the transmitting signal, so that only the echo component included in the transmitting signal is cancelled. This prevents components other than the echo that are collected by the microphone (voice uttered by a speaker toward the microphone, or surrounding noise) from being damaged.
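As a rough illustration of such an adaptive filter, the following sketch uses the plain LMS (least mean squares) update, a stochastic-gradient rule that drives down the mean square of the subtractor output. The choice of plain LMS, the function name, the tap count, the step size, and the simulated echo path are all assumptions made for illustration, not details taken from the source.

```python
import random


def lms_echo_canceller(mic, reference, num_taps, mu):
    """Adapt FIR coefficients w so that the subtractor output e = mic - w*x shrinks."""
    w = [0.0] * num_taps      # variable filter coefficients
    buf = [0.0] * num_taps    # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))        # estimated echo component
        e = d - y                                         # subtractor output
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # LMS coefficient update
        residual.append(e)
    return residual, w


# Simulated setup (hypothetical values): white reference signal, 3-tap echo path,
# no near-end voice or noise.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3, 0.1]
mic = [
    sum(h * reference[n - i] for i, h in enumerate(echo_path) if n - i >= 0)
    for n in range(len(reference))
]

residual, w = lms_echo_canceller(mic, reference, num_taps=3, mu=0.05)
```

In this noise-free simulation the coefficients approach the simulated echo path and the residual decays toward zero. With near-end voice or noise present, the choice of step size becomes a trade-off between convergence speed and misadjustment.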
One such adaptive algorithm is the Normalized LMS algorithm (hereinafter referred to as “NLMS”). In the NLMS algorithm, the filter coefficients are updated so that the residual signal between the microphone input signal and the estimated echo signal becomes small. For this update, a constant (the step size, or correction width) is set that controls the size of the correction applied at each iteration (that is, the speed of convergence). The optimum step size μ(k) of the adaptive filter updated by the NLMS algorithm is given by the following formula.
$$\mu(k) = \frac{E\left[\,|Er(k)|^{2}\,\right]}{E\left[\,|S(k)|^{2}\,\right] + E\left[\,|Er(k)|^{2}\,\right]} \tag{1}$$
Here, S(k) denotes an interference signal input to the microphone, and Er(k) denotes a residual echo signal that is not completely removed by the adaptation processing. E[ ] denotes a short-time mean, and “k” denotes a frequency.
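Formula (1) can be evaluated directly in a simulation where both signals are known. The sketch below does this for a single frequency k, assuming that S(k) and Er(k) are available as complex spectral values over a few recent frames and that the short-time mean is a simple average over those frames. The function names and the example values are hypothetical.

```python
def short_time_mean(values):
    """Plain average over the most recent frames (one possible short-time mean)."""
    return sum(values) / len(values)


def optimum_step_size(s_frames, er_frames):
    """Evaluate formula (1) for one frequency k from recent complex spectra."""
    p_s = short_time_mean([abs(s) ** 2 for s in s_frames])     # E[|S(k)|^2]
    p_er = short_time_mean([abs(e) ** 2 for e in er_frames])   # E[|Er(k)|^2]
    total = p_s + p_er
    # Guard against division by zero when both powers vanish.
    return p_er / total if total > 0.0 else 0.0


# No interference: the residual is pure echo, so the step size is 1.
mu_no_interference = optimum_step_size([0j, 0j], [1 + 0j, 1j])

# Interference and residual echo of equal power: the step size is 0.5.
mu_equal_power = optimum_step_size([1 + 0j, 1j], [1 + 0j, -1 + 0j])
```

The two cases show the behaviour of the formula: the step size approaches 1 when the residual consists mostly of echo and shrinks toward 0 as the interference signal dominates.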
However, it is difficult to apply the optimum step size μ(k) directly in practice, because neither the residual echo signal Er(k) included in the residual signal nor the interference signal S(k) can easily be observed directly, and it is therefore difficult to separate and extract the two signals.
A method has been proposed that estimates the amount (ratio) of the residual echo signal in the residual signal by using the coherence between the input signal and the residual signal, defined by a prescribed calculation formula (for example, refer to Akira Emura, Yoichi Hada, “adaptive algorithm for deleting stereo echo under noise environment”, Collected papers of lectures of the acoustical society of Japan, The acoustical society of Japan, March 2002, 1-Q-5, P645-646 (Non-Patent Document 1)).