1. Field of the Invention
The present invention relates to acoustic echo cancelers which cancel echoes occurring in communications.
The present application claims priority on Japanese Patent Application No. 2008-89733, the content of which is incorporated herein by reference.
2. Description of the Related Art
Audio communication technologies that allow remote talkers (e.g. a near-end talker and a far-end talker located in places remote from each other) to converse by use of communication devices having microphones and speakers have been widely used in telecommunication systems and television conference systems. In communications using such devices, audio signals reproduced by speakers are partially input into microphones and unintentionally transmitted to counterpart communication devices, so that counterpart talkers hear their own voices as echoes. Such a phenomenon causes discomfort for talkers, and a significantly large echo causes howling, which makes the conversation difficult to hear. Conventionally, echo cancelers (or acoustic echo canceling devices) using adaptive filters have been developed to cancel echoes.
FIG. 5 is a block diagram showing the constitution of an echo canceler 20. A near-end talker having a communication device including the echo canceler 20 receives a speech signal x(t) transmitted from a counterpart communication device of a far-end talker (not shown). The speech signal x(t) is directly supplied to a speaker 26, thus reproducing the far-end talker's speech. The speech signal x(t) is also supplied to an adaptive filter 21 in the echo canceler 20. The sound of the speaker 26 propagates through an echo path EP characterized by an impulse response h(t), in which it is converted into an echo y(t) and then input to a microphone 27, wherein the echo path EP and the impulse response h(t) may vary over time. The microphone 27 receives a speech v(t) of the near-end talker in addition to the echo y(t), thus producing a mixed signal s(t) (where s(t)=v(t)+y(t)).
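The signal model of FIG. 5 can be illustrated by the following minimal sketch. It assumes a discrete-time signal and a fixed, finite impulse response, whereas the text allows h(t) to vary over time; the function names convolve_echo and mixed_signal are hypothetical and are introduced only for illustration.

```python
def convolve_echo(x, h):
    """Echo y(t) = sum_k h[k] * x[t-k], assuming a fixed finite impulse response h."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if t - k >= 0:  # samples before t = 0 are treated as silence
                acc += hk * x[t - k]
        y.append(acc)
    return y


def mixed_signal(x, h, v):
    """Microphone signal s(t) = v(t) + y(t): near-end speech plus echo."""
    y = convolve_echo(x, h)
    return [vt + yt for vt, yt in zip(v, y)]
```

For example, a unit impulse in x(t) with h = [0.5, 0.25] yields an echo of 0.5 followed by 0.25, to which any near-end sample v(t) is simply added.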
The adaptive filter 21 sets its filter coefficient by use of an estimated impulse response h′(t) for the echo path EP lying between the speaker 26 and the microphone 27, thus dynamically generating an echo replica y′(t) that simulates the echo y(t) based on the input speech signal x(t). The estimated impulse response h′(t) is adaptively produced so as to minimize an echo-canceled signal e(t) output from a subtracter 23. The subtracter 23 subtracts the echo replica y′(t) from the mixed signal s(t) of the microphone 27. Thus, it is possible to produce the echo-canceled signal e(t) based on the sound received by the microphone 27.
It is possible to use various adaptation algorithms such as NLMS (Normalized Least Mean Squares), RLS (Recursive Least Squares), and APA (Affine Projection Algorithm). With any of these algorithms, the filter coefficient of the adaptive filter 21 may be erroneously adjusted and updated in response to the speech v(t) of the near-end talker, thus making it very difficult to perform echo cancellation appropriately. In order to solve such a drawback, a double-talk detector 22 (in which the term “double-talk” refers to simultaneous occurrence of the near-end talker's speech and the far-end talker's speech) is used to detect the speech v(t) of the near-end talker so as to stop the adaptive filter 21 from updating the filter coefficient in response to the speech v(t) of the near-end talker. The adaptive filter 21 is activated so as to update the filter coefficient based on the estimated impulse response h′(t) only in the non-speech period in which the microphone 27 does not receive the speech v(t) of the near-end talker, thus achieving high-precision echo cancellation. In the speech-reception period in which the microphone 27 receives the speech v(t) of the near-end talker, the adaptive filter 21 stops updating the filter coefficient, thus performing echo cancellation appropriately.
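The interplay between the adaptive filter 21, the subtracter 23, and the double-talk detector 22 can be sketched as follows. This is a minimal NLMS illustration under stated assumptions, not the implementation of the echo canceler 20: the step size mu, the regularization eps, the per-sample flag list near_end_active (standing in for the detector output), and the helper white_noise (standing in for a far-end signal) are all hypothetical.

```python
import random


def white_noise(n, seed=0):
    """Hypothetical stand-in for a far-end signal x(t): seeded Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def nlms_echo_canceler(x, s, num_taps, mu=0.5, eps=1e-8, near_end_active=None):
    """Return (e, w): the echo-canceled signal e(t) and the final coefficients.

    x: far-end signal, s: microphone signal, w: estimate h'(t) of the echo path.
    near_end_active[t] == True freezes the coefficient update at sample t.
    """
    w = [0.0] * num_taps    # filter coefficients (estimated impulse response)
    buf = [0.0] * num_taps  # most recent far-end samples, newest first
    e = []
    for t in range(len(x)):
        buf = [x[t]] + buf[:-1]
        y_hat = sum(wk * bk for wk, bk in zip(w, buf))  # echo replica y'(t)
        err = s[t] - y_hat                              # subtracter output e(t)
        e.append(err)
        frozen = near_end_active is not None and near_end_active[t]
        if not frozen:
            # NLMS update: step normalized by the far-end input energy
            norm = sum(bk * bk for bk in buf) + eps
            step = mu * err / norm
            w = [wk + step * bk for wk, bk in zip(w, buf)]
    return e, w
```

With no near-end speech and a short fixed echo path, the coefficients w approach the true impulse response and e(t) decays toward zero; with the update frozen throughout, w remains at its initial value and e(t) equals s(t).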
It is possible to adopt various detection methods for use in the double-talk detector 22 for detecting the speech v(t) of the near-end talker, wherein Non-Patent Documents 1 to 3 teach conventionally-known double-talk detection methods.
Non-Patent Document 1: “The fast normalized cross-correlation double talk detector” written by Tomas Gansler et al. for SIGNAL PROCESSING, Vol. 86, pp. 1124-1139, June, 2006
Non-Patent Document 2: “Double-Talk Detection Method with Detecting Echo Path Fluctuation” written by Kensaku Fujii et al. for The Institute of Electronics, Information and Communication Engineers, Vol. J78-A, No. 3, pp. 314-322, March, 1995
Non-Patent Document 3: “A New Class of Doubletalk Detectors Based on Cross-Correlation” in IEEE Transactions on Speech and Audio Processing, Vol. 8, pp. 168-172, March, 2000
In a first detection method (disclosed in Non-Patent Document 1), the ratio of the mixed signal s(t) (output from the microphone 27) to the speech signal x(t) of the far-end talker is calculated and compared to a prescribed threshold value, wherein it is determined that the microphone 27 receives the speech v(t) of the near-end talker when the ratio is higher than the threshold value, while it is determined that the microphone 27 does not receive the speech v(t) of the near-end talker when the ratio is lower than the threshold value.
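The ratio test of the first detection method can be sketched as follows. The text does not specify how the signal levels are measured, so this sketch assumes the instantaneous magnitude of s(t) and the peak magnitude of x(t) over a short recent window; the function name ratio_double_talk and the parameters threshold, window, and eps are hypothetical.

```python
def ratio_double_talk(x, s, threshold=0.5, window=4, eps=1e-12):
    """Flag sample t as near-end speech when |s(t)| / level(x) exceeds threshold.

    level(x) is assumed to be the peak |x| over the last `window` samples,
    which allows for a short delay in the echo path.
    """
    flags = []
    for t in range(len(s)):
        lo = max(0, t - window + 1)
        x_level = max(abs(xv) for xv in x[lo:t + 1])
        ratio = abs(s[t]) / (x_level + eps)  # eps avoids division by zero
        flags.append(ratio > threshold)
    return flags
```

In this sketch a microphone sample well above the attenuated far-end level is flagged, while a sample consistent with an attenuated echo is not.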
A second detection method is based on the empirical observation that a residual echo increases in power when the speech v(t) of the near-end talker occurs, while it decreases in power owing to high-precision echo cancellation when the speech v(t) of the near-end talker does not occur. Through monitoring the power of a residual echo, it is determined that the speech v(t) of the near-end talker occurs in response to an increase of the residual echo. Since the power of a residual echo is also likely to increase due to variations of an echo path, it is necessary to additionally detect variations of the echo path (as disclosed in Non-Patent Document 2).
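The power monitoring of the second detection method can be sketched as follows. This is only an assumed framing of the idea, not the method of Non-Patent Document 2: the residual is split into fixed frames, and a frame is flagged when its mean power exceeds a multiple of a baseline tracked over unflagged frames. The names frame_powers and residual_rise_flags and the parameters frame_len, rise_factor, and floor are hypothetical.

```python
def frame_powers(e, frame_len):
    """Mean power of each complete, non-overlapping frame of the residual e(t)."""
    return [sum(v * v for v in e[i:i + frame_len]) / frame_len
            for i in range(0, len(e) - frame_len + 1, frame_len)]


def residual_rise_flags(e, frame_len=4, rise_factor=4.0, floor=1e-12):
    """Flag frames whose residual power rises sharply above a quiet baseline."""
    powers = frame_powers(e, frame_len)
    if not powers:
        return []
    flags = [False]       # the first frame only initializes the baseline
    baseline = powers[0]
    for p in powers[1:]:
        rising = p > rise_factor * max(baseline, floor)
        flags.append(rising)
        if not rising:
            baseline = p  # track the baseline only while no rise is flagged
    return flags
```

A flag raised by this sketch does not by itself distinguish near-end speech from a change of the echo path, which is why the text calls for an additional detection of echo path variations.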
Other detection methods have been developed and disclosed in various documents such as Non-Patent Document 3, wherein speech detection is implemented using the coherence of the echo y(t), correlations (or cross-correlations) of speech signals, and the like.
In order to effectively cancel echoes, the echo canceler 20 of FIG. 5 further includes a loss insertion unit 24 and a gain controller 25. In actual circumstances, an echo may still remain in the echo-canceled signal e(t) for various reasons, for example, because the microphone 27 is likely to receive noise in addition to the echo y(t) and the speech v(t) of the near-end talker, and because the property of the echo path EP varies dynamically. In order to suppress the residual echo, the loss insertion unit 24 inserts a loss into the echo-canceled signal e(t) so as to adjust the gain with respect to the echo-canceled signal e(t). The gain controller 25 controls the gain of the echo-canceled signal e(t) in response to the speech v(t) of the near-end talker so as to prevent a loss from occurring in the echo-canceled signal e(t), wherein the gain is adjusted to “1”, for example. Thus, it is possible to perform conversation without chopping of the near-end talker's speech.
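The cooperation of the loss insertion unit 24 and the gain controller 25 can be sketched as follows. The text gives a gain of “1” during near-end speech as an example but does not state the amount of loss otherwise, so the value loss_gain = 0.25 is an assumption, as are the function name insert_loss and the per-sample flag list near_end_active.

```python
def insert_loss(e, near_end_active, loss_gain=0.25):
    """Apply a loss to e(t) except where near-end speech is detected.

    Gain is 1.0 while the near-end talker speaks (no loss, so the speech is
    not chopped) and loss_gain otherwise (suppressing the residual echo).
    """
    return [et * (1.0 if active else loss_gain)
            for et, active in zip(e, near_end_active)]
```

For instance, with flags [False, True, False] the samples [1.0, 2.0, -4.0] become [0.25, 2.0, -1.0].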
The above technology is essentially designed to detect the speech v(t) of the near-end talker and to thereby stop the adaptive filter 21 from updating the filter coefficient in a double-talk event, thus achieving appropriate echo cancellation. The first detection method is designed on the premise that the gain of the echo path EP is less than “1”, and the speech v(t) of the near-end talker is higher in level than the echo y(t). In actuality, such a premise does not necessarily hold, so that the first detection method suffers from erroneous detection of the speech v(t) of the near-end talker and from degradation of communication quality due to the erroneous detection. The second detection method needs an additional scheme for detecting variations of an echo path, which increases the amount of calculation and the memory capacity required and thus complicates the constitution of an echo canceler. In the detection method using the coherence of the echo y(t), it is necessary to accurately calculate a delay in an echo path, which in turn increases the amount of calculation. The detection method using the correlation of speech signals cannot be adapted to the echo canceler 20 unless the filter coefficient of the adaptive filter 21 has converged.