Interfering signals that may be superposed on a target signal include a line echo generated in a two-wire-to-four-wire converter circuit in a communication line, an acoustic echo generated by acoustic coupling between a speaker for reproducing acoustic signals and a microphone, and background noise or voices of other people picked up by a microphone intended to catch the target signal.
In a two-wire-to-four-wire converter circuit, there is a known technique for removing an echo leaking from a transmitter to a receiver on the four-wire side, for example, the echo canceller described in Non-patent Document 1. The echo canceller suppresses an echo leaking from a transmitter circuit to a receiver circuit on the four-wire side of the two-wire-to-four-wire converter circuit by using an adaptive filter, whose number of tap coefficients is equal to or greater than the length of the impulse response of the echo path, to generate a pseudo echo (echo replica) corresponding to a transmitted signal.
On a similar principle, another technique is known for removing an acoustic echo generated by acoustic coupling between a speaker for reproducing an acoustic signal and a microphone, for example, the acoustic echo canceller described in Non-patent Document 2. The acoustic echo canceller suppresses an echo leaking from the speaker to the microphone due to the acoustic coupling between them by using an adaptive filter, whose number of tap coefficients is equal to or greater than the length of the impulse response of the echo path, to generate a pseudo echo (echo replica) corresponding to a transmitted signal.
In such echo cancellers, the tap coefficients of the adaptive filter are modified by correlating a transmitted signal with an error signal obtained by subtracting a pseudo echo from a mixed signal containing an echo and a received signal together. Typical and commonly used algorithms for modifying coefficients of an adaptive filter are an LMS algorithm described in Non-patent Document 1, and a normalized LMS (NLMS) algorithm described in Non-patent Document 3.
FIG. 12 is a block diagram showing an exemplary configuration of a conventional acoustic echo canceller. A reference signal x(k) supplied to an input terminal 1 is transmitted to a speaker 2, where it is emitted as an acoustic signal into an acoustic space. The symbol k is an index denoting time. A microphone 3, which is provided to catch a near-end acoustic signal v(k), also catches an echo y(k) generated from the acoustic signal emitted by the speaker 2, and transmits the resulting mixed signal to a subtractor 6.
The reference signal x(k) is also supplied to an adaptive filter 5, which outputs a pseudo echo y(k) hat, i.e. $\hat{y}(k)$. This y(k) hat is supplied to the subtractor 6, which subtracts it from the signal supplied by the microphone 3, yielding an echo-free signal e(k):

$$e(k) = v(k) + y(k) - \hat{y}(k). \qquad (1)$$

The value e(k) obtained by the equation above is transmitted to an output terminal 4 as an output. In EQ. (1), y(k)−y(k) hat is called a residual echo.
Assuming the aforementioned LMS algorithm, an m-th coefficient $w_m(k)$ of the adaptive filter 5 is updated according to:

$$w_m(k+1) = w_m(k) + \mu \cdot e(k) \cdot x_m(k). \qquad (2)$$

EQ. (2) can be rewritten for all N coefficients in a matrix form as:

$$W(k+1) = W(k) + \mu \cdot e(k) \cdot X(k), \qquad (3)$$

where W(k) and X(k) are given by:

$$W(k) = [w_0(k)\; w_1(k)\; \ldots\; w_{N-1}(k)]^T, \text{ and} \qquad (4)$$

$$X(k) = [x_0(k)\; x_1(k)\; \ldots\; x_{N-1}(k)]^T. \qquad (5)$$
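By way of illustration only, the LMS update of EQs. (1) to (3) can be sketched in Python as follows. The function name `lms_echo_canceller`, the 3-tap echo path `h`, the step size 0.05, and the zero initial coefficients are hypothetical choices made for this sketch. The sketch also assumes that $x_m(k)$ is the reference sample delayed by m, i.e. x(k−m), which is the usual tapped-delay-line arrangement but is not stated explicitly above.

```python
import random


def lms_echo_canceller(x, mic, num_taps, mu):
    """Apply the LMS update of EQ. (2)/(3) sample by sample.

    x   : reference signal samples x(k)
    mic : microphone samples, i.e. v(k) + y(k)
    Returns (e, w): the echo-free signal e(k) and the final coefficients W.
    """
    w = [0.0] * num_taps    # W(k), initialised to zero (assumption)
    buf = [0.0] * num_taps  # X(k), assuming x_m(k) = x(k - m)
    e = []
    for xk, mk in zip(x, mic):
        buf = [xk] + buf[:-1]                              # newest sample first
        y_hat = sum(wi * xi for wi, xi in zip(w, buf))     # pseudo echo
        ek = mk - y_hat                                    # EQ. (1)
        w = [wi + mu * ek * xi for wi, xi in zip(w, buf)]  # EQ. (3)
        e.append(ek)
    return e, w


# Hypothetical scenario: 3-tap echo path and no near-end signal (v(k) = 0).
random.seed(0)
h = [0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
mic = [sum(h[i] * x[k - i] for i in range(len(h)) if k >= i)
       for k in range(len(x))]
e, w = lms_echo_canceller(x, mic, num_taps=len(h), mu=0.05)
print("estimated coefficients:", [round(wi, 4) for wi in w])
```

In this noise-free setting the estimated coefficients approach the assumed echo path and the residual echo decays toward zero; a larger step size converges faster but becomes unstable once it is too large relative to the power of the reference signal.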
A coefficient updating circuit 7 calculates the second term on the right-hand side of EQ. (2) on receipt of the reference signal x(k) and echo-free signal e(k). The adaptive filter 5 updates its coefficients on receipt of the second term on the right-hand side of EQ. (2) supplied by the coefficient updating circuit 7. The NLMS algorithm, on the other hand, updates coefficients according to EQ. (6) below, instead of EQ. (3):

$$W(k+1) = W(k) + \frac{\mu}{N\sigma_x^2} \cdot e(k) \cdot X(k), \qquad (6)$$

where $\sigma_x^2$ is an average electric power of the reference signal x(k) input to the adaptive filter 5. The factor $N\sigma_x^2$ is used for achieving stable convergence by making the effective step size inversely proportional to the average electric power. There are several methods for calculating $N\sigma_x^2$; one of them, for example, involves adding all $x^2(k)$ for N preceding samples.
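As a minimal sketch of EQ. (6), the Python fragment below normalizes the step size by the sum of the squared samples currently held in the filter's delay line, which is one way of estimating the product of N and the average power. The names `nlms_echo_canceller` and `misalignment_after_run`, the small constant `eps` guarding against division by zero, the 3-tap echo path, and the step size 0.5 are assumptions of this sketch and are not taken from the cited documents.

```python
import random


def nlms_echo_canceller(x, mic, num_taps, mu, eps=1e-12):
    """Apply the NLMS update of EQ. (6) sample by sample."""
    w = [0.0] * num_taps    # W(k), initialised to zero (assumption)
    buf = [0.0] * num_taps  # X(k), assuming x_m(k) = x(k - m)
    e = []
    for xk, mk in zip(x, mic):
        buf = [xk] + buf[:-1]
        y_hat = sum(wi * xi for wi, xi in zip(w, buf))  # pseudo echo
        ek = mk - y_hat                                 # EQ. (1)
        power = sum(xi * xi for xi in buf)  # estimate of N * sigma_x^2
        step = mu / (power + eps)           # eps is an added safeguard
        w = [wi + step * ek * xi for wi, xi in zip(w, buf)]  # EQ. (6)
        e.append(ek)
    return e, w


def misalignment_after_run(scale, seed=1):
    """Return max |w_i - h_i| for a reference signal of the given amplitude."""
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.1]  # hypothetical echo path
    x = [scale * rng.uniform(-1.0, 1.0) for _ in range(3000)]
    mic = [sum(h[i] * x[k - i] for i in range(len(h)) if k >= i)
           for k in range(len(x))]
    _, w = nlms_echo_canceller(x, mic, num_taps=len(h), mu=0.5)
    return max(abs(wi - hi) for wi, hi in zip(w, h))


# The same step size is used for a very quiet and a very loud reference.
err_quiet = misalignment_after_run(scale=0.01)
err_loud = misalignment_after_run(scale=100.0)
print("misalignment (quiet, loud):", err_quiet, err_loud)
```

The point of the demonstration is that a single value of μ serves both reference levels, whereas the unnormalized update of EQ. (3) would need μ to be retuned whenever the reference power changes.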
As given by EQ. (1), the echo-free signal e(k) contains, in addition to the residual echo y(k)−y(k) hat required in updating coefficients, a near-end voice signal v(k). The signal v(k) interferes with coefficient update, and may cause the update to fail if it is not negligible relative to the residual echo. Thus, in general, a double-talk detector circuit 8 is used to detect the presence of the near-end voice v(k), and the result of the detection is used to control coefficient update. The output of the double-talk detector circuit 8 is transmitted to a switch 9, which opens the circuit from the coefficient updating circuit 7 to the adaptive filter 5 if a double talk is detected (i.e., a near-end voice is present), thereby temporarily stopping coefficient update.
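The effect of the switch 9 can be illustrated with the following Python sketch, in which the coefficient update is simply skipped for samples flagged as double talk. The detector itself is not modeled: the flag sequence `oracle` is an idealized detector output assumed for the sake of the example, and the function name `gated_nlms`, the NLMS-style normalization with `eps`, and the signal parameters are likewise hypothetical.

```python
import random


def gated_nlms(x, mic, double_talk, num_taps, mu=0.5, eps=1e-12):
    """Echo canceller whose update is skipped while double_talk[k] is True."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # assuming x_m(k) = x(k - m)
    e = []
    for xk, mk, dt in zip(x, mic, double_talk):
        buf = [xk] + buf[:-1]
        ek = mk - sum(wi * xi for wi, xi in zip(w, buf))
        if not dt:  # switch closed: the update term reaches the filter
            step = mu / (sum(xi * xi for xi in buf) + eps)
            w = [wi + step * ek * xi for wi, xi in zip(w, buf)]
        e.append(ek)
    return e, w


rng = random.Random(2)
h = [0.5, -0.3, 0.1]  # hypothetical echo path
n = 4000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
echo = [sum(h[i] * x[k - i] for i in range(len(h)) if k >= i)
        for k in range(n)]
# Near-end voice v(k) is present only in the second half of the record.
v = [0.0 if k < n // 2 else rng.uniform(-0.5, 0.5) for k in range(n)]
mic = [yk + vk for yk, vk in zip(echo, v)]

oracle = [k >= n // 2 for k in range(n)]  # idealized detector (assumption)
never = [False] * n                       # update never stopped

_, w_gated = gated_nlms(x, mic, oracle, num_taps=len(h))
_, w_free = gated_nlms(x, mic, never, num_taps=len(h))
err_gated = max(abs(a - b) for a, b in zip(w_gated, h))
err_free = max(abs(a - b) for a, b in zip(w_free, h))
print("misalignment with / without gating:", err_gated, err_free)
```

With the update frozen during the near-end activity, the coefficients keep the values reached beforehand; without the gating, the near-end voice perturbs them, which is the failure mode described above.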
A first conventional technique of double-talk detection is disclosed in Patent Document 1. The first conventional technique detects a double talk by level comparison between a microphone signal and a reference signal if the amount of echo cancellation calculated from the microphone signal and an error signal is smaller than a first threshold, and detects a double talk using a cross-correlation between the reference signal and microphone signal if the amount is greater than the first threshold. However, it is not easy to select an appropriate threshold in advance for all cases.
A second conventional technique is disclosed in Patent Document 2. The second conventional technique detects a double talk using an auto-correlation of an error signal and an auto-correlation of a reference signal. In this configuration, the echo canceller itself is multiplexed to make power comparison between a plurality of error signals corresponding to a plurality of adaptive filter outputs. A plurality of adaptive filters are therefore required, which increases computational complexity.
A third conventional technique is disclosed in Patent Document 3. The third conventional technique requires a plurality of sets of adaptive filter coefficients, thus raising a problem that a required memory size is increased.
A fourth conventional technique is disclosed in Patent Document 4. The fourth conventional technique detects a double talk and a system variation, without discriminating between them, by comparing, with a threshold, a power ratio between an error and a reference signal, a power ratio between a microphone signal and a reference signal, or a power ratio between an error and a pseudo echo, and further detects a double talk by comparing, with a threshold, a value obtained by normalizing a correlation between the error and pseudo echo by a power of the pseudo echo.
A fifth conventional technique is disclosed in Patent Document 5. The fifth conventional technique involves double-talk detection using a correlation or covariance of signals caught by a plurality of microphones. Therefore, this technique requires a plurality of microphones and is not applicable to a system comprising a single microphone.
A sixth conventional technique is disclosed in Patent Document 6. The sixth conventional technique conducts double-talk detection using a differential power between a reference signal and a microphone signal. However, since the echo path gain is not known in a general acoustic system, it is difficult to select a detection threshold.
A seventh conventional technique is disclosed in Patent Document 7. The seventh conventional technique conducts double-talk detection by comparing, with a threshold, a ratio between a cross-correlation of a microphone signal with a pseudo echo, and an auto-correlation of the pseudo echo. Since the microphone signal contains a background noise, the threshold should be selected as appropriate according to the nature of the background noise. Therefore, difficulty is encountered in selecting a detection threshold.
An eighth conventional technique is disclosed in Patent Document 8. The eighth conventional technique conducts double-talk detection using a cross-correlation about a variation in an analysis parameter for a reference signal and a microphone signal. Since the analysis parameter for a reference signal and a microphone signal should be found, there arises a problem that computational complexity is increased.
A ninth conventional technique is disclosed in Patent Document 9. The ninth conventional technique conducts double-talk detection using the frequency of saturation and the power of an error; however, it is difficult to select a threshold for saturation.
A tenth conventional technique is disclosed in Patent Document 10. The tenth conventional technique detects a double talk by comparing, with a threshold, a value of a power ratio between a reference signal and a microphone signal, plus a margin. Thus, detection performance is dependent upon the margin, which is difficult to determine.
Eleventh and twelfth conventional techniques are disclosed in Patent Documents 11 and 12, respectively. Both these conventional techniques employ two microphones, and are not applicable to a system comprising a single microphone.
A thirteenth conventional technique is disclosed in Patent Document 13. The thirteenth conventional technique detects a double talk by comparing, with a threshold, a value of a determinant defined using an auto-correlation of a microphone signal, an auto-correlation of a pseudo echo, and their cross-correlation. The value of the determinant, however, is variable depending upon an environment, resulting in difficulty in selecting the threshold.
An exemplary technique of double-talk detection using a normalized cross-correlation vector of a reference signal and a microphone signal is disclosed in Non-patent Document 4.
In Non-patent Document 4, double-talk detection is conducted using a normalized cross-correlation vector $c_{xm}$ of a reference signal x(k) and a microphone signal m(k) as follows:

$$c_{xm}(k) = (\sigma_m^2 R_{xx})^{-0.5}\, r_{xm}, \qquad (7)$$

where $\sigma_m^2$ designates a variance of m(k), $r_{xm} = R_{xx}h$ designates a cross-correlation of x(k) and m(k), $R_{xx} = E[X(k)X^T(k)]$ designates an auto-correlation matrix of the reference signal x(k), $E[\cdot]$ designates an operator representing a mathematical expectation, and h designates an impulse response of an acoustic path from the speaker 2 to the microphone 3 given as follows:

$$h(k) = [h_0\; h_1\; \ldots\; h_{N-1}]^T. \qquad (8)$$

It should be noted that a near-end voice contained in a microphone signal is assumed to have no correlation with a reference signal, and a background noise is assumed to have no correlation with the reference signal.
A decision variable ξ for double-talk detection is given using the norm of $c_{xm}$ and paying attention to the fact that $\sigma_m^2$ is a scalar, as follows:

$$\xi = \|c_{xm}\|^2 = r_{xm}^T (\sigma_m^2 R_{xx})^{-1} r_{xm} = \frac{r_{xm}^T R_{xx}^{-1} r_{xm}}{\sigma_m^2}. \qquad (9)$$

A double talk is decided when ξ is smaller than one.
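For illustration, the decision variable of EQ. (9) can be estimated from recorded samples roughly as in the Python sketch below. Time averages over the whole record stand in for the mathematical expectations, the signals are assumed to be zero-mean so that the mean square of m(k) stands in for its variance, and $x_m(k)$ is again assumed to be x(k−m). The helper names `solve` and `decision_variable`, the 3-tap echo path, and the signal levels are hypothetical and are not taken from Non-patent Document 4.

```python
import random


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def decision_variable(x, mic, num_taps):
    """Estimate xi of EQ. (9) from time averages over the whole record."""
    n = num_taps
    rxx = [[0.0] * n for _ in range(n)]  # accumulates X(k) X^T(k)
    rxm = [0.0] * n                      # accumulates X(k) m(k)
    power_m = 0.0                        # accumulates m(k)^2
    buf = [0.0] * n
    count = 0
    for xk, mk in zip(x, mic):
        buf = [xk] + buf[:-1]
        for i in range(n):
            rxm[i] += buf[i] * mk
            for j in range(n):
                rxx[i][j] += buf[i] * buf[j]
        power_m += mk * mk
        count += 1
    rxx = [[val / count for val in row] for row in rxx]
    rxm = [val / count for val in rxm]
    var_m = power_m / count              # zero-mean assumption
    z = solve(rxx, rxm)                  # z = Rxx^{-1} rxm
    return sum(r * zi for r, zi in zip(rxm, z)) / var_m


rng = random.Random(3)
h = [0.5, -0.3, 0.1]  # hypothetical echo path
n_samples = 4000
x = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
echo = [sum(h[i] * x[k - i] for i in range(len(h)) if k >= i)
        for k in range(n_samples)]
v = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]  # near-end voice

xi_single = decision_variable(x, echo, num_taps=len(h))
xi_double = decision_variable(x, [y + vk for y, vk in zip(echo, v)],
                              num_taps=len(h))
print("xi without / with near-end voice:", xi_single, xi_double)
```

When the microphone signal consists of the echo alone, the estimate is essentially one; adding an uncorrelated near-end voice raises the variance of m(k) without raising the numerator, so the estimate drops below one.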
Patent Document 1: Japanese Patent Application Laid Open No. H3-218150
Patent Document 2: Japanese Patent Application Laid Open No. H6-13940
Patent Document 3: Japanese Patent Application Laid Open No. H6-14100
Patent Document 4: Japanese Patent Application Laid Open No. H7-226793
Patent Document 5: Japanese Patent Application Laid Open No. H7-250397
Patent Document 6: Japanese Patent Application Laid Open No. H7-264103
Patent Document 7: Japanese Patent Application Laid Open No. H7-288493
Patent Document 8: Japanese Patent Application Laid Open No. H7-303070
Patent Document 9: Japanese Patent Application Laid Open No. H10-41858
Patent Document 10: Japanese Patent Application Laid Open No. H11-215033
Patent Document 11: Japanese Patent Application Laid Open No. 2000-324233
Patent Document 12: Japanese Patent Application Laid Open No. 2004-40161
Patent Document 13: Japanese Patent Application Laid Open No. 2004-517579
Non-patent Document 1: Adaptive Signal Processing, 1985, Prentice-Hall Inc., U.S.A.
Non-patent Document 2: “Acoustic Echo Control,” IEEE Signal Processing Magazine, pp. 42-69, July 1999.
Non-patent Document 3: Adaptive Filters, 1985, Kluwer Academic Publishers, U.S.A.
Non-patent Document 4: IEEE Transactions on Speech and Audio Processing, pp. 168-172, March 2000.