1. Field of the Invention
The present invention relates to a double-talk detector, and particularly to a double-talk detector applicable to, for example, an echo canceller using an adaptive filter for controlling, i.e. enabling/disabling, the update of the coefficient of the adaptive filter, and to such an echo canceller having the double-talk detector. The present invention also relates to a detection method for such a double-talk detector.
2. Description of the Background Art
FIG. 1 shows a typical conventional echo canceller, to which reference will be made for describing the operation of the conventional echo canceller and of a double-talk detector for use therein. The echo canceller includes a pseudo echo generator, a subtractor, and a double-talk detector.
The pseudo echo generator is an adaptive filter having a learning function, and generates a pseudo echo signal yr(n) from a receiver input signal x(n) inputted from a receiver input terminal Rin. The pseudo echo generator treats a residual signal e(n), outputted from a residual output terminal RES, as an error caused by a difference in characteristics between the adaptive filter and the echo path transfer, and updates the coefficient of the adaptive filter so that the characteristic of the adaptive filter converges to the characteristic of the echo path transfer.
The receiver input signal x(n) transmitted from a far end talker is inputted to the receiver input terminal Rin of the echo canceller. The receiver input signal x(n) is outputted from a receiver output terminal Rout toward the near end side, and is also inputted to the adaptive filter in the pseudo echo generator.
The receiver input signal x(n) outputted from the receiver output terminal Rout is sent to a near end talker through an equivalent circuit of a hybrid circuit having a two-to-four-wire conversion function. The hybrid circuit includes an echo path, which generates an echo y(n) and outputs this echo to one terminal of an adder. The other terminal of the adder receives a near end transmitter output signal t(n) transferred from the near end talker. The adder adds the echo signal component y(n) to the near end transmitter output signal t(n), and outputs the resulting transmitter input signal d(n) to a transmitter input terminal Sin of the echo canceller. This relation is represented by an expression (1):
d(n)=y(n)+t(n)  (1)
The subtractor subtracts the pseudo echo signal yr(n) from the transmitter input signal d(n) to output the residual signal e(n). This is represented by an expression (2):
e(n)=d(n)−yr(n)  (2)
The residual signal e(n) outputted from the subtractor includes a residual echo Δy(n), caused by a difference in characteristics between the adaptive filter and the echo path transfer, and the near end transmitter output signal t(n). The residual echo Δy(n) is represented by an expression (3):
Δy(n)=y(n)−yr(n)  (3)
Now, the expression (1) is substituted into the expression (2) to obtain an expression (4):
e(n) = d(n) − yr(n)
     = y(n) + t(n) − yr(n)
     = (y(n) − yr(n)) + t(n)  (4)
Furthermore, the expression (3) is substituted into the expression (4) to obtain an expression (5):
e(n)=Δy(n)+t(n)  (5)
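By way of illustration only, the relations of the expressions (1) through (5) may be checked numerically with the following minimal sketch. The FIR model of the echo path, the coefficient values, the sample values, and the names `fir`, `h_echo`, `h_hat`, and `residual_terms` are assumptions introduced for this example and do not appear in the description above.

```python
# Illustrative check of expressions (1) to (5); all numeric values are hypothetical.

def fir(h, x, n):
    """Output at time n of an FIR filter with taps h driven by the sequence x."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)

x = [1.0, 0.5, -0.25, 0.0, 0.75]   # receiver input signal x(n)
t = [0.0, 0.1, 0.0, -0.2, 0.05]    # near end transmitter output signal t(n)
h_echo = [0.5, 0.25]               # assumed echo path (FIR model, not from the source)
h_hat = [0.4, 0.3]                 # assumed adaptive filter coefficients

def residual_terms(n):
    """Return (e(n), Δy(n), t(n)) for sample index n."""
    y = fir(h_echo, x, n)          # echo y(n)
    yr = fir(h_hat, x, n)          # pseudo echo yr(n)
    d = y + t[n]                   # expression (1)
    e = d - yr                     # expression (2)
    delta_y = y - yr               # expression (3)
    return e, delta_y, t[n]

for n in range(len(x)):
    e, delta_y, t_n = residual_terms(n)
    # expression (5): e(n) = Δy(n) + t(n), up to floating-point rounding
    assert abs(e - (delta_y + t_n)) < 1e-12
```

The check holds for any choice of coefficients, since the expression (5) follows algebraically from the expressions (1) through (3).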
When the receiver input signal x(n) is null, namely x(n)=0, the signal component of the residual signal e(n) includes the near end transmitter output signal t(n) only. Therefore, the adaptive filter needs to disable the update of its coefficient so as to prevent the coefficient from diverging.
When the receiver input signal x(n) includes an audio or voice signal component, and the near end transmitter output signal t(n) is null, namely t(n)=0, the system is in a single-talk state. At this time, the signal component of the residual signal e(n) includes the residual echo Δy(n) only. Therefore, this residual signal e(n) can be regarded as an error caused by a difference in characteristics between the adaptive filter and the echo path transfer, and is suitable for updating the coefficient of the adaptive filter. This update causes the characteristic of the adaptive filter to converge to the characteristic of the echo path transfer. The adaptive filter in the single-talk state should thus enable the update of its coefficient.
When both the receiver input signal x(n) and the near end transmitter output signal t(n) include an audio or voice signal component, the system is in a double-talk state. At this time, the residual signal e(n) includes the residual echo Δy(n) caused by a difference in characteristics between the adaptive filter and the echo path transfer, as well as the near end transmitter output signal t(n). Therefore, the adaptive filter needs to disable the update of its coefficient so as to prevent the coefficient from diverging.
Thus, the double-talk detector monitors the receiver input signal x(n), the transmitter input signal d(n), and the residual signal e(n) to determine a talk state. Thereby, during a null state of receiver input signal x(n) or the double-talk state, the adaptive filter disables the update of its coefficient, and in the single-talk state the adaptive filter enables the update of its coefficient.
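The relation between the talk state and the coefficient update described above may be summarized by the following minimal sketch. It assumes that the presence of an audio signal at the far end and at the near end is already known as two Boolean values; deriving those values from x(n), d(n), and e(n) is the task of the double-talk detector itself. The names `TalkState`, `classify`, and `update_enabled` are hypothetical.

```python
from enum import Enum

class TalkState(Enum):
    RECEIVE_NULL = "receive_null"   # receiver input signal x(n) is null
    SINGLE_TALK = "single_talk"     # only the far end talker speaks
    DOUBLE_TALK = "double_talk"     # both talkers speak

def classify(far_end_active: bool, near_end_active: bool) -> TalkState:
    """Map the (assumed known) activity flags to a talk state."""
    if not far_end_active:
        return TalkState.RECEIVE_NULL
    return TalkState.DOUBLE_TALK if near_end_active else TalkState.SINGLE_TALK

def update_enabled(state: TalkState) -> bool:
    """The coefficient update is enabled in the single-talk state only."""
    return state is TalkState.SINGLE_TALK
```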
A conventional double-talk detection method is disclosed by Japanese Patent Laid-Open Publication No. 238727/1988. This method calculates an echo attenuation value Acoms(n) by subtracting a decibel value corresponding to the signal power of the residual signal e(n) from that corresponding to the signal power of the receiver input signal x(n).
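As a hedged illustration of the calculation of the echo attenuation value Acoms(n), the following sketch assumes that the signal power is estimated as the mean square over a short frame of samples and that the decibel value is 10·log10 of that power. Neither the estimator nor the frame length is specified above, and the names `mean_power`, `power_db`, and `echo_attenuation_db` are hypothetical.

```python
import math

def mean_power(samples):
    """Mean-square power of a frame (assumed power estimator)."""
    return sum(s * s for s in samples) / len(samples)

def power_db(power, floor=1e-12):
    """Decibel value of a power; the floor is an assumption that avoids log10(0)."""
    return 10.0 * math.log10(max(power, floor))

def echo_attenuation_db(x_frame, e_frame):
    """Acoms(n): decibel power of x(n) minus decibel power of e(n)."""
    return power_db(mean_power(x_frame)) - power_db(mean_power(e_frame))
```

For example, a residual signal whose amplitude is one tenth of that of the receiver input signal gives an echo attenuation value of approximately 20 dB.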
Subsequently, depending on the talk state, either the echo attenuation value Acoms(n) or a value obtained by adding the echo attenuation value Acoms(n) to a margin γ is integrated to calculate a double-talk detection threshold value TRIM(n). Then, when the value obtained by adding the echo attenuation value Acoms(n) to the margin γ is lower than the double-talk detection threshold value TRIM(n), this method determines the double-talk state. The method relies on the following: when the near end transmitter output signal t(n) transmitted from a near end talker contains an audio signal, the signal power of the residual signal e(n)=Δy(n)+t(n) increases and the echo attenuation value Acoms(n) decreases. The method detects this decrease in the echo attenuation value Acoms(n) to determine the double-talk state.
Next, the double-talk detection method disclosed by the Japanese '727 Publication will be described briefly. When the signal power of the receiver input signal x(n) is lower than a threshold value XTH, indicating that no audio signal is present, this method disables the update of the coefficient of the adaptive filter, and holds the double-talk detection threshold value TRIM(n) at its preceding value.
In addition, when the signal power of the receiver input signal x(n) is equal to or higher than the threshold value XTH, indicating that an audio signal is present, and the signal power of the transmitter input signal d(n) is lower than a threshold value YTH, indicating that no audio signal is present, this method enables the update of the coefficient of the adaptive filter, and rapidly decreases the double-talk detection threshold value TRIM(n).
When the signal power of the receiver input signal x(n) is equal to or higher than the threshold value XTH to have an audio signal, and the signal power of the transmitter input signal d(n) is equal to or higher than the threshold value YTH to have an audio signal also, the echo attenuation value Acoms(n) is added to the margin γ to compare the resulting value with the double-talk detection threshold value TRIM(n).
Then, when the value obtained by adding the echo attenuation value Acoms(n) to the margin γ is larger than the double-talk detection threshold value TRIM(n), this method determines a single-talk state, enables the update of the coefficient of the adaptive filter, and updates the double-talk detection threshold value TRIM(n). The double-talk detection threshold value TRIM(n) is updated through an expression (6):
TRIM(n+1)=Acoms(n)×δ1+TRIM(n)×(1−δ1),  (6)
where TRIM(n) is a double-talk detection threshold value, Acoms(n) is an echo attenuation value, and δ1 is a coefficient defining the transient response of an integral process. Therefore, when the single-talk state continues, the double-talk detection threshold value TRIM(n) becomes equal to a value obtained by integrating the echo attenuation value Acoms(n).
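The behavior of the expression (6) may be made concrete by the following worked derivation. It assumes, for illustration only, that the echo attenuation value stays at a constant value A during the single-talk state and that 0 < δ1 < 1; neither assumption is stated above.

```latex
\begin{aligned}
\mathrm{TRIM}(n+1) - A &= A\,\delta_1 + \mathrm{TRIM}(n)\,(1-\delta_1) - A \\
                       &= (1-\delta_1)\,\bigl(\mathrm{TRIM}(n) - A\bigr), \\
\mathrm{TRIM}(n)       &= A + (1-\delta_1)^{n}\,\bigl(\mathrm{TRIM}(0) - A\bigr).
\end{aligned}
```

Under these assumptions the difference between TRIM(n) and A shrinks by the factor (1−δ1) at every step, so that TRIM(n) approaches A, and a larger δ1 gives a faster response.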
In addition, when the value obtained by adding the echo attenuation value Acoms(n) to the margin γ is lower than the double-talk detection threshold value TRIM(n), this method determines a double-talk state to disable the update of coefficient of the adaptive filter and to update the double-talk detection threshold value TRIM(n). The double-talk detection threshold value TRIM(n) is updated through an expression (7):
TRIM(n+1) = TRIM(n) − {TRIM(n) − FLG(n)} × δ2
          = TRIM(n) − TRIM(n) × δ2 + FLG(n) × δ2
          = FLG(n) × δ2 + TRIM(n) × (1 − δ2)
          = (Acoms(n) + γ) × δ2 + TRIM(n) × (1 − δ2),  (7)
where TRIM(n) is a double-talk detection threshold value, Acoms(n) is an echo attenuation value, δ2 is a coefficient defining the transient response of an integral process, γ is a margin, and FLG(n) is equal to the echo attenuation value Acoms(n) plus the margin γ. Therefore, when the double-talk state continues, the double-talk detection threshold value TRIM(n) becomes equal to a value obtained by integrating the sum of the echo attenuation value Acoms(n) and the margin γ.
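The conventional procedure described above, from the comparison with the threshold values XTH and YTH through the expressions (6) and (7), may be sketched as follows. This is an illustration under stated assumptions and not the implementation of the Japanese '727 Publication: the rule for the rapid decrease of TRIM(n) is not given above and is modeled here as a fixed step `fast_decay_db`; the case in which Acoms(n)+γ equals TRIM(n) is not specified above and is treated here as a double-talk state; and the names `ConventionalDetector`, `power_db`, and `step` are hypothetical.

```python
import math
from dataclasses import dataclass

def power_db(power, floor=1e-12):
    """Decibel value of a signal power; the floor (an assumption) avoids log10(0)."""
    return 10.0 * math.log10(max(power, floor))

@dataclass
class ConventionalDetector:
    xth_db: float                # threshold value XTH, in dB
    yth_db: float                # threshold value YTH, in dB
    gamma_db: float              # margin γ
    delta1: float                # δ1 of expression (6)
    delta2: float                # δ2 of expression (7)
    trim_db: float = 0.0         # TRIM(n); the initial value is an assumption
    fast_decay_db: float = 1.0   # assumed step for the "rapid decrease" of TRIM(n)

    def step(self, px, pd, pe):
        """Process the powers of x(n), d(n), e(n); return True if the update is enabled."""
        if power_db(px) < self.xth_db:
            return False                       # no far end signal: hold TRIM(n)
        if power_db(pd) < self.yth_db:
            self.trim_db -= self.fast_decay_db  # assumed form of the rapid decrease
            return True
        acoms = power_db(px) - power_db(pe)    # echo attenuation value Acoms(n)
        flg = acoms + self.gamma_db            # FLG(n) = Acoms(n) + γ
        if flg > self.trim_db:                 # single-talk state: expression (6)
            self.trim_db = acoms * self.delta1 + self.trim_db * (1.0 - self.delta1)
            return True
        # double-talk state (equality included by assumption): expression (7)
        self.trim_db = flg * self.delta2 + self.trim_db * (1.0 - self.delta2)
        return False
```

A caller would supply one triple of power estimates per sample or per frame and gate the coefficient update of the adaptive filter on the returned value.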
As described above, in the conventional double-talk detection method, when the signal power of the receiver input signal x(n) is equal to or higher than the threshold value XTH, indicating that an audio signal is present, and the signal power of the transmitter input signal d(n) is equal to or higher than the threshold value YTH, likewise indicating that an audio signal is present, a decibel value corresponding to the signal power of the residual signal e(n) is subtracted from a decibel value corresponding to the signal power of the receiver input signal x(n) to obtain the echo attenuation value Acoms(n), which is processed to detect a double-talk state.
However, the above-described echo attenuation value Acoms(n) may change depending on the talk state, the power of the near end background noise, and the degree of convergence of the adaptive filter, even when the echo path transfer characteristic does not change. Therefore, the conventional double-talk detection is affected by the talk state, the power of the near end background noise, and the degree of convergence of the adaptive filter, which is problematic.
Now, the influence of the talk state, the power of the near end background noise, and the degree of convergence of the adaptive filter on the conventional double-talk detection will be described below.
(1) Influence of the Talk State on the Double-Talk Detection
In the double-talk state, the residual signal e(n)(=Δy(n)+t(n)) includes the near end transmitter output signal t(n). Therefore, in the double-talk state, the signal power of the residual signal e(n) increases and the echo attenuation value Acoms(n) decreases. At this time, as described in connection with the expression (7), since the value obtained by adding the echo attenuation value Acoms(n) to the margin γ is integrated, the double-talk detection threshold value TRIM(n) decreases.
Consequently, the double-talk detection threshold value TRIM(n) is smaller in the double-talk state than it was before the double-talk state.
(2) Influence of the Power of the Near End Background Noise on the Double-Talk Detection
Even when only a far end talker speaks in the single-talk state, a near end transmitter output signal t(n) transmitted from the near end generally is not completely null but includes a small signal component, or the near end background noise. Therefore, the transmitter input signal d(n) and the residual signal e(n) (=Δy(n)+t(n)) also include the near end background noise.
For example, it is assumed that the adaptive filter adequately eliminates an echo, namely Δy(n)≈0. At this time, when the near end background noise of the near end transmitter output signal t(n) is almost null, the residual signal e(n) (=Δy(n)+t(n)) is almost null so that the echo attenuation value Acoms(n) and the double-talk detection threshold value TRIM(n) increase.
By contrast, when the power of near end background noise of the near end transmitter output signal t(n) increases, the signal power of the residual signal e(n)(=Δy(n)+t(n)) increases so that the echo attenuation value Acoms(n) and the double-talk detection threshold value TRIM(n) decrease.
In other words, when the near end background noise of the near end transmitter output signal t(n) increases, the double-talk detection threshold value TRIM(n) decreases.
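The dependence of the echo attenuation value Acoms(n) on the near end background noise may be illustrated numerically as follows. The sketch assumes that the residual echo and the background noise are uncorrelated, so that their powers add in the linear domain; this assumption, the decibel values used, and the names `db_to_power`, `power_to_db`, and `acoms_db` are introduced for the example only.

```python
import math

def db_to_power(db):
    return 10.0 ** (db / 10.0)

def power_to_db(power):
    return 10.0 * math.log10(power)

def acoms_db(px_db, residual_echo_db, noise_db):
    """Acoms(n) when e(n) = Δy(n) + t(n) and t(n) is background noise only.

    Assumes uncorrelated components, so the power of e(n) is the sum of
    the power of Δy(n) and the power of the noise.
    """
    pe = db_to_power(residual_echo_db) + db_to_power(noise_db)
    return px_db - power_to_db(pe)

# Same far end level and same residual echo; only the noise level differs.
quiet = acoms_db(px_db=-10.0, residual_echo_db=-70.0, noise_db=-70.0)
noisy = acoms_db(px_db=-10.0, residual_echo_db=-70.0, noise_db=-40.0)
assert noisy < quiet   # more background noise, lower Acoms(n)
```

With the hypothetical values above, raising the noise level by 30 dB lowers the echo attenuation value by roughly 27 dB although the echo path and the adaptive filter are unchanged.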
(3) Influence of the Degree of Convergence of the Adaptive Filter on the Double-Talk Detection
Depending on the degree of convergence of the adaptive filter, the signal power of the residual signal e(n) and the echo attenuation value Acoms(n) change.
It is assumed that, when the near end background noise is almost null and only the far end talker speaks with the coefficient of the adaptive filter having converged, an impulse signal appears in the near end transmitter output signal t(n) transmitted from the near end talker. It is also assumed that, as a result, the coefficient of the adaptive filter, once converged, diverges.
In this state, since the coefficient of the adaptive filter has converged before applying the impulse signal, the residual echo Δy(n) is almost equal to zero. The signal power of the residual signal e(n)(=Δy(n)+t(n)) decreases, and the echo attenuation value Acoms(n) increases. This results in increasing the double-talk detection threshold value TRIM(n).
On the other hand, right after applying the impulse signal, with the divergence of coefficient of the adaptive filter, the signal power of the residual echo Δy(n) increases rapidly, the signal power of the residual signal e(n) increases, and the echo attenuation value Acoms(n) decreases. However, the double-talk detection threshold value TRIM(n), since calculated by an integral process, hardly changes and keeps its large value.
In this case, although the system in fact remains in the single-talk state immediately after the impulse signal is applied, a double-talk state may be erroneously determined. Once a double-talk state is wrongly determined, the double-talk detection threshold value TRIM(n) gradually decreases, and a single-talk state is eventually determined. In the meantime, however, even though the system is in the single-talk state and the adaptive filter coefficient has diverged, the update of the coefficient of the adaptive filter remains disabled for a long period of time. During this period of time, the residual echo Δy(n) remains large, which may cause the far end talker to hear the echo.
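The length of the period during which the update remains disabled may be estimated with the following sketch. It assumes that, after the divergence, the value Acoms(n)+γ stays at a constant level FLG, so that the expression (7) reduces the gap TRIM(n)−FLG by the factor (1−δ2) per step, and that a single-talk state is determined once this gap has fallen to a small value `eps_db`, for example the size of the normal fluctuation of Acoms(n). The release criterion, the numeric values, and the names `steps_until_release` and `simulate` are assumptions for illustration.

```python
import math

def steps_until_release(trim0_db, flg_db, delta2, eps_db):
    """Smallest n with TRIM(n) - FLG <= eps_db, for constant FLG and 0 < delta2 < 1."""
    gap0 = trim0_db - flg_db
    if gap0 <= eps_db:
        return 0
    # gap(n) = gap0 * (1 - delta2)**n  <=  eps_db
    return math.ceil(math.log(eps_db / gap0) / math.log(1.0 - delta2))

def simulate(trim0_db, flg_db, delta2, steps):
    """Iterate expression (7) with constant FLG(n) and return the final TRIM."""
    trim = trim0_db
    for _ in range(steps):
        trim = flg_db * delta2 + trim * (1.0 - delta2)
    return trim

# Hypothetical values: TRIM was 40 dB before the impulse, FLG is 16 dB after it.
n = steps_until_release(trim0_db=40.0, flg_db=16.0, delta2=0.01, eps_db=1.0)
assert simulate(40.0, 16.0, 0.01, n) - 16.0 <= 1.0
```

Because the number of steps grows roughly in proportion to 1/δ2, a slowly responding integral process keeps the adaptive filter frozen correspondingly longer.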
As described above, the conventional double-talk detector is affected by the talk state, the power of the near end background noise, and the degree of convergence of the adaptive filter, so that the accuracy of double-talk detection decreases, which is problematic.