In a long-distance telephone line routed via a submarine cable or a communication satellite, the subscriber's lines connected to both ends of the line are generally two-wire circuits, while the long-distance transmission portion is a four-wire circuit, employed for signal amplification and other purposes. Similarly, in a mobile communications network using mobile telephones (cellular phones), the subscriber's line of a terrestrial analog telephone is a two-wire circuit, and the portion from the mobile telephone terminal to a switch, etc. is a four-wire circuit. In such cases, a hybrid circuit for performing four-wire/two-wire conversion is provided at the junction between the two-wire circuit and the four-wire circuit. This hybrid circuit is designed to match the impedance of the two-wire circuit. However, since it is difficult to always obtain a good match, a received signal arriving at the four-wire input side of the hybrid circuit tends to leak toward the four-wire output side, thereby generating a so-called echo. Since such an echo reaches the talker at a lower level than the talker's own voice and after a certain delay, it hinders speech. This hindrance becomes more significant as the signal propagation time becomes longer. In mobile communication in particular, various processing procedures are carried out in the radio communication section leading to the switch, etc., so the signal delay increases and the speech hindrance caused by echo becomes a particular problem.
Echo suppressors and echo cancelers are known as apparatuses for preventing echo. FIG. 1 shows a schematic construction of an echo canceler which can be used in a mobile communications network. The echo canceler 1 illustrated here is located in the stage preceding a hybrid circuit 2. In this illustration, the subscriber of an analog telephone is referred to as the "near-end talker" and the subscriber of a mobile telephone as the "far-end talker". A far-end speech signal input into the echo canceler 1 is represented by Rin; a far-end speech signal output from the echo canceler 1, by Rout; a near-end speech signal input into the echo canceler 1, by Sin; and a near-end speech signal output from the echo canceler 1, by Sout.
The echo canceler 1 shown in FIG. 1 comprises an echo path estimation circuit/echo replica generator 3, a control unit 4, an adder 5, and a non-linear processor 6. Here, the echo path estimation circuit/echo replica generator 3 detects the response characteristic of the hybrid circuit 2 based on both the far-end speech input Rin and the near-end speech input Sin, and thereby estimates the echo path (namely, the line along which the echo propagates). An anticipated echo from the hybrid circuit 2 (namely, an echo replica) is then generated through a convolutional operation between the estimation result and the far-end speech input Rin. In the adder 5, this echo replica is subtracted from the near-end speech input Sin, thereby canceling the echo. A learning identification algorithm is used as the above-mentioned echo path estimation algorithm. Among the many adaptive algorithms, this learning identification algorithm has comparatively low computational complexity and good convergence characteristics.
As shown in FIG. 1, the echo path estimation circuit/echo replica generator 3 includes an echo path estimation circuit 3a, an H-register 3b, and an echo replica generator 3c. The echo path estimation circuit 3a estimates the echo path using the learning identification algorithm and writes a tap coefficient (described later) corresponding to the estimated echo path into the H-register 3b. The echo replica generator 3c comprises an FIR adaptive digital filter. The generator 3c generates an echo replica through a convolutional operation between the tap coefficient in the H-register 3b and the far-end speech input Rin. The learning identification algorithm is a known estimation algorithm disclosed, for example, in the Institute of Electronics and Communication Engineers of Japan (IECE) Journal '77/11, Vol. J60-A, No. 11, in the article entitled "Regarding Echo Canceling Characteristic of Echo Canceler Using Learning Identification Algorithm" (by Itakura and Nishikawa). The learning identification algorithm discussed in this article is briefly outlined below.
First, assuming that the signal propagation characteristic of the echo path is linear, and using the impulse response $h(t)$ and the input signal $x(t)$, the echo $y_k$ at the time $kT$ (where $T$ is the sampling interval) can be expressed as follows.

$$y_k = h^{t} x_k \qquad (1)$$

where:

$$h = (h_1, h_2, \ldots, h_n), \quad h_j = h(jT)$$

$$x_k = (x_{k-1}, x_{k-2}, \ldots, x_{k-n})^{t}, \quad x_j = x(jT) \qquad (2)$$
(where the superscript $t$ denotes vector transposition)
On the other hand, if the estimation value of $h$ at the time $kT$ is represented by $H_k$ (hereinafter referred to as the "tap coefficient"), an estimation value $Y_k$ of $y_k$ can be given by the following expression.

$$Y_k = H_k^{t} x_k \qquad (3)$$

Then, a successive correction of $H_k$ according to the learning identification algorithm is made by ##EQU1##

where:

$$e_k = y_k - Y_k \qquad (5)$$
Namely, $e_k$ is the residual echo. This residual echo appears on the output side of the adder 5. As is apparent from the above expression (5), the next tap coefficient $H_{k+1}$ is calculated so that the residual echo will be reduced. In a digital circuit, the above-mentioned algorithm can be expressed concretely as follows. First, the far-end speech input Rin taken into the echo path estimation circuit 3a is handled as a digital signal $X_t$ (where $t$ is the sampling time) consisting of N sample values.

$$X_t = (x(t), x(t-1), \ldots, x(t-(N-1))) \qquad (6)$$
If the tap coefficient $H_t$ at the time $t$ written in the H-register 3b is expressed as

$$H_t = (h_t(0), h_t(1), \ldots, h_t(N-1)) \qquad (7)$$

the convolutional operation in the echo replica generator 3c (FIR filter) can be expressed as follows. ##EQU2##

If the inner product of vectors is represented by "*" here, the above expression (8) can be rewritten as follows.

$$Y(t) = X_t * H_t \qquad (9)$$
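The inner product in expression (9) can be illustrated with a minimal sketch. Expression (8) itself is not reproduced in this text; the sum form used below, Y(t) = h_t(0)x(t) + h_t(1)x(t-1) + ... + h_t(N-1)x(t-(N-1)), is an assumption that follows from the definitions (6) and (7) and the inner-product form (9). The function name `echo_replica` and the three-tap values are hypothetical and chosen only for illustration.

```python
def echo_replica(taps, rin_history):
    """Inner product of H_t with X_t, i.e. the sum of h_t(i) * x(t - i).

    taps[i]        -- h_t(i), for i = 0 .. N-1
    rin_history[i] -- x(t - i), most recent far-end sample first
    """
    if len(taps) != len(rin_history):
        raise ValueError("taps and history must both have N entries")
    return sum(h * x for h, x in zip(taps, rin_history))


# Hypothetical 3-tap coefficients and far-end history (illustrative values).
H_t = [0.5, 0.25, 0.0]
X_t = [1.0, 2.0, 4.0]            # x(t), x(t-1), x(t-2)
Y_t = echo_replica(H_t, X_t)     # 0.5*1.0 + 0.25*2.0 + 0.0*4.0
```

In this sketch the history list is ordered newest-first so that index i of both lists corresponds to the same delay, matching the ordering in (6) and (7).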
Now, if the residual echo obtained on the output side of the adder 5 is represented by $er(t)$, the following expression can be obtained.

$$er(t) = e(t) - Y(t) \qquad (10)$$

From the expressions listed so far, a fluctuation $\Delta H_t$ of $H_t$ can be expressed as follows.

$$\Delta H_t = g \times er(t) \times X_t / (X_t * X_t) \qquad (11)$$

$H_{t+1}$ can be expressed as follows.

$$H_{t+1} = H_t + \Delta H_t \qquad (12)$$
Therefore, the echo path estimation circuit 3a reads the tap coefficient H from the H-register 3b. By adding the fluctuation calculated according to expression (11) to the tap coefficient H thus read, the echo path estimation circuit 3a calculates the next tap coefficient and writes it into the H-register 3b. In this way, the tap coefficients H in the H-register 3b are gradually updated. What has been described so far is the concrete computation performed in the digital circuit according to the learning identification algorithm. The above expressions (6) to (12) are also disclosed in Japanese Patent Laid-Open Application No. Hei 5-129989 and elsewhere.
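The read, correct, and write-back cycle described above can be sketched as follows. This is only an illustration under stated assumptions, not the circuit of FIG. 1: the near-end input is taken to be a noise-free echo produced by a hypothetical four-tap echo path, the gain g is set to an arbitrary 0.5, and a small constant `eps` (not in the source) guards the division in expression (11).

```python
import random


def nlms_step(taps, x_hist, sin_sample, gain=0.5, eps=1e-12):
    """One tap update following expressions (9) to (12).

    taps       -- H_t, list of N tap coefficients read from the register
    x_hist     -- X_t = [x(t), x(t-1), ..., x(t-(N-1))]
    sin_sample -- near-end input sample, assumed here to be pure echo e(t)
    gain       -- g in expression (11); 0.5 is an illustrative value
    eps        -- guard against division by zero (an addition, not in the source)
    Returns (new_taps, residual).
    """
    replica = sum(h * x for h, x in zip(taps, x_hist))        # expression (9)
    residual = sin_sample - replica                           # expression (10)
    energy = sum(x * x for x in x_hist)                       # X_t * X_t
    step = gain * residual / (energy + eps)
    new_taps = [h + step * x for h, x in zip(taps, x_hist)]   # (11) and (12)
    return new_taps, residual


def simulate(true_path, far_end, gain=0.5):
    """Run the update over a far-end sample sequence with a simulated echo."""
    n = len(true_path)
    taps = [0.0] * n            # initial register contents
    hist = [0.0] * n            # far-end history, newest sample first
    residuals = []
    for sample in far_end:
        hist = [sample] + hist[:-1]
        echo = sum(h * x for h, x in zip(true_path, hist))
        taps, r = nlms_step(taps, hist, echo, gain)
        residuals.append(r)
    return taps, residuals


rng = random.Random(0)                               # fixed seed for repeatability
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3, 0.1, 0.05]                   # hypothetical echo path
est_taps, residuals = simulate(true_path, far_end)
```

Under these idealized conditions the estimated taps approach the simulated echo path and the residual shrinks toward zero; with near-end speech present in the input the same update would be driven by a signal that is not echo, which is the mis-learning risk the text goes on to discuss.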
As conditions for enabling the above learning, the following requirements must be met.
(1) A far-end speech output Rout of a level sufficient for an echo to come back as a near-end speech input Sin is present. In other words, the far-end talker is currently engaged in speech.
(2) The near-end speech input Sin consists only of an echo (or of an echo and white noise). In other words, the near-end talker is not engaged in speech.
On the other hand, when the far-end talker is silent, or when the far-end talker and the near-end talker are simultaneously engaged in speech (this state is hereinafter referred to as the "double talk"), it is necessary to turn off the learning function, because otherwise the echo path estimation is liable to mis-learn.
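The two learning conditions and the two cases in which learning must be turned off can be summarized as a small decision function. This is a minimal sketch; the function name, the use of a short-term power value for Rout, and the threshold `min_far_end_power` are assumptions made for illustration and do not come from the source.

```python
def learning_enabled(far_end_power, near_end_talking, min_far_end_power=1e-4):
    """Return True only when both learning conditions hold.

    far_end_power     -- short-term power of the far-end speech output Rout
    near_end_talking  -- True if near-end speech is judged present in Sin
    min_far_end_power -- hypothetical activity threshold (not from the source)
    """
    far_end_active = far_end_power >= min_far_end_power   # condition (1)
    return far_end_active and not near_end_talking        # condition (2)


# Far-end single talk: learning may proceed.
single_talk = learning_enabled(far_end_power=0.1, near_end_talking=False)
# Far-end talker silent: no echo to learn from, so learning is off.
far_end_silent = learning_enabled(far_end_power=0.0, near_end_talking=False)
# Double talk: Sin contains more than echo, so learning is off.
double_talk = learning_enabled(far_end_power=0.1, near_end_talking=True)
```

How `near_end_talking` is decided is exactly the double talk detection problem; the sketch simply takes that decision as an input.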
Digital signals are transmitted in the transmission line, and a D/A conversion (more generally, a μ-law conversion) is made between the echo canceler 1, which processes such digital signals, and the hybrid circuit 2, which undertakes the conversion to the analog line. For this reason, a non-linear relation arises between the far-end speech output Rout and the near-end speech input Sin. Therefore, the echo cannot be canceled completely through the linear computation of the echo path estimation circuit/echo replica generator 3, etc. alone, and an echo component that cannot be completely canceled remains. In order to remove such an echo component (hereinafter referred to as the "residual echo"), the non-linear processor 6 is employed. This non-linear processor 6 undertakes a non-linear switching operation. Specifically, in case the near-end speech output Sout consists only of an echo, in other words, in case only the far-end talker is currently engaged in speech (this state is hereinafter referred to as the "far-end talker's single talk"), either a switching operation is made such that the transmission of the near-end speech output Sout is prohibited, or the near-end speech output Sout is replaced by a pseudo noise.
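The switching operation of the non-linear processor can be sketched as below. The function name, the parameter `comfort_noise_level`, and the use of uniformly distributed random values as the pseudo noise are assumptions for illustration; the source states only that transmission is prohibited or that the output is replaced by a pseudo noise.

```python
import random


def nonlinear_processor(sout_sample, far_end_single_talk,
                        comfort_noise_level=0.0, rng=random):
    """Pass, block, or replace one near-end output sample.

    sout_sample         -- sample on the output side of the adder
    far_end_single_talk -- True when only the far-end talker is speaking
    comfort_noise_level -- peak amplitude of the pseudo noise; 0.0 means
                           the output is simply blocked (hypothetical knob)
    rng                 -- random source offering uniform(a, b)
    """
    if not far_end_single_talk:
        return sout_sample          # near-end speech must pass unchanged
    if comfort_noise_level > 0.0:
        # Replace the residual echo with low-level pseudo noise.
        return rng.uniform(-comfort_noise_level, comfort_noise_level)
    return 0.0                      # prohibit transmission of the residual echo


passed = nonlinear_processor(0.3, far_end_single_talk=False)
blocked = nonlinear_processor(0.3, far_end_single_talk=True)
noisy = nonlinear_processor(0.3, far_end_single_talk=True,
                            comfort_noise_level=0.01,
                            rng=random.Random(1))
```

Because the processor discards whatever is on the line during far-end single talk, a wrong single-talk decision would also discard near-end speech, which is why the control decisions described next matter.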
The control unit 4 controls the echo path estimation circuit/echo replica generator 3 and the non-linear processor 6. That is, the control unit 4 detects the far-end talker's silent condition or the double talk and controls the ON/OFF state of the learning function of the echo path estimation accordingly; it also detects the far-end talker's single talk and controls the switching operation of the non-linear processor 6.
As a method for detecting the double talk in the control unit 4, a power ratio of the far-end speech output Rout to the near-end speech input Sin has heretofore been used, and if this ratio exceeds an expected echo level (for example, the maximum echo level 6 dB specified by CCITT standards), it is judged that the double talk has occurred. However, this conventional double talk detecting method has the problem that the detection is delayed. That is, in case there is no sufficient level difference at the beginning of the double talk, the double talk is not detected; it is detected only when the level difference exceeds a predetermined value. As a consequence, the double talk is not detected in a timely manner. Also, in the case where the speech levels of the far-end talker and the near-end talker are greatly different, the double talk cannot be detected effectively.
Namely, in the case where the power for transmitting the far-end talker's speech is larger than the power for transmitting the near-end talker's speech, the ratio of the generated echo power to the transmitting power of the near-end talker's speech becomes small. In such a case, the difference between the power of the echo and the power of the near-end talker's speech is reduced, and it therefore becomes difficult to reliably distinguish the echo from the near-end talker's speech. As a consequence, it becomes difficult to detect the double talk accurately.
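The conventional power-based detection and its weakness can be illustrated with a minimal sketch. The interpretation used here is an assumption: double talk is declared when the power of Sin exceeds the power of Rout reduced by an assumed echo loss of 6 dB. The function names, the block-wise power measurement, and the test signals are hypothetical and serve only to show the level-difference problem.

```python
import math


def power_db(samples):
    """Mean power of a block of samples in dB (tiny offset avoids log of zero)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)


def double_talk_by_power(rout_block, sin_block, echo_loss_db=6.0):
    """Declare double talk when Sin is stronger than the largest expected echo."""
    expected_echo_db = power_db(rout_block) - echo_loss_db
    return power_db(sin_block) > expected_echo_db


# Illustrative signals: a loud far-end talker and an echo 12 dB below it.
rout = [1.0, -1.0, 1.0, -1.0] * 40
echo = [0.25 * r for r in rout]
quiet_near = [0.2, 0.2, -0.2, -0.2] * 40     # quiet near-end talker
loud_near = [1.0, 1.0, -1.0, -1.0] * 40      # loud near-end talker

echo_only = double_talk_by_power(rout, echo)
quiet_case = double_talk_by_power(rout, [e + n for e, n in zip(echo, quiet_near)])
loud_case = double_talk_by_power(rout, [e + n for e, n in zip(echo, loud_near)])
```

In this sketch the loud near-end talker is detected, but the quiet near-end talker stays below the threshold set by the loud far-end talker and is missed even though double talk is taking place, which corresponds to the level-difference problem described above.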
The low accuracy of the double talk detection is liable to cause mis-learning in the echo path estimation. Such mis-learning not only deteriorates the echo cancellation function but also generates a wrong echo replica, thereby sending noise to the far-end talker, etc.