1. Field of the Invention
The present invention relates to an echo cancelling technology used in a communication device. In particular, the present invention relates to the echo cancelling technology in a double talk state.
2. Description of the Related Art
Recently, mobile phones having a videophone function have come on the market, and the number of users who talk in a hands free talk mode while using the videophone function has increased. In the hands free talk mode, the sound output level of the speaker may be so high that the sound from the speaker is picked up by the microphone. This phenomenon is called “sneaking”.
As illustrated in FIG. 1, when the sneaking occurs on the far end talker side, the voice of the near end talker is heard, after a delay, from the speaker on the near end talker side. This phenomenon is called an “echo” and is unpleasant to the near end talker. In addition, as illustrated in FIG. 2, when the sneaking occurs on both the far end talker side and the near end talker side, an acoustic closed loop may be formed. As the loop gain increases, oscillation may occur and produce a loud booming sound (so-called “howling sound”). The howling sound is also unpleasant, and the users on both ends have no choice but to stop talking. An echo canceller is used to suppress such an echo or howling sound.
FIG. 3 illustrates a structure of the echo canceller described in “Technology of Digital Audio” written and edited by Nobuhiko Kitawaki, issued by the Telecommunications Association, distributed by Ohmsha, Ltd. ISBN4-88549-905-4. A signal transmitted from a mobile terminal on the near end talker side to a mobile terminal on the far end talker side is a transmitting signal e(k). Conversely, a signal received by the mobile terminal on the near end talker side from the mobile terminal on the far end talker side is a receiving signal x(k). The receiving signal x(k) is output from a speaker of the mobile terminal on the near end talker side. It is assumed here that the near end talker is performing a hands free talk. Therefore, an echo signal y(k) is generated by sneaking of the receiving signal x(k) output from the speaker, and is received by a microphone of the mobile terminal on the near end talker side. This echo signal y(k) is expressed by Equation (1).
y(k)=h(k)×x(k)  Equation (1)
The parameter h(k) in Equation (1) is a conversion coefficient from the receiving signal x(k) into the echo signal y(k). In other words, the conversion coefficient h(k) indicates a transmission characteristic of an acoustic echo path from the speaker to the microphone, which depends on an environment in which the mobile terminal on the near end talker side is placed. In addition to the echo signal y(k) described above, a voice signal v(k) of the near end talker and an ambient noise signal n(k) are also received by the microphone of the mobile terminal on the near end talker side. In other words, an input signal yin(k) received by the microphone of the mobile terminal on the near end talker side is expressed by Equation (2).
yin(k)=v(k)+n(k)+y(k)  Equation (2)
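As an illustration of Equations (1) and (2), the following minimal Python sketch synthesizes a microphone input signal. It assumes that the product h(k)×x(k) is realized as a short FIR convolution, which is a common way of modeling an acoustic echo path but is not stated in the description above. The function names fir_filter and microphone_signal and all numerical values are hypothetical.

```python
import random

def fir_filter(h, x):
    """Echo path model (assumption): y(k) = sum_i h[i] * x(k - i)."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, hi in enumerate(h):
            if k - i >= 0:
                acc += hi * x[k - i]
        y.append(acc)
    return y

def microphone_signal(x, v, n, h):
    """Equation (2): yin(k) = v(k) + n(k) + y(k)."""
    y = fir_filter(h, x)  # Equation (1), realized as a convolution
    return [vk + nk + yk for vk, nk, yk in zip(v, n, y)]

rng = random.Random(0)
length = 8
x = [rng.uniform(-1.0, 1.0) for _ in range(length)]  # receiving signal
v = [0.0] * length                                   # near end talker silent
n = [0.0] * length                                   # noise neglected
h = [0.5, 0.25]                                      # hypothetical echo path
yin = microphone_signal(x, v, n, h)
```

With v(k) and n(k) set to zero as above, yin(k) contains only the echo, which corresponds to the single talk case assumed later for Equation (5).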
Note that k denotes time in the parameters described above. The same applies in the following description.
The echo canceller illustrated in FIG. 3 includes an adaptive filter and a subtractor so as to cancel the echo signal y(k). First, the adaptive filter synthesizes a spurious echo signal y′(k) from the receiving signal x(k) based on the NLMS algorithm. This spurious echo signal y′(k) is an echo signal estimated by the adaptive filter and is expressed by Equation (3).
y′(k)=h′(k)×x(k)  Equation (3)
The parameter h′(k) in Equation (3) is a conversion coefficient from the receiving signal x(k) into the spurious echo signal y′(k). In other words, the conversion coefficient h′(k) indicates a transmission characteristic of the acoustic echo path from the speaker to the microphone, which is estimated by the adaptive filter. The adaptive filter delivers the obtained spurious echo signal y′(k) to the subtractor.
The subtractor receives the input signal yin(k) from the microphone. Then, the subtractor generates the transmitting signal e(k) by subtracting the above-mentioned spurious echo signal y′(k) from the received input signal yin(k). The transmitting signal e(k) generated by the subtractor is expressed by Equation (4).
e(k)=yin(k)−y′(k)=v(k)+n(k)+y(k)−y′(k)  Equation (4)
The adaptive filter illustrated in FIG. 3 performs feedback control based on the transmitting signal e(k). Specifically, the adaptive filter updates the above-mentioned conversion coefficient h′(k) so that the transmitting signal e(k) becomes zero. Here, it is supposed that the near end talker is not talking so that the voice signal v(k) of the near end talker is zero. In addition, it is supposed that a level of the ambient noise signal n(k) can be neglected. In this case, the transmitting signal e(k) generated by the subtractor is expressed by Equation (5).
e(k)=y(k)−y′(k)  Equation (5)
The adaptive filter updates the above-mentioned conversion coefficient h′(k) so that the transmitting signal e(k) expressed by Equation (5) becomes zero. In other words, the adaptive filter estimates the transmission characteristic h(k) of the acoustic echo path from the speaker to the microphone so that the echo signal y(k) received by the microphone is cancelled. The transmitting signal e(k) expressed by Equation (5) is an estimation error, and the adaptive filter can be said to perform the feedback control so that the estimation error e(k) becomes zero. When the conversion coefficient h′(k) of the adaptive filter matches the transmission characteristic h(k) of the acoustic echo path, the spurious echo signal y′(k) agrees with the actual echo signal y(k), and the echo is cancelled properly.
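The feedback control of Equations (3) to (5) can be sketched as follows. This is a minimal textbook-style NLMS loop, not the implementation of the cited reference. It assumes that h′(k) is a short FIR coefficient vector, and the tap count, step size mu, regularization constant eps, and the echo path h_true are hypothetical values chosen only for illustration.

```python
import random

def nlms_echo_canceller(x, yin, taps=4, mu=0.5, eps=1e-8):
    """Return (e, h_est): transmitting signal and final estimated coefficients."""
    h_est = [0.0] * taps  # h'(k), initially zero
    buf = [0.0] * taps    # recent receiving samples, newest first
    e = []
    for xk, yk in zip(x, yin):
        buf = [xk] + buf[:-1]
        y_hat = sum(hi * xi for hi, xi in zip(h_est, buf))  # Equation (3)
        ek = yk - y_hat                                     # Equation (4)
        power = sum(xi * xi for xi in buf) + eps            # normalization term
        # NLMS update: move h'(k) in the direction that reduces e(k)
        h_est = [hi + mu * ek * xi / power for hi, xi in zip(h_est, buf)]
        e.append(ek)
    return e, h_est

rng = random.Random(1)
h_true = [0.6, -0.3, 0.1, 0.05]  # hypothetical acoustic echo path h(k)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Single talk: v(k) = n(k) = 0, so yin(k) = y(k) as assumed for Equation (5)
yin = [sum(h_true[i] * x[k - i] for i in range(len(h_true)) if k - i >= 0)
       for k in range(len(x))]

e, h_est = nlms_echo_canceller(x, yin)
max_coeff_error = max(abs(a - b) for a, b in zip(h_true, h_est))
```

Under these single talk assumptions the estimated coefficients approach h_true and the transmitting signal e(k) decays toward zero, which is the situation in which the spurious echo signal agrees with the actual echo signal.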
In this way, if the near end talker is not talking and only the far end talker is talking, the echo cancellation is performed properly. In practice, however, the far end talker and the near end talker often talk simultaneously (hereinafter referred to as “double talk state”). In the double talk state, the transmitting signal e(k) generated by the subtractor is expressed by Equation (4). Even if the ambient noise signal n(k) can be neglected, the voice signal v(k) of the near end talker cannot be neglected. In this case, the adaptive filter performs the feedback control so that the transmitting signal e(k) expressed by Equation (4), which contains the voice signal v(k), becomes zero, and hence it cannot remove only the echo signal y(k). In other words, the adaptive filter misestimates the transmission characteristic h(k) due to a disturbance other than the echo signal y(k) received by the microphone, with the result that the echo cancellation performance deteriorates significantly.
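The effect of the double talk state on the coefficient estimate can be illustrated with a small simulation. The sketch below is not taken from the cited reference. It assumes an FIR echo path, models the near end voice v(k) as white Gaussian noise for simplicity, and uses hypothetical values for the echo path, the signal lengths, and the step size.

```python
import random

def convolve(h, x):
    """Assumed echo path model: y(k) = sum_i h[i] * x(k - i)."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]

def nlms(x, yin, taps, mu=0.5, eps=1e-8):
    """Fixed step size NLMS; returns the final estimated coefficients h'(k)."""
    h_est = [0.0] * taps
    buf = [0.0] * taps
    for xk, yk in zip(x, yin):
        buf = [xk] + buf[:-1]
        ek = yk - sum(hi * xi for hi, xi in zip(h_est, buf))  # Equation (4)
        power = sum(xi * xi for xi in buf) + eps
        h_est = [hi + mu * ek * xi / power for hi, xi in zip(h_est, buf)]
    return h_est

def misalignment(h_true, h_est):
    """Squared distance between the true and the estimated echo path."""
    return sum((a - b) ** 2 for a, b in zip(h_true, h_est))

rng = random.Random(2)
h_true = [0.6, -0.3, 0.1, 0.05]                  # hypothetical echo path
x = [rng.gauss(0.0, 1.0) for _ in range(3000)]   # far end signal
y = convolve(h_true, x)                          # echo only
v = [rng.gauss(0.0, 0.5) for _ in range(3000)]   # stand-in for near end voice

mis_single = misalignment(h_true, nlms(x, y, 4))
mis_double = misalignment(
    h_true, nlms(x, [yk + vk for yk, vk in zip(y, v)], 4))
print(mis_single, mis_double)
```

In the single talk run the misalignment becomes negligible, whereas in the double talk run the filter keeps chasing v(k) and the estimate stays perturbed around the true echo path.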
As described above, the echo canceller illustrated in FIG. 3 becomes unstable in the presence of a disturbance, and particularly in the double talk state, the echo cancellation performance deteriorates significantly. A related technology intended to solve the above-mentioned problem is described in Japanese Patent Application Laid-open No. 2002-76999.
FIG. 4 illustrates a structure of the echo canceller described in Japanese Patent Application Laid-open No. 2002-76999. The echo canceller illustrated in FIG. 4 includes a power estimation circuit 103, a step size decision circuit 104, a noise level estimation circuit 106, and a near end voice level estimation circuit 107 in addition to an adaptive filter 101 and a subtractor 102.
Similarly to the case of FIG. 3, the adaptive filter 101 synthesizes the spurious echo signal y′(k) from the receiving signal x(k), and the subtractor 102 subtracts the spurious echo signal y′(k) from the input signal yin(k) so as to generate the transmitting signal e(k). The transmitting signal e(k) is the same as that expressed by Equation (4). On the other hand, the power estimation circuit 103 estimates power of the receiving signal x(k) based on the receiving signal x(k) from the far end talker. In addition, the noise level estimation circuit 106 and the near end voice level estimation circuit 107 respectively estimate levels of the ambient noise signal n(k) and the voice signal v(k) based on the transmitting signal e(k).
The step size decision circuit 104 decides a step size based on the estimated power of the receiving signal x(k), the estimated level of the ambient noise signal n(k) and the estimated level of the voice signal v(k). The step size means an update quantity of the conversion coefficient h′(k) in the adaptive filter 101. For instance, when the estimated level of the voice signal v(k) or the ambient noise signal n(k) is relatively small, i.e., when it is determined that the input signal yin(k) received by the microphone is mainly the echo signal y(k), the step size decision circuit 104 sets the step size to be relatively large. On the other hand, when the estimated level of the voice signal v(k) or the ambient noise signal n(k) is relatively large, i.e., when it is determined that the disturbance received by the microphone is large, the step size decision circuit 104 sets the step size to be relatively small. The step size (update quantity) decided in this way is supplied to the adaptive filter 101 together with the transmitting signal e(k).
The adaptive filter 101 updates the conversion coefficient h′(k) so that the transmitting signal e(k) becomes zero. On this occasion, the adaptive filter 101 updates the conversion coefficient h′(k) in accordance with the step size decided by the step size decision circuit 104. In other words, when the estimated level of the voice signal v(k) or the ambient noise signal n(k) is relatively small, the adaptive filter 101 updates the conversion coefficient h′(k) by the large step. On the other hand, when the estimated level of the voice signal v(k) or the ambient noise signal n(k) is relatively large, the adaptive filter 101 updates the conversion coefficient h′(k) by the small step.
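The qualitative behavior of the step size decision circuit 104 can be sketched as follows. The description above states only which quantities the step size depends on and in which direction, so the formula below is a hypothetical rule chosen for illustration and is not the rule of Japanese Patent Application Laid-open No. 2002-76999. The names decide_step_size, mu_max, and weight are likewise assumptions.

```python
def decide_step_size(power_x, level_n, level_v,
                     mu_max=1.0, weight=10.0, eps=1e-12):
    """Hypothetical step size rule.

    Returns a step size close to mu_max when the disturbance (noise plus
    near end voice) is small relative to the receiving signal power, and a
    small step size when the disturbance is large.
    """
    disturbance = level_n + level_v
    return mu_max * power_x / (power_x + weight * disturbance + eps)

# Far end single talk: little disturbance, so a large step is used.
mu_single = decide_step_size(power_x=1.0, level_n=0.01, level_v=0.0)

# Double talk: near end voice is present, so the step is reduced.
mu_double = decide_step_size(power_x=1.0, level_n=0.01, level_v=0.5)

# No far end signal: nothing to adapt to, so the step is zero.
mu_idle = decide_step_size(power_x=0.0, level_n=0.01, level_v=0.5)
```

Any rule of this kind needs constants such as weight to be tuned against the actual signal levels.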
In this way, the echo canceller illustrated in FIG. 4 estimates the levels of the voice signal v(k) and the ambient noise signal n(k) which are contained in the input signal yin(k), and varies the update quantity of the conversion coefficient h′(k) in accordance with the situation. Thus, the adaptive filter 101 is prevented from grossly misestimating the transmission characteristic h(k), whereby the stability with respect to the disturbance is improved. In other words, significant deterioration of the echo cancellation performance can be suppressed.
Japanese Patent Application Laid-open No. 2008-141734 discloses an echo canceller that is used for a loudspeaker call system for performing a loudspeaker call using a speaker and a microphone. The echo canceller includes an adaptive filter portion and an echo suppressing portion. The adaptive filter portion identifies an impulse response of a feedback path constituted of an acoustic connection between the speaker and the microphone, in an adaptive manner, and estimates an echo component of the feedback path based on an input signal supplied to the feedback path. Further, the adaptive filter portion subtracts the estimated echo component from a microphone input signal supplied from the feedback path. The echo suppressing portion performs an echo suppressing process on an echo cancellation output signal delivered from the adaptive filter portion. Specifically, the echo suppressing portion determines an echo suppressing quantity based on a Wiener filtering method by using an echo reducing quantity that is defined based on a ratio between the above-mentioned microphone input signal and a voice signal on the near end side which mixes in the feedback path. Then, the echo suppressing portion multiplies the echo suppressing quantity and the echo cancellation output signal delivered from the adaptive filter portion together.
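The idea of multiplying the echo cancellation output by a suppressing quantity can be sketched with a generic Wiener-type gain. The exact definition of the echo reducing quantity in Japanese Patent Application Laid-open No. 2008-141734 is not reproduced in the description above, so the formula below is the textbook form "desired power divided by total power" and should be read as an assumption. The power estimates passed in and the function names are hypothetical.

```python
def wiener_suppression_gain(near_end_power, residual_echo_power, eps=1e-12):
    """Generic Wiener-type gain in [0, 1] (assumed form, not the cited formula)."""
    return near_end_power / (near_end_power + residual_echo_power + eps)

def suppress(echo_cancelled, gain):
    """Multiply the echo cancellation output signal by the suppressing gain."""
    return [gain * sample for sample in echo_cancelled]

# Equal near end voice and residual echo power: the output is halved.
gain_equal = wiener_suppression_gain(1.0, 1.0)

# Only residual echo remains: the output is suppressed completely.
gain_echo_only = wiener_suppression_gain(0.0, 1.0)

out = suppress([0.2, -0.4, 0.1], gain_equal)
```

A gain near one leaves the near end voice almost untouched, while a gain near zero mutes frames that are dominated by residual echo.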
In the echo canceller illustrated in FIG. 4, the step size is decided based on three parameters: the estimated power of the receiving signal x(k), the estimated level of the ambient noise signal n(k), and the estimated level of the voice signal v(k). However, it is difficult to decide an appropriate step size by this method, because the power of the receiving signal x(k) and the levels of the ambient noise signal n(k) and the voice signal v(k) vary widely depending on the telephone environment.