1. Field of the Invention
The present invention relates to a multichannel echo canceller and, more particularly, to a multichannel echo canceller used in a conferencing system, a handsfree telephone, or the like.
2. Description of the Background Art
In recent years, multichannel acoustic systems, such as conferencing systems and handsfree telephones, in which acoustic signals representing the voices of speakers in remote locations are transmitted in an interactive manner, have been realized. When this kind of acoustic system is realized, for example, between a first location and a second location, a plurality of microphones for detecting the voices of speakers and a plurality of loudspeakers for listening to the voices of speakers in the remote location are provided in each of the first location and the second location. The loudspeakers in the first location are connected to the microphones in the second location, and the microphones in the first location are connected to the loudspeakers in the second location. Thus, for example, a speaker S1 who is present in the first location can listen, through the loudspeakers in the first location, to a voice of a speaker S2 who is present in the second location. The speaker S1 can also deliver his or her voice, through the microphones in the first location, to the speaker S2.
However, such an acoustic system has a problem in that an echo must be cancelled. For example, when the speaker S2 outputs a voice, the voice is detected by the microphones in the second location and its sound is amplified by the loudspeakers in the first location. Microphones are also provided in the first location. Therefore, the sound of the voice of the speaker S2, which has been amplified by the loudspeakers in the first location, is detected by the microphones in the first location and returned to the second location. As a result, the speaker S2, when listening to a voice of the speaker S1, also hears his or her own voice through the loudspeakers in the second location. As described above, a voice outputted by a speaker, once amplified by the loudspeakers in the remote location and detected by the microphones there, returns as an echo which is unwanted for the speaker outputting the voice.
Therefore, as a multichannel echo canceller for canceling such an echo, a multichannel echo canceller using adaptive filters has conventionally been proposed. FIG. 8 is a diagram illustrating a configuration of a conventional multichannel echo canceller 9 using adaptive filters, which is used in an acoustic system. FIG. 8 shows a case where the acoustic system has two channels. In the acoustic system shown in FIG. 8, it is assumed that a speaker S1 is present as a sound source on a near end (near-end sound source) and a speaker S2 is present as a sound source on a far end (far-end sound source). On the near end, loudspeakers 10 and 20 for amplifying, in a stereo manner, a sound of a far-end acoustic signal of a voice which the speaker S2 on the far end outputs, and microphones 11 and 21 for detecting a near-end acoustic signal of a voice which the speaker S1 on the near end outputs, are provided. On the far end, loudspeakers 30 and 40 for amplifying, in a stereo manner, a sound of the near-end acoustic signal, and microphones 31 and 41 for detecting the far-end acoustic signal, are provided. In the acoustic system shown in FIG. 8, as one example, it is assumed that the multichannel echo canceller 9 is provided only on the near end.
In FIG. 8, the multichannel echo canceller 9 comprises adaptive filters 91 to 94, adders 95 and 97, and subtracters 96 and 98. The adaptive filter 91, based on an output signal from the subtracter 96, estimates a transfer characteristic h11(ω) from the loudspeaker 10 to the microphone 11, where ω is a frequency. The adaptive filter 91 convolves the estimated result eh11(ω) with a loudspeaker input signal sp1 to be inputted to the loudspeaker 10 and outputs the result. The adaptive filter 92, based on an output signal from the subtracter 96, estimates a transfer characteristic h21(ω) from the loudspeaker 20 to the microphone 11. The adaptive filter 92 convolves the estimated result eh21(ω) with a loudspeaker input signal sp2 to be inputted to the loudspeaker 20 and outputs the result. The adaptive filter 93, based on an output signal from the subtracter 98, estimates a transfer characteristic h12(ω) from the loudspeaker 10 to the microphone 21. The adaptive filter 93 convolves the estimated result eh12(ω) with the loudspeaker input signal sp1 and outputs the result. The adaptive filter 94, based on an output signal from the subtracter 98, estimates a transfer characteristic h22(ω) from the loudspeaker 20 to the microphone 21. The adaptive filter 94 convolves the estimated result eh22(ω) with the loudspeaker input signal sp2 and outputs the result.
The adder 95 receives an output signal from the adaptive filter 91 and an output signal from the adaptive filter 92 and adds these output signals. The subtracter 96 receives a detection signal m1 detected by the microphone 11 and an output signal from the adder 95 and subtracts, from the detection signal m1, the output signal from the adder 95. Thus, an output signal y1 from the subtracter 96 becomes a signal in which a voice of the speaker S2 on the far end, which is an echo, is cancelled. The output signal y1 from the subtracter 96 is transmitted to the far end and amplified by the loudspeaker 30 on the far end. The adder 97 receives an output signal from the adaptive filter 93 and an output signal from the adaptive filter 94 and adds these output signals. The subtracter 98 receives a detection signal m2 detected by the microphone 21 and an output signal from the adder 97 and subtracts, from the detection signal m2, the output signal from the adder 97. Thus, an output signal y2 from the subtracter 98 becomes a signal in which a voice of the speaker S2 on the far end, which is an echo, is cancelled. The output signal y2 from the subtracter 98 is transmitted to the far end and amplified by the loudspeaker 40 on the far end.
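The signal flow for the microphone 11 described above can be sketched for a single frequency ω, where convolution reduces to multiplication. This is only an illustrative sketch: the function name cancel_echo_bin and all numeric values are hypothetical and do not appear in FIG. 8, and the estimates are simply set equal to the true transfer characteristics to show the ideal case.

```python
def cancel_echo_bin(m1, sp1, sp2, eh11, eh21):
    """Microphone-11 path of FIG. 8 at one frequency (hypothetical helper)."""
    # Outputs of adaptive filters 91 and 92, summed by the adder 95 (echo replica).
    replica = eh11 * sp1 + eh21 * sp2
    # Subtracter 96: detection signal minus echo replica.
    return m1 - replica


# Hypothetical single-frequency values, chosen only for illustration.
h11, h21 = 0.5 + 0.2j, 0.3 - 0.1j   # true loudspeaker-to-microphone-11 paths
sp1, sp2 = 1.0 + 0.0j, 0.0 + 2.0j   # loudspeaker input signals
near = 0.25 + 0.0j                  # near-end voice as seen at microphone 11

m1 = sp1 * h11 + sp2 * h21 + near   # detection signal: echo plus near-end voice

# With perfect estimates (eh11 = h11, eh21 = h21) only the near-end voice remains.
y1 = cancel_echo_bin(m1, sp1, sp2, h11, h21)
print(abs(y1 - near))
```

The path for the microphone 21 (adaptive filters 93 and 94, adder 97, subtracter 98) has the same shape with eh12(ω) and eh22(ω) in place of eh11(ω) and eh21(ω).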
Here, when the adaptive filters 91 to 94 estimate the transfer characteristics, a learning identification method (LMS) which is generally used as a learning method for an adaptive filter is utilized. Specifically, the adaptive filters 91 and 92 estimate the transfer characteristics so that a power of the output signal y1 from the subtracter 96 becomes minimum. The adaptive filters 93 and 94 estimate the transfer characteristics so that a power of the output signal y2 from the subtracter 98 becomes minimum.
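As a rough illustration of this kind of learning, the sketch below updates the two filters feeding one microphone so that the power of the residual decreases. Several things here are assumptions rather than details taken from FIG. 8: the update is written in the time domain with a normalized step, and the names nlms_step, mu and eps, the tap values, and the filter length are all hypothetical. The loudspeaker 20 is kept silent in the demonstration so that only one loudspeaker input signal is active.

```python
import random


def nlms_step(w1, w2, x1, x2, m, mu=0.5, eps=1e-8):
    """One normalized-LMS update of the two filters feeding one microphone.

    w1, w2: filter taps (estimates), updated in place.
    x1, x2: recent loudspeaker input samples, newest first.
    m:      current microphone sample.
    Returns the residual after subtracting the echo replica.
    """
    replica = sum(a * b for a, b in zip(w1, x1)) + sum(a * b for a, b in zip(w2, x2))
    y = m - replica
    # Normalize the step by the combined input power (eps avoids division by zero).
    power = sum(v * v for v in x1) + sum(v * v for v in x2) + eps
    g = mu * y / power
    for k in range(len(w1)):
        w1[k] += g * x1[k]
    for k in range(len(w2)):
        w2[k] += g * x2[k]
    return y


rng = random.Random(0)
h11 = [0.6, -0.3, 0.1]   # hypothetical path, loudspeaker 10 to microphone 11
h21 = [0.4, 0.2, -0.1]   # hypothetical path, loudspeaker 20 to microphone 11
w1, w2 = [0.0] * 3, [0.0] * 3
x1, x2 = [0.0] * 3, [0.0] * 3

for _ in range(2000):
    x1 = [rng.uniform(-1.0, 1.0)] + x1[:-1]   # loudspeaker 10 active
    x2 = [0.0] + x2[:-1]                      # loudspeaker 20 silent
    m = sum(a * b for a, b in zip(h11, x1)) + sum(a * b for a, b in zip(h21, x2))
    y = nlms_step(w1, w2, x1, x2, m)

print(w1)   # approaches h11
print(w2)   # stays at zero because loudspeaker 20 never operated
```

In this single-loudspeaker run the taps w1 approach h11, while w2 receives no update at all, since its input is zero throughout.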
Hereinafter, problems of the conventional multichannel echo canceller 9 will be described. In FIG. 8, in order to obtain an echo cancellation effect, the adaptive filters 91 to 94 must estimate correct transfer characteristics. For example, in the case of the adaptive filter 91, the estimated result eh11(ω) must coincide with the transfer characteristic h11(ω). However, the conventional multichannel echo canceller 9 can estimate correct transfer characteristics only in a state where just one of the loudspeaker input signal sp1 and the loudspeaker input signal sp2 is amplified. In other words, the conventional multichannel echo canceller 9 can estimate the correct transfer characteristics only in a state of monaural reproduction, in which only one of the loudspeaker 10 and the loudspeaker 20 operates.
When multichannel reproduction is performed (herein, stereo reproduction), both the loudspeaker 10 and the loudspeaker 20 usually operate, and correlated signals are inputted to the loudspeaker 10 and the loudspeaker 20. For example, it is assumed that the microphones 31 and 41 on the far end shown in FIG. 8 detect a voice of the speaker S2 in a stereo manner. It is also assumed that the voice of the speaker S2 is s2(ω); a transfer characteristic from the speaker S2 to the microphone 31 is a21(ω); and a transfer characteristic from the speaker S2 to the microphone 41 is a22(ω). At this time, the loudspeaker input signal sp1 inputted to the loudspeaker 10 becomes s2(ω)·a21(ω) and the loudspeaker input signal sp2 inputted to the loudspeaker 20 becomes s2(ω)·a22(ω). Because the loudspeaker input signal sp1 and the loudspeaker input signal sp2 both include s2(ω), they are correlated. The detection signal m1(ω) detected by the microphone 11 satisfies formula (1), in which s1(ω) is the voice of the speaker S1 and a11(ω) is a transfer characteristic from the speaker S1 to the microphone 11.
[Formula 1]
m1(ω) = s2(ω)a21(ω)h11(ω) + s2(ω)a22(ω)h21(ω) + s1(ω)a11(ω)
      = s2(ω){a21(ω)h11(ω) + a22(ω)h21(ω)} + s1(ω)a11(ω)  (1)
The component of formula (1) that contains s2(ω), namely s2(ω){a21(ω)h11(ω) + a22(ω)h21(ω)}, is the echo. Therefore, the adaptive filters 91 and 92 are only required to estimate transfer characteristics so that the output signal from the adder 95, which is an echo replica, becomes the same as the s2(ω) component of formula (1). When the output signal from the adder 95 becomes the same as that component, the power of the output signal y1 becomes minimum (in other words, only the s1(ω)a11(ω) component remains) and the echo is cancelled.
However, m1(ω) in formula (1) includes a component obtained by multiplying s2(ω) by a transfer characteristic, and the loudspeaker input signals sp1 and sp2 are likewise obtained by multiplying s2(ω) by transfer characteristics. This means that the s2(ω) component of formula (1) can be reproduced by using either one of the loudspeaker input signal sp1 or the loudspeaker input signal sp2 alone. Accordingly, a plurality of solutions (for example, formula (2) or formula (3)) exist for the pair of the transfer characteristic eh11(ω) estimated by the adaptive filter 91 and the transfer characteristic eh21(ω) estimated by the adaptive filter 92.
[Formula 2]
eh11(ω) = {a21(ω)h11(ω) + a22(ω)h21(ω)}/a21(ω), eh21(ω) = 0  (2)
[Formula 3]
eh11(ω) = 0, eh21(ω) = {a21(ω)h11(ω) + a22(ω)h21(ω)}/a22(ω)  (3)
As described above, when multichannel reproduction is performed, the conventional multichannel echo canceller 9 is not capable of estimating correct transfer characteristics because the solutions are not uniquely determined (inconstant solutions), thereby leading to a problem that an echo cancellation effect cannot be obtained in a stable manner.
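The existence of several solutions can be checked numerically at a single frequency. The following is a minimal sketch under stated assumptions: the function name residual_echo and every numeric value are hypothetical, chosen only so that the arithmetic can be followed. It evaluates the echo left after subtraction for the true transfer characteristics, for formula (2), and for formula (3), and then repeats the check after the far-end transfer characteristics a21(ω) and a22(ω) change (for example, if the speaker S2 moves).

```python
def residual_echo(s2, a21, a22, h11, h21, eh11, eh21):
    """Echo remaining at microphone 11 after subtracting the echo replica."""
    sp1 = s2 * a21                       # loudspeaker input signal sp1
    sp2 = s2 * a22                       # loudspeaker input signal sp2
    echo = sp1 * h11 + sp2 * h21         # s2 component of formula (1)
    replica = eh11 * sp1 + eh21 * sp2    # output of the adder 95
    return echo - replica


# Hypothetical single-frequency values.
s2 = 1.5 - 0.5j
a21, a22 = 0.8 + 0.1j, 0.5 - 0.3j
h11, h21 = 0.5 + 0.2j, 0.3 - 0.1j

c = a21 * h11 + a22 * h21
candidates = {
    "true paths": (h11, h21),
    "formula (2)": (c / a21, 0j),
    "formula (3)": (0j, c / a22),
}

# All three candidate pairs cancel the echo for the current far-end paths.
residuals = {
    name: abs(residual_echo(s2, a21, a22, h11, h21, e1, e2))
    for name, (e1, e2) in candidates.items()
}

# If the far-end paths change, only the true paths keep cancelling.
a21_new, a22_new = 0.4 - 0.2j, 0.9 + 0.2j
residuals_after_change = {
    name: abs(residual_echo(s2, a21_new, a22_new, h11, h21, e1, e2))
    for name, (e1, e2) in candidates.items()
}

for name in candidates:
    print(name, residuals[name], residuals_after_change[name])
```

With these values, all three candidates leave essentially no echo at first, but the estimates of formula (2) and formula (3) depend on a21(ω) and a22(ω), so they stop cancelling the echo as soon as those far-end characteristics change, whereas the true transfer characteristics h11(ω) and h21(ω) do not.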
Therefore, a technique which selects one channel, for which estimation processing is to be performed, by determining whether a signal level of each channel is high or low (for example, Japanese Patent No. 3407392, etc.) has conventionally been proposed. In addition, a technique which estimates correct transfer characteristics by adding additional signals to the loudspeaker input signal sp1 and the loudspeaker input signal sp2 (for example, Japanese Patent No. 3073976, etc.) has also been proposed. Conventionally, these techniques have been adopted as countermeasures to the inconstant solutions in the conventional multichannel echo canceller 9.
However, in a case where a difference in signal levels between channels is small, the technique disclosed in Japanese Patent No. 3407392 cannot correctly determine whether a signal level of each channel is high or low and cannot estimate correct transfer characteristics. Therefore, the technique disclosed in Japanese Patent No. 3407392 cannot perform the echo cancellation in an invariably stable manner. In addition, the technique disclosed in Japanese Patent No. 3073976 adds the additional signals to the loudspeaker input signal sp1 and the loudspeaker input signal sp2 in order to estimate correct transfer characteristics. Because of this, the additional signals, in addition to a voice of a speaker, are amplified, thereby leading to a problem of deteriorating sound quality due to the additional signals. As described above, of the techniques which have been proposed as the countermeasures to the inconstant solutions, the technique disclosed in Japanese Patent No. 3407392 is not capable of performing the echo cancellation in an invariably stable manner, and the technique disclosed in Japanese Patent No. 3073976 deteriorates the sound quality.