The present invention relates to a multi-channel acoustic echo cancellation method and apparatus which cancel acoustic echoes that cause howling and give rise to psychoacoustical problems in a teleconference system provided with a multi-receive-channel system, and a recording medium that has recorded thereon a program for implementing the multi-channel acoustic echo cancellation.
In recent years, various forms of telecommunication have emerged with the widespread proliferation of digital networks, such as ISDN, LAN and the Internet, and with the development of high efficiency speech and image coding techniques. In a TV conference or desktop teleconference in which each participant can talk while looking at the other party through utilization of a wide screen television, or a personal computer or workstation placed at the participant's seat, a hands-free telecommunication system is often employed which allows two or more persons to participate in the conversation with ease and provides a more realistic teleconferencing environment. However, this system that uses loudspeakers and microphones inevitably suffers from echoes and howling. To overcome this problem, acoustic echo canceller techniques are indispensable.
In such situations as mentioned above, acoustic echo cancellers are now in wide use, but they are mostly for one-channel audio use and cancel an acoustic echo over only one channel from one loudspeaker to one microphone. On the other hand, stereo is common in many TV broadcast programs and music media, and there is also a strong demand for the realization of a multi-channel hands-free telecommunication system. To meet this requirement, it is necessary to implement a multi-channel acoustic echo canceller which permits cancellation of acoustic echoes from two or more loudspeakers (channels) to a microphone. In recent years, technical problems and solutions thereto have been investigated very actively toward the realization of such a multi-channel echo canceller.
Conventionally, such a configuration as depicted in FIG. 1 is used to cancel acoustic echoes in a teleconference system which is composed of a receive system of N (N≥2) channels and a send system of M (M≥2) channels. That is, N-channel echo cancellers 221, 222, . . . , 22M, which constitute an echo cancellation part 22, are connected between received signal terminals 111, 112, . . . , 11N of all the N receive channels and each of the M send channels, respectively. Received signals from the received signal terminals 111, 112, . . . , 11N of the respective receive channels are applied to loudspeakers 121, 122, . . . , 12N, from which they are radiated as acoustic signals. The illustrated echo canceller system cancels acoustic echoes which are produced when the acoustic signals are picked up by microphones 161, 162, . . . , 16M after propagating over echo paths 15nm (where 1≤n≤N and 1≤m≤M).
The N-channel echo cancellers 221, 222, . . . , 22M have the same configuration, which is depicted in FIG. 2. This configuration is described as being applied to a two-channel system in B. Widrow and S. D. Stearns, "Adaptive signal processing," Prentice-Hall, Inc., pp. 198-200 (1985). In the configuration of FIG. 2, received signals x1(k), x2(k), . . . , xN(k) are input into adaptive filters 2211, 2212, . . . , 221N, which form N estimated echo paths. The outputs from the adaptive filters 2211, 2212, . . . , 221N are added together by an adder 222, by which an echo replica y′m(k) is generated. The difference between the echo replica y′m(k) and the picked-up output signal (echo) ym(k) from the microphone 16m is detected by a subtractor 223. An error signal (a residual echo) em(k) provided from the subtractor 223 is fed back to the adaptive filters 2211 through 221N. The error signal and the received signals x1(k) to xN(k) are used to determine filter coefficient vectors, for example, by an NLMS algorithm, and the adaptive filters 2211 to 221N are controlled adaptively.
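The adaptation loop of FIG. 2 can be illustrated with a minimal Python sketch for one microphone. It is not taken from the cited reference: the function names, the step size `mu`, the regularization constant `eps`, and the simulated FIR echo paths are all assumptions made for illustration.

```python
import random

def convolve_echo(x, h):
    """Simulated echo: signal x passed through an FIR echo path h."""
    return [sum(h[j] * x[k - j] for j in range(len(h)) if k - j >= 0)
            for k in range(len(x))]

def nlms_cancel(x, y, taps, mu=0.5, eps=1e-8):
    """N-channel NLMS canceller for one microphone signal y.

    x is a list of N received signals. Returns the error signal e(k)
    and the N estimated echo-path coefficient vectors.
    """
    n_ch = len(x)
    h_hat = [[0.0] * taps for _ in range(n_ch)]   # adaptive filters
    buf = [[0.0] * taps for _ in range(n_ch)]     # recent samples, newest first
    errors = []
    for k in range(len(y)):
        for n in range(n_ch):
            buf[n] = [x[n][k]] + buf[n][:-1]
        # Adder: sum of all filter outputs is the echo replica y'(k).
        replica = sum(c * v for n in range(n_ch)
                      for c, v in zip(h_hat[n], buf[n]))
        # Subtractor: residual echo e(k).
        err = y[k] - replica
        # NLMS update, normalized by the total input power of all channels.
        power = sum(v * v for n in range(n_ch) for v in buf[n]) + eps
        step = mu * err / power
        for n in range(n_ch):
            h_hat[n] = [c + step * v for c, v in zip(h_hat[n], buf[n])]
        errors.append(err)
    return errors, h_hat

# Two mutually independent received signals and two hypothetical echo paths.
rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(3000)] for _ in range(2)]
true_paths = [[0.8, -0.3, 0.1], [0.5, 0.2, -0.1]]
y = [a + b for a, b in zip(convolve_echo(x[0], true_paths[0]),
                           convolve_echo(x[1], true_paths[1]))]
errors, h_hat = nlms_cancel(x, y, taps=3)
```

Because the two simulated received signals are independent, the estimated coefficients in `h_hat` approach `true_paths`. No send signal zm(k) or noise is included in this sketch.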
Incidentally, though not shown in FIG. 1, an acoustic signal zm(k) that is originally intended to be sent is also input into the microphones 161 to 16M. The purpose of the echo cancellers 221 through 22M is to prevent the sound reproduced by each loudspeaker and picked up by each microphone from being sent out as an echo ym(k) together with the signal zm(k). In other words, the error signal em(k) provided as the result of echo cancellation contains the signal zm(k) that ought to be sent. However, the present invention is directed toward the cancellation of the echo signal ym(k), which is produced when the acoustic signals radiated from the loudspeakers are picked up by the microphone 16m; hence, no particular mention of the signal zm(k) that ought to be sent will be made herein.
When cross-correlation among the received signals x1(k) to xN(k) is low, the adaptive filters 2211, 2212, . . . , 221N estimate the corresponding echo paths with relatively high accuracy, thus producing echo replicas that accurately simulate the acoustic echoes to be cancelled. In actual teleconferences, however, speech of one speaker is sent over multiple channels from the far end in many cases, and the received signals are so highly cross-correlated that the convergence speeds and accuracies of the adaptive filters are both degraded, often resulting in failure to provide intended echo cancellation capabilities. As a solution to this problem, there is proposed in U.S. Pat. No. 5,661,813 a scheme which reduces or changes the cross-correlation of the received signals by pre-processing them prior to their input into the N-channel echo cancellers 221, 222, . . . , 22M.
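Why high cross-correlation degrades the estimates can be seen from a short derivation for N=2. It is a sketch under an assumption that the text does not state explicitly: both received signals originate from a single far-end source s(k) through far-end transfer paths g1 and g2. The symbols s, g1, g2 and β are introduced here only for illustration, h1 and h2 denote the true echo paths to one microphone, and * denotes convolution.

```latex
\begin{align*}
x_1(k) &= (g_1 * s)(k), \qquad x_2(k) = (g_2 * s)(k) \\
\Rightarrow\quad (g_2 * x_1)(k) &= (g_1 * x_2)(k) \\
\hat{h}_1 &= h_1 + \beta\, g_2, \qquad \hat{h}_2 = h_2 - \beta\, g_1 \\
(\hat{h}_1 * x_1)(k) + (\hat{h}_2 * x_2)(k)
  &= (h_1 * x_1)(k) + (h_2 * x_2)(k) + \beta\,\bigl[(g_2 * x_1)(k) - (g_1 * x_2)(k)\bigr] \\
  &= (h_1 * x_1)(k) + (h_2 * x_2)(k) \qquad \text{for every scalar } \beta
\end{align*}
```

Under this assumption every value of β yields the same echo replica, so a zero error signal does not single out the true echo paths. The estimates also depend on g1 and g2, and therefore become wrong when the far-end conditions change.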
The configuration disclosed in the above U.S. patent is depicted in FIG. 3, in which a pre-processing part 30 equipped with the above-mentioned function is placed between the received signal terminals 111 to 11N on one side and the loudspeakers 121 to 12N and the N-channel echo cancellers 221 to 22M on the other. FIG. 4 shows an example of the configuration of the pre-processing part 30. The received signals from the received signal terminals 111 to 11N and additive signals, generated in additive signal generating parts 3011, 3012, . . . , 301N, are added by adders 3021, 3022, . . . , 302N, from which processed signals x1′(k), x2′(k), . . . , xN′(k) are provided, respectively. The additive signals may be generated with or without the use of received signal information. By increasing the magnitudes of the additive signals, the convergence characteristics of the adaptive filters 2211, 2212, . . . , 221N can be improved. A similar scheme is disclosed in U.S. Pat. No. 5,828,756. Many of the pre-processing systems already proposed, for example, in U.S. Pat. No. 5,661,813 and J. Benesty, D. R. Morgan, and M. M. Sondhi, "A Better Understanding and an Improved Solution to the Problems of Stereophonic Acoustic Echo Cancellation," Proc. ICASSP97, vol. 1, pp. 303-306 (1997), can be rewritten mathematically into the configuration depicted in FIG. 4. For example, consider a pre-processing part which pre-processes each of N-channel signals xi(k) (where i=1, 2, . . . , N) at a discrete time k by using a processing function fi (where i=1, 2, . . . , N) and outputs a processed signal xi′(k) (where i=1, 2, . . . , N) in the following form:
xi′(k) = fi[xi(k)]    (1)
Eq. (1) can be modified as follows:
xi′(k) = xi(k) + (fi[xi(k)] − xi(k))    (2)
Therefore, the additive signal fi[xi(k)] − xi(k) can be regarded as a signal obtained by pre-processing the original signal xi(k).
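The decomposition of Eq. (2) can be checked numerically with a short Python sketch. The processing function used here, a half-wave rectifier nonlinearity of the kind discussed in the stereophonic echo cancellation literature, is only one possible choice of fi. The function names and the value of `alpha` are assumptions made for illustration.

```python
def half_wave_nonlinearity(sample, alpha=0.3):
    """Hypothetical processing function f: adds a scaled half-wave rectified copy."""
    return sample + alpha * (sample + abs(sample)) / 2.0

def split_processed_signal(x, f):
    """Return the processed signal of Eq. (1) and the additive signal of Eq. (2)."""
    processed = [f(v) for v in x]                      # x'(k) = f[x(k)]
    additive = [p - v for p, v in zip(processed, x)]   # f[x(k)] - x(k)
    return processed, additive

x = [1.0, -2.0, 0.5]
processed, additive = split_processed_signal(x, half_wave_nonlinearity)
```

For this choice of f the additive signal is zero for negative samples and `alpha` times the sample otherwise, and `processed` equals `x` plus `additive` sample by sample, which is the form produced by the adders of FIG. 4.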
With a view to improving the convergence characteristics of the adaptive filters in the N-channel echo cancellers 221, 222, . . . , 22M, the scheme that pre-processes the received signals as described above with respect to FIGS. 3 and 4 has been proposed. In practice, however, since the pre-processed signals are output from the loudspeakers 121, 122, . . . , 12N, the magnitude of the additive signal needs to be kept within a range over which the processed signal is psychoacoustically indistinguishable from the original signal. This limits the improvement in the convergence characteristics of the adaptive filters 2211 to 221N and consequently in the echo cancellation performance.
While in the above the prior art has been described as being applied to the acoustic echo cancellation in the multi-channel teleconference system, the principle of acoustic echo cancellation is to cancel the actual echo ym(k) by simulating the echo path from the loudspeaker to the microphone (that is, by estimating the impulse response of the echo path) through the use of the echo canceller as shown in FIG. 1. This echo cancellation technique is also applicable to the case of picking up an acoustic signal from a desired sound source by a microphone and removing background sound radiated from a loudspeaker, for example, in a hall, theater, dome, or similar building provided with a public address system. Accordingly, the received signal referred to in the following description may be a signal from whatever signal source, as long as it is an electric signal that is provided to a loudspeaker over a playback channel.
It is an object of the present invention to provide a novel multi-channel acoustic echo cancellation method and apparatus which significantly improve the echo cancellation performance even when the additive signal introduced by pre-processing is small, and a recording medium having recorded thereon a program for implementing the acoustic echo cancellation.
According to an aspect of the present invention, there is provided a multi-channel acoustic echo cancellation method for an acoustic system which has N receive channels each containing a loudspeaker for generating an acoustic signal from a received signal, N being an integer equal to or greater than 2, and at least one pick-up channel containing a microphone for picking up the acoustic signal, the N loudspeakers and the microphone being placed in a common sound field, the method comprising the steps of:
(a) generating additive signals for the received signals input in the N receive channels, respectively;
(b) adding the received signals of the N receive channels and the additive signals corresponding thereto, to generate processed signals in the N receive channels;
(c) radiating the processed signals in the N receive channels by the loudspeakers of the channels corresponding thereto;
(d) picking up, by the microphone of the at least one pick-up channel, a combined acoustic echo of the reproduced sounds sneaking thereinto from the loudspeakers in the N receive channels, and inputting the combined acoustic echo into the at least one pick-up channel as an acoustic echo signal; and
(e) individually processing the N received signals and the N additive signals to generate an echo replica that simulates the acoustic echo signal in the pick-up channel, and subtracting the echo replica from the acoustic echo signal to thereby perform acoustic echo cancellation.
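Steps (a) through (e) can be traced in a minimal Python sketch for one pick-up channel. It is one possible reading and not the claimed implementation: it assumes that each of the N received signals and each of the N additive signals is fed to its own FIR adaptive filter updated by NLMS, that the additive signal is a scaled half-wave rectified copy of the received signal, and that the loudspeakers, room and microphone are replaced by simulated FIR echo paths. All function names and parameter values are hypothetical.

```python
import random

def additive_signal(sample, alpha=0.3):
    # Step (a): hypothetical additive signal (scaled half-wave rectified copy).
    return alpha * (sample + abs(sample)) / 2.0

def shift_in(history, sample):
    # Newest sample first; the oldest sample drops off the end.
    return [sample] + history[:-1]

def dot(coeffs, history):
    return sum(c * v for c, v in zip(coeffs, history))

def run_canceller(received, echo_paths, mu=0.5, eps=1e-8):
    """Return the simulated echo signal and the residual after cancellation."""
    n_ch, taps = len(received), len(echo_paths[0])
    additive = [[additive_signal(v) for v in ch] for ch in received]   # (a)
    processed = [[r + a for r, a in zip(rc, ac)]
                 for rc, ac in zip(received, additive)]                # (b)
    inputs = received + additive    # 2N signals, each filtered on its own
    h_hat = [[0.0] * taps for _ in inputs]
    hist = [[0.0] * taps for _ in inputs]
    room = [[0.0] * taps for _ in range(n_ch)]
    echoes, residuals = [], []
    for k in range(len(received[0])):
        # (c), (d): simulated loudspeakers, echo paths and microphone.
        room = [shift_in(room[n], processed[n][k]) for n in range(n_ch)]
        echo = sum(dot(echo_paths[n], room[n]) for n in range(n_ch))
        # (e): echo replica from the 2N filtered signals, then subtraction.
        hist = [shift_in(hist[i], inputs[i][k]) for i in range(len(inputs))]
        replica = sum(dot(h, x) for h, x in zip(h_hat, hist))
        err = echo - replica
        power = sum(v * v for x in hist for v in x) + eps
        gain = mu * err / power
        h_hat = [[c + gain * v for c, v in zip(h, x)]
                 for h, x in zip(h_hat, hist)]
        echoes.append(echo)
        residuals.append(err)
    return echoes, residuals

rng = random.Random(1)
received = [[rng.gauss(0.0, 1.0) for _ in range(6000)] for _ in range(2)]
echo_paths = [[0.7, -0.2, 0.1, 0.05], [0.4, 0.3, -0.1, 0.02]]
echoes, residuals = run_canceller(received, echo_paths)
```

In this sketch the additive signals enter the canceller as inputs of their own instead of being merged into the processed signals before filtering. The simulated received signals are independent, so the example shows only the signal flow of the steps, not the behavior with highly cross-correlated signals.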
According to another aspect of the present invention, there is provided a multi-channel echo canceller for an acoustic system which has N receive channels each containing a loudspeaker for generating an acoustic signal from a received signal, N being an integer equal to or greater than 2, and at least one pick-up channel containing a microphone for picking up the acoustic signal, the N loudspeakers and the microphone being placed in a common sound field, the echo canceller comprising:
N additive signal generating means for generating additive signals for the received signals input in the N receive channels, respectively;
N processed signal generating means for adding the received signals of the N receive channels and the additive signals corresponding thereto to generate processed signals in the N receive channels;
the N loudspeakers provided in the N receive channels, for radiating the processed signals in the N receive channels;
the microphone for picking up a combined acoustic echo of the reproduced sounds sneaking thereinto from the loudspeakers in the N receive channels, and for inputting into the at least one pick-up channel the combined acoustic echo as an acoustic echo signal; and
means for individually processing the N received signals and the N additive signals to generate an echo replica that simulates the acoustic echo signal in the pick-up channel, and for subtracting the echo replica from the acoustic echo signal to thereby perform acoustic echo cancellation.
According to still another aspect of the present invention, there is provided a recording medium on which there is recorded, as a computer program, a procedure for carrying out the multi-channel acoustic echo cancellation method.