The present invention relates to a method and apparatus for multi-channel acoustic echo cancellation, which cancel room echoes that cause howling and psycho-acoustic problems in a teleconferencing system with multiple receiving channels.
With the widespread proliferation of ISDN, LAN and similar digital networks and the development of high-efficiency speech and image coding techniques in recent years, much attention has focused on TV conference systems, which allow participants to communicate while seeing each other's faces, and on multimedia communication systems using personal computers or workstations, which also support data communication and the like. These systems often adopt hands-free communication using loudspeakers and microphones to create a more realistic teleconferencing environment. However, hands-free communication suffers from echo and howling, and acoustic echo cancellation is indispensable for avoiding these problems. Acoustic echo cancellers are in fact widely used, but most are single-channel devices that can cancel only the acoustic echo from one loudspeaker (one channel) to one microphone (one channel). Stereo is now common in TV broadcast programs and music media, for instance, and there is a growing demand for multi-channel hands-free communication systems. Meeting this demand requires a multi-channel acoustic echo canceller that can cancel the acoustic echoes from a plurality of loudspeakers (a plurality of channels) to a microphone; in recent years, the technical problems of realizing such a canceller, and their solutions, have been investigated very actively.
A one-channel echo canceller will be described with reference to FIG. 1. In hands-free communication, speech uttered by a person at a remote place is applied as a received signal to a received signal terminal 11 and is radiated from a loudspeaker 12. In FIG. 1, $k$ denotes discrete time and $x(k)$ a received signal sample value; in the following description, $x(k)$ will be referred to simply as a received signal. An echo canceller 14 cancels an echo $y(k)$, which is produced when the received signal $x(k)$ radiated from the loudspeaker 12 propagates over an echo path 15 and is picked up by a microphone 16. The echo $y(k)$ can be modeled by the following convolution:

$$y(k)=\sum_{n=0}^{L-1}h(k,n)\,x(k-n)\tag{1}$$

where $h(k,n)$ is the impulse response representing the transfer function of the echo path 15 at time $k$, and $L$ is the number of taps, a constant preset according to the reverberation time of the echo path 15. First, the $L$ most recent received signals, from $x(k)$ back to $x(k-L+1)$, are stored in a received signal storage and vector generating part 17. These $L$ stored signals are output as a received signal vector $\mathbf{x}(k)$:

$$\mathbf{x}(k)=[x(k),\,x(k-1),\,\ldots,\,x(k-L+1)]^T\tag{2}$$
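As a minimal sketch of Eqs. (1) and (2), the Python below forms the received signal vector and computes one echo sample by convolution. It assumes a time-invariant echo path, so $h(k,n)$ is written as a fixed list `h`, and it treats samples before time 0 as zero; the function names are illustrative only.

```python
def received_vector(x, k, L):
    """Eq. (2): [x(k), x(k-1), ..., x(k-L+1)].
    Samples before time 0 are taken as zero (an assumption of this sketch)."""
    return [x[k - n] if k - n >= 0 else 0.0 for n in range(L)]


def echo_sample(h, x, k):
    """Eq. (1): y(k) = sum_{n=0}^{L-1} h(n) x(k-n), for a fixed (time-invariant) echo path h."""
    xk = received_vector(x, k, len(h))
    return sum(hn * xn for hn, xn in zip(h, xk))


# Example: a 3-tap echo path applied to a short received signal
h = [0.5, 0.25, 0.125]
x = [1, 2, 3, 4]
print(received_vector(x, 3, 3))  # [4, 3, 2]
print(echo_sample(h, x, 3))      # 0.5*4 + 0.25*3 + 0.125*2 = 3.0
```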
where $(\cdot)^T$ denotes transposition. In an estimated echo generating part 18, formed by an FIR filter, the inner product of the received signal vector $\mathbf{x}(k)$ of Eq. (2) and an estimated echo path vector $\hat{\mathbf{h}}(k)$, the filter coefficient vector supplied by an echo path estimating part 19, is calculated as follows:

$$\hat{y}(k)=\hat{\mathbf{h}}^T(k)\,\mathbf{x}(k)\tag{3}$$
As a result, an estimated echo, or echo replica, $\hat{y}(k)$ is generated. This inner product calculation is equivalent to the convolution of Eq. (1). The echo path estimating part 19 generates the estimated echo path vector $\hat{\mathbf{h}}(k)$ that is used in the estimated echo generating part 18. The most common algorithm for echo path estimation is the NLMS (Normalized Least Mean Square) algorithm. With the NLMS algorithm, the received signal vector $\mathbf{x}(k)$ at time $k$ and the residual echo $e(k)$, i.e. the error obtained when a subtractor 21 subtracts the estimated echo $\hat{y}(k)$ from the output $y(k)$ of the microphone 16,

$$e(k)=y(k)-\hat{y}(k)\tag{4}$$
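A sketch of Eqs. (3) and (4) at a single time instant follows. The names `h_hat` and `x_vec` are hypothetical stand-ins for the estimated echo path vector $\hat{\mathbf{h}}(k)$ and the received signal vector $\mathbf{x}(k)$, represented here as plain Python lists.

```python
def estimated_echo(h_hat, x_vec):
    """Eq. (3): y_hat(k) = h_hat^T(k) x(k), the output of the FIR filter (part 18)."""
    return sum(h * x for h, x in zip(h_hat, x_vec))


def residual_echo(y, h_hat, x_vec):
    """Eq. (4): e(k) = y(k) - y_hat(k), the output of the subtractor (21)."""
    return y - estimated_echo(h_hat, x_vec)


# Example: microphone output y(k) = 3.0, two-tap estimate, received vector [2, 4]
print(estimated_echo([0.5, 0.25], [2, 4]))      # 2.0
print(residual_echo(3.0, [0.5, 0.25], [2, 4]))  # 1.0
```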
are used to calculate the estimated echo path vector $\hat{\mathbf{h}}(k+1)$ for time $k+1$ by the following equation:

$$\hat{\mathbf{h}}(k+1)=\hat{\mathbf{h}}(k)+\alpha\,\frac{e(k)\,\mathbf{x}(k)}{\mathbf{x}^T(k)\,\mathbf{x}(k)}\tag{5}$$
where $\alpha$ is called a step size parameter; it adjusts the adaptation and is chosen within the range $0<\alpha<2$. By repeating the above processing at each increment of time $k$, the estimated echo path vector $\hat{\mathbf{h}}(k)$ in the echo path estimating part 19 gradually converges toward the true echo path vector $\mathbf{h}(k)$, whose elements are the impulse response sequence $h(k,n)$ ($n=0,1,2,\ldots,L-1$) of the true echo path 15 at time $k$:

$$\mathbf{h}(k)=[h(k,0),\,h(k,1),\,\ldots,\,h(k,L-1)]^T\tag{6}$$
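The loop below is a minimal simulation of the NLMS update of Eq. (5). It assumes a fixed, hypothetical echo path `h_true`, white Gaussian received signals, and no near-end noise at the microphone. The update is skipped when the received vector has zero energy; that guard is added here only to avoid division by zero and is not part of Eq. (5).

```python
import random


def nlms_identify(h_true, num_samples=5000, alpha=0.5, seed=1):
    """Single-channel NLMS echo path estimation (Eqs. (1)-(5)) on white noise input."""
    rng = random.Random(seed)
    L = len(h_true)
    h_hat = [0.0] * L          # estimated echo path vector, starts at zero
    buf = [0.0] * L            # received signal vector [x(k), ..., x(k-L+1)], Eq. (2)
    errors = []
    for _ in range(num_samples):
        buf = [rng.gauss(0.0, 1.0)] + buf[:-1]            # shift in the new sample x(k)
        y = sum(h * x for h, x in zip(h_true, buf))        # true echo, Eq. (1)
        y_hat = sum(h * x for h, x in zip(h_hat, buf))     # echo replica, Eq. (3)
        e = y - y_hat                                      # residual echo, Eq. (4)
        energy = sum(x * x for x in buf)                   # x^T(k) x(k)
        if energy > 0.0:
            step = alpha * e / energy
            h_hat = [h + step * x for h, x in zip(h_hat, buf)]   # Eq. (5)
        errors.append(e)
    return h_hat, errors


h_hat, errors = nlms_identify([0.6, -0.3, 0.2, 0.1])
print([round(v, 4) for v in h_hat])   # close to the true path [0.6, -0.3, 0.2, 0.1]
```

Under these idealized assumptions the estimate approaches the true echo path vector of Eq. (6) and the residual echo shrinks toward zero; with real speech and room noise, convergence is slower and incomplete.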
As a result, the residual echo $e(k)$ given by Eq. (4) is reduced.
In general, a teleconferencing system having an N-channel ($N\geq 2$) loudspeaker system and an M-channel ($M\geq 1$) microphone system uses the echo cancellation configuration shown in FIG. 2. An echo cancellation system 23 is composed of N-channel echo cancellers 22_1, 22_2, ..., 22_M, each processing N-input, one-output time-sequence signals. Each canceller is interposed between all N channels of the receiving side, where received signals $x_1(k)$ to $x_N(k)$ are applied to input terminals 11_1 to 11_N and radiated from loudspeakers 12_1 to 12_N, and one channel of the sending side, which is composed of microphones 16_1 to 16_M. The echo cancellation system therefore deals with a total of $N\times M$ echo paths 15_nm ($1\leq n\leq N$, $1\leq m\leq M$). Each of the N-channel echo cancellers 22_1, 22_2, ..., 22_M, connected between all N channels of the receiving side and one channel of the sending side, has the configuration shown in FIG. 3, which extends the configuration of the echo canceller 14 depicted in FIG. 1. This configuration is described in detail, for example, in B. Widrow and S. D. Stearns, "Adaptive Signal Processing," Prentice-Hall, Inc., pp. 198-200 (1985). Now consider the N-channel echo canceller 22_m connected to the m-th channel ($1\leq m\leq M$) of the sending side. The echo signal $y_m(k)$ picked up by the m-th channel microphone 16_m is the sum of the received signals of all receiving channels after propagation over the respective echo paths 15_1m to 15_Nm. Hence, the echo paths must be estimated by evaluating only the single residual echo $e_m(k)$, which is common to all the receiving channels.
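The sketch below illustrates the echo model behind FIG. 2: each microphone picks up the sum of the N received signals, each filtered by its own echo path. The nested-list layout `h[n][m]` for the path 15_nm (indexed from 0 here) and the function name are assumptions made for illustration.

```python
def microphone_echoes(h, x_vecs):
    """Echo y_m(k) at each of the M microphones.

    h[n][m]   : taps of the (hypothetical) echo path from loudspeaker n to microphone m
    x_vecs[n] : received signal vector of receiving channel n at time k
    Returns [y_1(k), ..., y_M(k)], where y_m(k) = sum_n h_nm^T x_n(k).
    """
    N = len(h)
    M = len(h[0])
    return [
        sum(sum(a * b for a, b in zip(h[n][m], x_vecs[n])) for n in range(N))
        for m in range(M)
    ]


# Two loudspeakers, two microphones, two-tap paths
h = [[[1, 0], [0, 1]],    # paths from loudspeaker 1 to microphones 1 and 2
     [[1, 1], [0, 2]]]    # paths from loudspeaker 2 to microphones 1 and 2
x_vecs = [[1, 2], [3, 4]]
print(microphone_echoes(h, x_vecs))  # [8, 10]
```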
First, for the received signal $x_n(k)$ of each channel, the received signal storage and vector generating parts 17_1, 17_2, ..., 17_N generate the following received signal vectors:

$$\mathbf{x}_n(k)=[x_n(k),\,x_n(k-1),\,\ldots,\,x_n(k-L_n+1)]^T,\qquad n=1,2,\ldots,N\tag{7}$$

where $L_1, L_2, \ldots, L_N$ are the numbers of taps, constants preset according to the reverberation times of the respective echo paths 15_1m, 15_2m, ..., 15_Nm. The vectors thus generated are combined in a vector combining part 24 as follows:

$$\mathbf{x}(k)=[\mathbf{x}_1^T(k),\,\mathbf{x}_2^T(k),\,\ldots,\,\mathbf{x}_N^T(k)]^T\tag{8}$$
Similarly, in the echo path estimating part 19_m, the estimated echo path vectors $\hat{\mathbf{h}}_{1m}(k), \hat{\mathbf{h}}_{2m}(k), \ldots, \hat{\mathbf{h}}_{Nm}(k)$, which simulate the N echo paths between the respective receiving channels and the m-th sending channel, are combined as follows:

$$\hat{\mathbf{h}}_m(k)=[\hat{\mathbf{h}}_{1m}^T(k),\,\hat{\mathbf{h}}_{2m}^T(k),\,\ldots,\,\hat{\mathbf{h}}_{Nm}^T(k)]^T\tag{9}$$
When the NLMS algorithm is used, the combined estimated echo path vector $\hat{\mathbf{h}}_m(k)$ is updated as follows:

$$\hat{\mathbf{h}}_m(k+1)=\hat{\mathbf{h}}_m(k)+\alpha\,\frac{e_m(k)\,\mathbf{x}(k)}{\mathbf{x}^T(k)\,\mathbf{x}(k)}\tag{10}$$
In the estimated echo generating part 18_m, an estimated echo $\hat{y}_m(k)$ of the echo $y_m(k)$ picked up in the m-th sending channel is generated by the following inner product calculation:

$$\hat{y}_m(k)=\hat{\mathbf{h}}_m^T(k)\,\mathbf{x}(k)\tag{11}$$
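A minimal sketch of Eqs. (7) to (11) for one sending channel m follows. It assumes hypothetical fixed echo paths and mutually independent white-noise received signals; with such uncorrelated inputs, the stacked NLMS update converges to the true paths. The helper names `stack`, `split` and `multichannel_nlms` are illustrative, not taken from the source.

```python
import random


def stack(blocks):
    """Eqs. (8)/(9): concatenate per-channel vectors into one combined vector."""
    return [v for block in blocks for v in block]


def split(vec, lengths):
    """Inverse of stack: cut a combined vector back into per-channel blocks."""
    out, pos = [], 0
    for n in lengths:
        out.append(vec[pos:pos + n])
        pos += n
    return out


def multichannel_nlms(h_true_m, num_samples=8000, alpha=0.5, seed=2):
    """NLMS for sending channel m.

    h_true_m[n] holds the (hypothetical) taps of echo path 15_nm; the received
    signals are independent white noise per channel (an assumption of this sketch).
    """
    rng = random.Random(seed)
    lengths = [len(h) for h in h_true_m]            # L_1, ..., L_N
    bufs = [[0.0] * n for n in lengths]             # per-channel vectors, Eq. (7)
    h_true = stack(h_true_m)
    h_hat = [0.0] * sum(lengths)                    # combined estimate, Eq. (9)
    e = 0.0
    for _ in range(num_samples):
        bufs = [[rng.gauss(0.0, 1.0)] + b[:-1] for b in bufs]    # new samples x_n(k)
        x = stack(bufs)                                          # Eq. (8)
        y = sum(a * b for a, b in zip(h_true, x))                # echo y_m(k)
        e = y - sum(a * b for a, b in zip(h_hat, x))             # e_m(k), using Eq. (11)
        energy = sum(v * v for v in x)
        if energy > 0.0:
            h_hat = [h + alpha * e * v / energy for h, v in zip(h_hat, x)]   # Eq. (10)
    return split(h_hat, lengths), e


estimates, e_last = multichannel_nlms([[0.5, -0.2, 0.1], [0.3, 0.25, -0.05]])
print([[round(v, 4) for v in est] for est in estimates])
```

Because the combined vectors are treated exactly like the single-channel vectors, the loop body mirrors the one-channel NLMS; only the stacking and splitting are new.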
By combining vectors in the respective channels into one vector, the flow of basic processing becomes the same as in the one-channel echo canceller of FIG. 1.
Among the defects of the conventional echo cancellation system applied to a teleconferencing system composed of an N-channel loudspeaker system and an M-channel microphone system, the defect that the present invention is intended to solve will be described for a two-channel stereo teleconferencing system. Referring now to FIG. 4, the loudspeaker/microphone systems at points A and B, which are connected via a network NW, comprise two loudspeakers 12a_1, 12a_2 and 12b_1, 12b_2 and two microphones 16a_1, 16a_2 and 16b_1, 16b_2, respectively. Echo cancellers 22a_1, 22a_2 and 22b_1, 22b_2 are provided between the two receiving channels and each sending channel.
When the conventional echo canceller is applied to a stereo teleconferencing system that sends and receives signals between points A and B over two channels, a problem arises: each time the talker at point A moves, or another person at point A becomes the talker, the echo of point A's speech returned from point B increases, even if the echo paths 15_11 and 15_21 remain unchanged. The reason is that the echo cancellation system at point B does not correctly estimate the echo path impulse responses.
To explain this problem, consider the operation of the echo canceller 22b_1 connected to the first sending channel of the echo cancellation system at point B. Let the two-channel received signal vectors be $\mathbf{x}_1(k)$ and $\mathbf{x}_2(k)$, and let the echo path vectors of the true echo paths 15_11 and 15_21 of the respective receiving channels be $\mathbf{h}_{11}(k)$ and $\mathbf{h}_{21}(k)$. The echo $y_1(k)$ picked up by the microphone 16b_1 is then given by

$$y_1(k)=\mathbf{h}_{11}^T(k)\,\mathbf{x}_1(k)+\mathbf{h}_{21}^T(k)\,\mathbf{x}_2(k)\tag{12}$$
On the other hand, the estimated echo $\hat{y}_1(k)$ generated in the echo canceller is expressed by the following equation, using the estimated echo path vectors $\hat{\mathbf{h}}_{11}(k)$ and $\hat{\mathbf{h}}_{21}(k)$ generated in the echo canceller:

$$\hat{y}_1(k)=\hat{\mathbf{h}}_{11}^T(k)\,\mathbf{x}_1(k)+\hat{\mathbf{h}}_{21}^T(k)\,\mathbf{x}_2(k)\tag{13}$$
When a single talker speaks at point A, the received signal vectors $\mathbf{x}_1(k)$ and $\mathbf{x}_2(k)$ are very strongly cross-correlated. When this high cross-correlation remains constant, the following Eq. (14) has infinitely many solutions for the combined vector $[\hat{\mathbf{h}}_{11}^T(k),\,\hat{\mathbf{h}}_{21}^T(k)]$; these solutions form a subspace $H_x$ determined by the cross-correlation between $\mathbf{x}_1(k)$ and $\mathbf{x}_2(k)$.

$$\hat{y}_1(k)=y_1(k)\tag{14}$$
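Consequently, with an ordinary iterative error-minimization algorithm such as the NLMS algorithm, the combined vector $[\hat{\mathbf{h}}_{11}^T(k),\,\hat{\mathbf{h}}_{21}^T(k)]$ converges to the point of the subspace $H_x$ nearest the initial point, because each correction of Eq. (10) is a multiple of the combined received signal vector and so never moves the estimate in directions the received signals do not excite. In general, therefore, it does not converge to the true value $[\mathbf{h}_{11}^T(k),\,\mathbf{h}_{21}^T(k)]$.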
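The following simulation illustrates this behavior for two channels under assumed values: both received signals are scaled copies of a single white-noise source (the model formalized in Eq. (15)), and NLMS starts from zero. The echo paths `h11` and `h21` and the gains `p1`, `p2` are hypothetical.

```python
import random


def nlms_two_channel_correlated(h11, h21, p1, p2, num_samples=8000, alpha=0.5, seed=3):
    """Two-channel NLMS (Eqs. (10)-(13)) driven by fully correlated inputs x_1 = p1*s, x_2 = p2*s.

    h11, h21: hypothetical true echo paths of equal length (lists of taps).
    """
    rng = random.Random(seed)
    L = len(h11)
    s_buf = [0.0] * L                          # source vector [s(k), ..., s(k-L+1)]
    h11_hat, h21_hat = [0.0] * L, [0.0] * L    # adaptation starts from zero
    e = 0.0
    for _ in range(num_samples):
        s_buf = [rng.gauss(0.0, 1.0)] + s_buf[:-1]
        x1 = [p1 * v for v in s_buf]
        x2 = [p2 * v for v in s_buf]
        y = sum(a * b for a, b in zip(h11, x1)) + sum(a * b for a, b in zip(h21, x2))              # Eq. (12)
        y_hat = sum(a * b for a, b in zip(h11_hat, x1)) + sum(a * b for a, b in zip(h21_hat, x2))  # Eq. (13)
        e = y - y_hat
        energy = sum(v * v for v in x1) + sum(v * v for v in x2)
        if energy > 0.0:
            g = alpha * e / energy
            h11_hat = [h + g * v for h, v in zip(h11_hat, x1)]   # Eq. (10), first block
            h21_hat = [h + g * v for h, v in zip(h21_hat, x2)]   # Eq. (10), second block
    return h11_hat, h21_hat, e


h11_hat, h21_hat, e = nlms_two_channel_correlated([0.6, -0.3, 0.2], [0.1, 0.4, -0.2], 1.0, 0.5)
print([round(v, 4) for v in h11_hat], [round(v, 4) for v in h21_hat], abs(e) < 1e-6)
```

With these values the residual echo goes to zero, yet `h21_hat` settles near [0.26, -0.04, 0.04] instead of the true [0.1, 0.4, -0.2]: the estimate is the point of $H_x$ nearest the zero starting point, the same values given by Eqs. (17) and (18).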
For simplicity, consider the case where the received signal vectors $\mathbf{x}_1(k)$ and $\mathbf{x}_2(k)$ are expressed using constant scalar values $p_1$ and $p_2$ and a source signal vector $\mathbf{s}(k)$ as follows:

$$\mathbf{x}_1(k)=p_1\,\mathbf{s}(k),\qquad \mathbf{x}_2(k)=p_2\,\mathbf{s}(k)\tag{15}$$
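Substituting Eq. (15) into Eqs. (12) and (13) shows that Eq. (14) holds for every source signal $\mathbf{s}(k)$ exactly when $p_1\hat{\mathbf{h}}_{11}(k)+p_2\hat{\mathbf{h}}_{21}(k)$ equals $p_1\mathbf{h}_{11}(k)+p_2\mathbf{h}_{21}(k)$. The subspace $H_x$ in which $[\hat{\mathbf{h}}_{11}^T(k),\,\hat{\mathbf{h}}_{21}^T(k)]$ may lie can therefore be regarded as the straight line Al in FIG. 5A, which satisfies the following equation:

$$p_1\hat{\mathbf{h}}_{11}(k)+p_2\hat{\mathbf{h}}_{21}(k)=p_1\mathbf{h}_{11}(k)+p_2\mathbf{h}_{21}(k)\tag{16}$$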
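To make the straight line of FIG. 5A concrete, the sketch below assumes single-tap echo paths (so each estimate is a scalar) with hypothetical true values $h_{11}=h_{21}=0.5$ and gains $p_1=1.0$, $p_2=0.5$. It checks that several different estimate pairs on the line of Eq. (16) all produce exactly the same echo replica, so the error signal alone cannot distinguish them.

```python
def replica(h11_hat, h21_hat, p1, p2, s):
    """Eq. (13) with scalar taps and x_i = p_i * s (Eq. (15))."""
    return h11_hat * p1 * s + h21_hat * p2 * s


def satisfies_eq16(h11_hat, h21_hat, h11, h21, p1, p2, tol=1e-12):
    """True if (h11_hat, h21_hat) lies on the line of Eq. (16)."""
    return abs(p1 * h11_hat + p2 * h21_hat - (p1 * h11 + p2 * h21)) < tol


h11, h21, p1, p2 = 0.5, 0.5, 1.0, 0.5
# All of these pairs satisfy 1.0*h11_hat + 0.5*h21_hat = 0.75
candidates = [(0.5, 0.5), (0.6, 0.3), (0.75, 0.0), (0.0, 1.5)]
for a, b in candidates:
    print((a, b), satisfies_eq16(a, b, h11, h21, p1, p2),
          [round(replica(a, b, p1, p2, s), 12) for s in (1.0, -2.0, 3.5)])
```

Only the first pair is the true echo path, but every pair yields the replica $0.75\,s$ for any source value $s$.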
When the adaptation starts from the initial value (0, 0), the steady-state solution $[\hat{\mathbf{h}}_{11p}^T(k),\,\hat{\mathbf{h}}_{21p}^T(k)]$ is obtained as follows:

$$\hat{\mathbf{h}}_{11p}(k)=\left\{\mathbf{h}_{11}(k)+\mathbf{h}_{21}(k)\frac{p_2}{p_1}\right\}\frac{p_1^2}{p_1^2+p_2^2}\neq\mathbf{h}_{11}(k)\tag{17}$$

$$\hat{\mathbf{h}}_{21p}(k)=\left\{\mathbf{h}_{11}(k)\frac{p_1}{p_2}+\mathbf{h}_{21}(k)\right\}\frac{p_2^2}{p_1^2+p_2^2}\neq\mathbf{h}_{21}(k)\tag{18}$$
Hence, when the ratio between the scalar values $p_1$ and $p_2$ changes, this steady-state solution no longer satisfies Eq. (16), so the echo is no longer canceled and increases accordingly.
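The sketch below evaluates Eqs. (17) and (18) for single-tap echo paths with hypothetical true values $h_{11}=h_{21}=0.5$, then computes the residual echo of that steady-state solution before and after the talker changes so that $(p_1,p_2)$ goes from $(1.0, 0.5)$ to $(0.5, 1.0)$. The function names and numbers are illustrative assumptions.

```python
def steady_state(h11, h21, p1, p2):
    """Eqs. (17) and (18) for single-tap echo paths, adaptation started at (0, 0)."""
    d = p1 ** 2 + p2 ** 2
    h11p = (h11 + h21 * p2 / p1) * p1 ** 2 / d
    h21p = (h11 * p1 / p2 + h21) * p2 ** 2 / d
    return h11p, h21p


def residual(h11, h21, h11_hat, h21_hat, p1, p2, s=1.0):
    """e = y - y_hat from Eqs. (12) and (13) with x_i = p_i * s (Eq. (15))."""
    return (h11 - h11_hat) * p1 * s + (h21 - h21_hat) * p2 * s


# Converge while the first talker gives p1 = 1.0, p2 = 0.5
h11p, h21p = steady_state(0.5, 0.5, 1.0, 0.5)       # approx (0.6, 0.3), not (0.5, 0.5)
before = residual(0.5, 0.5, h11p, h21p, 1.0, 0.5)   # approx 0.0: echo canceled
after = residual(0.5, 0.5, h11p, h21p, 0.5, 1.0)    # approx 0.15: echo reappears
print(round(h11p, 12), round(h21p, 12), round(before, 12), round(after, 12))
```

The echo paths themselves never change in this example; only the ratio $p_1:p_2$ does, yet the residual echo jumps from zero to a nonzero value.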
As described above, when the conventional echo canceller is applied to a teleconferencing system composed of an N-channel loudspeaker system and an M-channel microphone system, the echo path impulse responses cannot be estimated correctly if the received signals of the respective channels are cross-correlated. As a result, the echo increases whenever the cross-correlation between the received signals varies.