1. Technical Field
This invention relates to a method for multiple channel acoustic echo cancellation (AEC), applicable to systems that derive a multi-channel spatialised signal from a monophonic signal, each channel of which is applied to a respective member of an array of loudspeakers at differing gains to give the percept or audible illusion of directionality. This class of spatialised signal is termed here "steered mono". A steered mono system uses two or more gain elements to represent the spatialisation, which is mapped to a panning processor to generate the corresponding loudspeaker outputs. In the embodiments to be described, a two-channel stereophonic signal is used with two loudspeakers, a system known as “stereo from steered mono” (SSM), but the principles of the invention can be applied to systems with more than two channels. The invention has application in teleconferencing systems where each talker's voice is artificially given spatial positioning for the benefit of the listener.
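As an illustration of the steered mono idea, the following minimal Python sketch derives two loudspeaker signals from one monophonic signal by applying a pair of gains. The constant-power pan law and the names `pan_gains` and `steer_mono` are assumptions made for this example; the description above only requires that the channels differ in gain.

```python
import math


def pan_gains(position):
    """Return (g1, g2) for a pan position in [-1, 1].

    Assumed constant-power law: -1 is fully left, +1 is fully right,
    and g1**2 + g2**2 == 1 at every position.
    """
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def steer_mono(mono, position):
    """Derive two loudspeaker signals from one mono signal by gain only."""
    g1, g2 = pan_gains(position)
    x1 = [g1 * sample for sample in mono]
    x2 = [g2 * sample for sample in mono]
    return x1, x2


# Hypothetical usage: place a short mono burst halfway to the right.
x1, x2 = steer_mono([0.2, -0.1, 0.4], position=0.5)
```

Because both channels are scaled copies of the same signal, the spatial position is carried entirely by the two gains.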
2. Related Art
For comfortable speech communication in a teleconference system that uses a loudspeaker and microphone, as opposed to a headset, a method of acoustic echo cancellation (AEC) is required. For monophonic systems the topology shown in FIG. 1 can be used with a number of different adaptive processes such as least mean square (LMS), recursive least squares (RLS) or fast affine projection (FAP). However, for stereophonic and multiple channel systems, existing solutions are far less advanced, with some major obstacles yet to be overcome. The example in FIG. 2 shows that for a stereophonic system there are two echo paths, h1 and h2 (which include the microphone and loudspeaker impulse responses), compared to the single path in the monophonic case. (This assumes a single microphone is used, which is generally the case when spatialisation is to be created artificially. More generally, the number of echo paths is the product of the number of loudspeakers and the number of microphones.)
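The monophonic case can be sketched in a few lines of Python. The example below uses normalised LMS (NLMS), a common variant of LMS whose step size is scaled by the input power; it is not taken from FIG. 1, and the echo path `h`, the white-noise loudspeaker signal and the function names are assumed values chosen only for illustration.

```python
import random


def fir(x, h):
    """Output of FIR filter h driven by x, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def nlms_echo_canceller(x, d, num_taps, mu=0.5, eps=1e-8):
    """Adapt h_hat so that the filtered loudspeaker signal x tracks d.

    x: loudspeaker samples, d: microphone samples (echo only here).
    Returns the final filter estimate and the error signal e(t).
    """
    h_hat = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent loudspeaker sample first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        echo_estimate = sum(w * s for w, s in zip(h_hat, buf))
        e = d_n - echo_estimate
        norm = sum(s * s for s in buf) + eps
        h_hat = [w + mu * e * s / norm for w, s in zip(h_hat, buf)]
        errors.append(e)
    return h_hat, errors


rng = random.Random(0)
h = [0.5, -0.3, 0.2, 0.1]                   # assumed echo path
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = fir(x, h)                               # echo picked up by the microphone
h_hat, errors = nlms_echo_canceller(x, d, num_taps=len(h))
```

With a single echo path and no other sound at the microphone, driving e(t) towards zero also drives `h_hat` towards `h`, which is the property that fails in the stereophonic case.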
Existing solutions to the stereo acoustic echo cancellation problem generally assume the system arrangement shown in FIG. 2 where the talker-to-microphone path responses are unknown. The aim of the adaptive process in the echo canceller is to use the signals x1(t), x2(t) and e(t) to train the adaptive filters ĥ1 and ĥ2 such that

$$e(t) \to 0 \tag{1}$$

With existing adaptive filter processes it is not possible to achieve a convergent set of filters such that

$$h_1 = \hat{h}_1 \quad \text{and} \quad h_2 = \hat{h}_2 \tag{2}$$

Instead, a convergent solution such as the following is obtained

$$h_1 * g_1 + h_2 * g_2 = \hat{h}_1 * g_1 + \hat{h}_2 * g_2 \tag{3}$$

where * is the convolution operator. Note that Equation (3) satisfies Equation (1), but that Equation (2) is not a unique solution for Equation (3), so the values for h1 and h2 cannot be derived from this result.
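The non-uniqueness of Equation (3) can be checked numerically. The sketch below, using made-up impulse responses, builds a second filter pair by adding g2 * c to h1 and subtracting g1 * c from h2, where c is an arbitrary filter. The two extra terms cancel in Equation (3), so the alternative pair predicts the same echo although neither filter equals the true path. All numerical values and helper names are assumptions for illustration.

```python
def conv(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def add(a, b):
    """Element-wise sum, zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


# Assumed talker-to-microphone responses and echo paths.
g1, g2 = [1.0, 0.4], [0.6, -0.2]
h1, h2 = [0.5, -0.3, 0.2], [0.1, 0.4, -0.1]

c = [0.7]  # arbitrary filter; any choice gives another valid pair
h1_alt = add(h1, conv(g2, c))
h2_alt = add(h2, [-v for v in conv(g1, c)])

# Both sides of Equation (3): echo predicted by each filter pair.
true_side = add(conv(h1, g1), conv(h2, g2))
alt_side = add(conv(h1_alt, g1), conv(h2_alt, g2))
mismatch = max(abs(p - q) for p, q in zip(true_side, alt_side))
```

Here `mismatch` is zero to within rounding while `h1_alt` and `h2_alt` differ from `h1` and `h2`, so zero error alone does not identify the echo paths.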
If the filters g1 or g2 change, possibly due to the talker moving, the equality in Equation (3) no longer holds (unless Equation (2) is also met). Thus, the echo canceller no longer produces a convergent solution and the echo heard by the talker rises in level.
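The effect of a change in g1 and g2 can be reproduced in a small simulation. The sketch below is a simplification under stated assumptions: g1 and g2 are reduced to scalar gains, the source is white noise, the echo paths are made-up values, and a jointly adapted two-channel NLMS filter stands in for whatever adaptive process a real canceller would use. The filters are adapted under one pair of gains, then frozen and evaluated under another.

```python
import random


def fir(x, h):
    """Output of FIR filter h driven by x, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def mic_echo(s, gains, f1, f2):
    """Echo of source s sent through gains (g1, g2) and filters f1, f2."""
    g1, g2 = gains
    return [g1 * p + g2 * q for p, q in zip(fir(s, f1), fir(s, f2))]


def two_channel_nlms(x1, x2, d, num_taps, mu=0.5, eps=1e-8):
    """Jointly adapt two filters so their summed output tracks d."""
    w1, w2 = [0.0] * num_taps, [0.0] * num_taps
    b1, b2 = [0.0] * num_taps, [0.0] * num_taps
    e = 0.0
    for a, b, d_n in zip(x1, x2, d):
        b1, b2 = [a] + b1[:-1], [b] + b2[:-1]
        y = (sum(w * s for w, s in zip(w1, b1))
             + sum(w * s for w, s in zip(w2, b2)))
        e = d_n - y
        norm = sum(s * s for s in b1) + sum(s * s for s in b2) + eps
        w1 = [w + mu * e * s / norm for w, s in zip(w1, b1)]
        w2 = [w + mu * e * s / norm for w, s in zip(w2, b2)]
    return w1, w2, e


rng = random.Random(1)
h1 = [0.5, -0.3, 0.2, 0.1]    # assumed echo paths
h2 = [0.1, 0.4, -0.2, 0.05]

# Adapt while the source is steered with the first pair of gains.
g_old = (0.9, 0.3)
s = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x1 = [g_old[0] * v for v in s]
x2 = [g_old[1] * v for v in s]
w1, w2, last_error = two_channel_nlms(x1, x2, mic_echo(s, g_old, h1, h2), 4)

# Freeze the filters, change the gains, and measure the residual echo.
g_new = (0.3, 0.9)
s_new = [rng.gauss(0.0, 1.0) for _ in range(500)]
echo = mic_echo(s_new, g_new, h1, h2)
estimate = mic_echo(s_new, g_new, w1, w2)
residual_power = sum((p - q) ** 2 for p, q in zip(echo, estimate)) / len(echo)
```

In this sketch `last_error` is negligible even though `w1` and `w2` differ from `h1` and `h2`, and `residual_power` becomes clearly non-zero as soon as the gains change, which corresponds to the rise in echo level described above.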
Various solutions to this problem have been proposed that either manipulate the loudspeaker signals, x1 and x2, or use the properties of the signals x1 and x2. The aim of these solutions is to make use of the cross-correlation properties of the two signals, as it can be shown that the adaptive filters can converge to the solution of Equation (2) when the two signals are sufficiently decorrelated. However, as the signals x1 and x2 are inherently highly correlated in a teleconferencing system, techniques that exploit the small decorrelated features in the signals perform poorly in anything but ideal conditions.
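The degree of correlation can be illustrated with a zero-lag correlation coefficient. This is a deliberately simple measure, and the sketch assumes scalar gains and a white-noise source; a fuller analysis would look at coherence across frequency. The function name and signal values are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Zero-lag Pearson correlation coefficient of two sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(2)
s = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Two channels derived from one source by gain only.
x1 = [0.9 * v for v in s]
x2 = [0.3 * v for v in s]
rho_steered = correlation(x1, x2)

# For comparison: a channel that shares nothing with x1.
independent = [rng.gauss(0.0, 1.0) for _ in range(5000)]
rho_independent = correlation(x1, independent)
```

Channels that are scaled copies of one source give a coefficient of one to within rounding, leaving no decorrelated component for such techniques to exploit, whereas the independent channel gives a coefficient near zero.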
It has been proposed to add a small amount of independent white noise to the signals x1 and x2. This has been shown to aid convergence significantly towards the solution in Equation (2) by introducing some signal decorrelation. However, although adding noise in this manner improves convergence, the noise has to be added at a level high enough to be undesirably audible.
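The influence of added noise on convergence can be explored with a small simulation. The sketch below is illustrative only: it assumes scalar steering gains, made-up echo paths, a white-noise source and a jointly adapted two-channel NLMS filter, and the noise level of 0.3 is an arbitrary choice rather than a value from any cited proposal. It reports the distance between the adapted filters and the true echo paths.

```python
import math
import random


def joint_nlms_misalignment(noise_std, num_samples=6000, mu=0.5, seed=3):
    """Adapt two filters on steered mono plus optional independent noise.

    Returns the Euclidean distance between the stacked estimates and
    the stacked true echo paths after adaptation.
    """
    rng = random.Random(seed)
    h1 = [0.5, -0.3, 0.2, 0.1]    # assumed echo paths
    h2 = [0.1, 0.4, -0.2, 0.05]
    g1, g2 = 0.9, 0.3             # assumed steering gains
    taps = len(h1)
    w1, w2 = [0.0] * taps, [0.0] * taps
    b1, b2 = [0.0] * taps, [0.0] * taps
    for _ in range(num_samples):
        s = rng.gauss(0.0, 1.0)
        # Loudspeaker signals: steered source plus independent noise.
        b1 = [g1 * s + noise_std * rng.gauss(0.0, 1.0)] + b1[:-1]
        b2 = [g2 * s + noise_std * rng.gauss(0.0, 1.0)] + b2[:-1]
        # Microphone picks up both loudspeakers through the true paths.
        d = (sum(h * v for h, v in zip(h1, b1))
             + sum(h * v for h, v in zip(h2, b2)))
        y = (sum(w * v for w, v in zip(w1, b1))
             + sum(w * v for w, v in zip(w2, b2)))
        e = d - y
        norm = sum(v * v for v in b1) + sum(v * v for v in b2) + 1e-8
        w1 = [w + mu * e * v / norm for w, v in zip(w1, b1)]
        w2 = [w + mu * e * v / norm for w, v in zip(w2, b2)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1 + h2, w1 + w2)))


misalignment_clean = joint_nlms_misalignment(noise_std=0.0)
misalignment_noisy = joint_nlms_misalignment(noise_std=0.3)
```

Without noise the filters settle on a pair that cancels the echo but remains far from the true paths; with noise they move towards the true paths. Lowering `noise_std` slows that convergence in this sketch, which is consistent with the observation that effective noise levels tend to be audible.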