1. Field of the Invention
The present invention is directed to audio conferencing systems, and more particularly to a method of reducing the training time of an acoustic echo canceller in a full-duplex audio conferencing system using beamforming.
2. Description of the Related Art
Spatial directivity is highly desirable for sound pickup in audio conferencing systems because it attenuates room noise, interference and reverberation, thereby enhancing the quality of near-end speech. Spatial directivity can be achieved either by using directional microphones or by combining a beamformer with a plurality of omnidirectional microphones arranged as a microphone array. The latter approach is preferable, as it provides greater directivity, flexibility and cost efficiency than directional microphones.
Echo is a well-known problem in hands-free audio conferencing systems. Undesirable echo results when the loudspeaker signal is picked up by the microphone(s) and transmitted back to the far-end party. The typical industry requirement for echo attenuation is on the order of 40 dB. In a desktop phone, the proximity of the loudspeaker to the microphones, combined with the high loudspeaker volume and the required transmit gain, makes echo particularly difficult to deal with. Although the inherent spatial directivity of beamforming contributes to suppressing the loudspeaker echo signal, a practical fixed or adaptive beamformer cannot meet this requirement on its own. Therefore, in practice, high-quality full-duplex operation of a conference phone or speakerphone requires a traditional acoustic echo canceller (AEC) in combination with beamforming.
Several prior art references discuss combining acoustic echo cancellation with beamforming (see M. Brandstein and D. Ward, “Microphone Arrays: Signal Processing Techniques and Applications”, Springer-Verlag, 2001, and H. Buchner, W. Herbordt and W. Kellermann, “An Efficient Combination of Multi-Channel Acoustic Echo Cancellation With a Beamforming Microphone Array”, Proc. Int. Workshop on Hands-Free Speech Communication (HSC), pp. 55-58, Kyoto, Japan, April 2001). In one approach, acoustic echo cancellation is performed on all of the microphone signals in parallel, which is computationally intensive. In a second approach, acoustic echo cancellation is performed on the spatially filtered signal at the output of the beamformer. The difficulty with the latter approach is that the transfer function between the loudspeaker and the spatially filtered signal varies over time as the beamformer changes its look direction. Indeed, each look direction presents its own set of characteristics, such as the direct path, reflections, background noise and local interference signals, which depend on the spatial area it covers. The AEC therefore has to cope with a change in the echo path each time the beamformer changes its look direction, which can significantly degrade full-duplex performance.
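To make the echo-path problem concrete, the following is a minimal sketch, not the method of either cited reference, of a single echo canceller operating on the beamformer output. It assumes a normalized LMS (NLMS) adaptation rule, which the text does not specify, and two hypothetical, exponentially decaying echo paths standing in for two look directions; the function names and parameter values (`taps`, `mu`) are illustrative only. Echo return loss enhancement (ERLE) is used as the measure of echo attenuation, for comparison with the 40 dB requirement.

```python
import math
import random
from typing import List, Sequence


def fir_filter(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Convolve x with impulse response h (models a loudspeaker-to-beam echo path)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def nlms_aec(far_end: Sequence[float], beam_out: Sequence[float],
             taps: int = 16, mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """NLMS echo canceller (assumed algorithm) on the beamformer output.

    Returns the residual signal that would be sent to the far end.
    """
    w = [0.0] * taps          # echo path estimate, all zero before training
    buf = [0.0] * taps        # far-end history, newest sample first
    residual = []
    for x_n, d_n in zip(far_end, beam_out):
        buf = [x_n] + buf[:-1]
        y_n = sum(wk * xk for wk, xk in zip(w, buf))   # echo estimate
        e_n = d_n - y_n                                  # residual echo
        step = mu * e_n / (sum(xk * xk for xk in buf) + eps)
        w = [wk + step * xk for wk, xk in zip(w, buf)]
        residual.append(e_n)
    return residual


def erle_db(echo: Sequence[float], residual: Sequence[float]) -> float:
    """Echo return loss enhancement: echo power over residual power, in dB."""
    return 10.0 * math.log10(sum(v * v for v in echo) /
                             (sum(v * v for v in residual) + 1e-20))


rng = random.Random(0)
N, TAPS = 4000, 16
far_end = [rng.gauss(0.0, 1.0) for _ in range(N)]      # white far-end signal
# Hypothetical echo paths seen through two different look directions.
path_a = [0.8 ** k for k in range(TAPS)]
path_b = [(-0.8) ** k for k in range(TAPS)]
half = N // 2
# The beamformer looks at sector A, then switches to sector B halfway through.
beam_out = fir_filter(path_a, far_end)[:half] + fir_filter(path_b, far_end)[half:]

res = nlms_aec(far_end, beam_out, taps=TAPS)
erle_before_switch = erle_db(beam_out[half - 500:half], res[half - 500:half])
erle_after_switch = erle_db(beam_out[half:half + 32], res[half:half + 32])
print(f"ERLE before switch: {erle_before_switch:.1f} dB")
print(f"ERLE just after switch: {erle_after_switch:.1f} dB")
```

With this noise-free, white far-end signal the canceller comfortably exceeds 40 dB of ERLE once converged on the first sector, but immediately after the look direction switches its coefficients no longer match the new echo path, and ERLE collapses until the filter re-adapts.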
One method of dealing with the problem of transitioning from sector to sector is presented in U.S. patent application Ser. No. 10/306,154, filed Nov. 29, 2002 (Franck Beaucoup and Michael Tetelbaum), entitled “A method of acoustic echo cancellation in full-duplex hands free audio conferencing with spatial directivity”. That application addresses the problem of multiple look directions by storing the echo canceller information unique to each sector in a dedicated workspace and retrieving it when the corresponding sector is selected. This method is effective once the AEC has converged on a sector (i.e. once far-end speech has exercised the AEC and trained it to the echo path of that particular direction) before the look direction is switched. However, it does not address the problem of initial convergence on each sector. For example, when a call is first set up and bi-directional conversation begins, the beamformer points to a particular spatial sector in response to the first active near-end talker, allowing the echo canceller to adapt for that sector during segments of far-end speech. If the talker then moves to an “unused” sector (or a new talker becomes active there), the echo canceller must converge on the new sector from scratch: all of its filter coefficients are initially zero for that sector, and undesirable echo results because the AEC remains in a non-converged state. Until an acceptable level of echo canceller convergence is reached, the system may be unstable, resulting in echo and howling.
Although some measures can be adopted to prevent these effects (for instance, applying some amount of loss to reduce the level of the feedback signal), such measures typically degrade the full-duplex performance of the system. It is therefore an object of an aspect of the invention to reduce the AEC training time as much as possible.
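The per-sector workspace approach of the Beaucoup and Tetelbaum application, and the initial-convergence gap it leaves, can be pictured with the following sketch. The class and method names are hypothetical, and treating an all-zero coefficient vector as "untrained" is a simplifying assumption rather than a detail taken from that application.

```python
from typing import Dict, List, Optional


class SectorWorkspaces:
    """Hypothetical per-sector store of AEC coefficients, one workspace per look direction."""

    def __init__(self, num_sectors: int, taps: int) -> None:
        # Every sector begins "unused": its echo path estimate is all zeros.
        self._saved: Dict[int, List[float]] = {s: [0.0] * taps for s in range(num_sectors)}
        self.active: Optional[int] = None
        self.coeffs: List[float] = [0.0] * taps  # coefficients the AEC is adapting now

    def switch_to(self, sector: int) -> None:
        """Save the outgoing sector's coefficients and load the incoming sector's."""
        if self.active is not None:
            self._saved[self.active] = list(self.coeffs)
        self.active = sector
        self.coeffs = list(self._saved[sector])

    def needs_initial_convergence(self, sector: int) -> bool:
        """True while a sector has never been trained (assumed: all coefficients still zero)."""
        w = self.coeffs if sector == self.active else self._saved[sector]
        return all(c == 0.0 for c in w)


ws = SectorWorkspaces(num_sectors=4, taps=8)
ws.switch_to(0)
# Stand-in for coefficients learned on sector 0 during far-end speech.
ws.coeffs = [0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0, 0.0]
ws.switch_to(2)          # talker moves to an unused sector
print(ws.coeffs)         # all zeros: sector 2 must converge from scratch
ws.switch_to(0)          # talker returns to the first sector
print(ws.coeffs)         # sector 0's converged coefficients are restored
```

Returning to a previously trained sector restores its converged coefficients, but moving to an unused sector always starts from zeros, which is precisely the initial-convergence case that stored workspaces cannot help with.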
The prior art does not appear to set forth any method dealing specifically with initial convergence of an acoustic echo canceller in conferencing systems having more than one possible look direction (and correspondingly multiple echo paths). There are, however, several well-known methods of reducing start-up echo and howling for a single echo path. These methods apply various schemes of switched loss to the loudspeaker and/or microphone signals until the echo canceller has adapted sufficiently to ensure a reasonable level of echo cancellation (see, for example, U.S. Pat. No. 4,560,840, entitled “Digital Handsfree Telephone”, by Hansen Bjorn, assigned to International Standard Electric Corp.). In general, however, these methods degrade the subjective quality of the system. Nor is it known in the art to apply these techniques to an AEC with multiple echo paths, where the problem becomes one of minimizing the total convergence time over all echo paths so that the degradation in quality remains minimal.
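The following sketch illustrates why loss insertion trades echo for full-duplex quality when there are several echo paths. It is not the scheme of U.S. Pat. No. 4,560,840; it assumes a simple rule, chosen here for illustration, in which the inserted transmit-path loss makes up whatever shortfall remains between the required echo attenuation (taken as the 40 dB figure given earlier) and the attenuation provided by the AEC and the beamformer. The per-sector ERLE values and the 6 dB beamformer contribution are hypothetical.

```python
from typing import Dict

# Order-of-magnitude echo attenuation requirement quoted in the text.
REQUIRED_ECHO_ATTENUATION_DB = 40.0


def switched_loss_db(aec_erle_db: float, beamformer_rejection_db: float = 0.0,
                     required_db: float = REQUIRED_ECHO_ATTENUATION_DB) -> float:
    """Transmit-path loss (dB) needed to meet the echo attenuation requirement.

    Assumed illustrative rule: insert exactly the shortfall left by the AEC and
    the beamformer, and no loss once that shortfall is gone.
    """
    shortfall = required_db - aec_erle_db - beamformer_rejection_db
    return max(shortfall, 0.0)


def loss_to_gain(loss_db: float) -> float:
    """Convert a loss in dB to a linear amplitude gain."""
    return 10.0 ** (-loss_db / 20.0)


# Hypothetical ERLE per look direction: sector 0 trained, sectors 1 and 2 never used.
erle_by_sector: Dict[int, float] = {0: 42.0, 1: 0.0, 2: 0.0}
for sector, erle in erle_by_sector.items():
    loss = switched_loss_db(erle, beamformer_rejection_db=6.0)  # 6 dB: hypothetical
    print(f"sector {sector}: insert {loss:.1f} dB (gain {loss_to_gain(loss):.3f})")
```

Because the same loss also attenuates near-end speech, every sector that has not yet been trained spends its convergence period in a degraded, loss-inserted state; with multiple look directions, the total time spent in that state grows with the number of sectors that must be trained.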
Another group of prior art methods trains the system AEC before it is used for the first call. These methods play training signals through the loudspeaker at system start-up (i.e. the first time the speakerphone is powered up). One example of this approach is set forth in U.S. Pat. No. 5,428,604, entitled “Training Method for an Echo Canceller for Use in a Voice Conference System”, assigned to NEC Corporation. A drawback of this approach is that it requires a loud training signal to be played through the loudspeaker for long enough to achieve an acceptable level of convergence in the acoustic echo canceller. This training sound may annoy the user, especially where the AEC has to be trained on multiple echo paths, which lengthens the training sound accordingly.