Meetings between multiple individuals conducted in two or more locations that are remote from each other can be facilitated using audio or video conferencing systems. These systems typically include some number of microphones, at least one loudspeaker, a camera, audio and video signal processing, and means for connecting the system to a public or private network. In such a system, the microphones pick up near end (N.E.) acoustic audio signals (speech) from an audio source, and the system digitally processes the audio signals in a number of ways before sending them over the network to a far end (F.E.) communication device (i.e., another conferencing system or communication device) to be played by a loudspeaker. Among other things, the digital processing functionality can operate to remove acoustic echo present in a N.E. audio signal before the signal is transmitted to a F.E. system.
Audio and video conferencing systems can have an array of multiple microphones in fixed positions with respect to the environment in which they operate, or the microphones can be mobile, wireless devices that are carried or worn by individuals participating in a conferencing session. In the case that a conferencing system has an array of multiple microphones in fixed positions, it is possible to determine a direction of an audio source with respect to the microphones, and then form and steer a microphone beam towards the direction of the audio source. This technique is known as spatial filtering, and implementing this technique in a conferencing system has the effect of capturing more of an audio signal directly from an audio signal source and less of the audio signal that is reflected from the surfaces of the room in which the system is operating. A beam is typically characterized by a beam width that can be expressed in degrees, and the beams can all be the same width or can be different widths. In the case of fixed beamforming, each beam is oriented in a direction with respect to the microphone array in order to receive audio signals from a particular direction. Spatial filtering is typically implemented in a system with a fixed array of microphones in order to provide a higher quality signal to the acoustic echo cancellation functionality, which ultimately results in a higher quality audio signal being sent to a far-end location.
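The spatial filtering described above can be illustrated with a delay-and-sum beamformer, which is one common way to form a fixed beam and is not necessarily the method used by any particular conferencing system. The following is a minimal sketch under stated assumptions: a linear array, a far-field (plane wave) source, whole-sample delays, and a nominal speed of sound. The names arrival_delays and delay_and_sum are hypothetical and are introduced only for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at room temperature


def arrival_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Whole-sample delay at which a plane wave reaches each microphone.

    Assumes a linear array along one axis and an angle measured from
    broadside, positive toward increasing position. Delays are relative to
    the first microphone reached, so the smallest value is 0.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [-p * s / SPEED_OF_SOUND_M_S * sample_rate_hz for p in mic_positions_m]
    earliest = min(raw)
    return [round(r - earliest) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels.

    Sound arriving from the steered direction adds coherently, while sound
    from other directions (for example, room reflections) is attenuated.
    """
    usable = len(channels[0]) - max(delays)
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]
```

In this sketch, a fixed beam corresponds to one precomputed set of delays. A system with several fixed beams would hold one such set per beam direction and apply each set to the same microphone channels.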
FIG. 1 is a diagram showing functional elements comprising a typical audio conference system 100. The system 100 is shown to have four microphones 115A to 115D, one loudspeaker 110, a beamforming function 120, and separate acoustic echo cancellation functionality, AEC 130A to 130n, for each of the beams generated by the beamformer 120. Acoustic echo cancellation is an essential function performed by audio or video conferencing systems, and it generally operates to remove acoustic echo from a near-end audio signal prior to the signal being transmitted to a remote location. Specifically, acoustic echo occurs when a far-end audio signal, received and played by a N.E. system loudspeaker, is picked up by a microphone proximate to the loudspeaker. An audio signal captured by the local microphone will include at least some of the far-end audio signal information, and this audio information can be transmitted back to the far-end system where it can be heard as an echo. This acoustic echo is distracting and can severely degrade the quality of an audio conferencing session if it is not cancelled.
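The echo removal described above is commonly implemented with an adaptive filter that estimates the loudspeaker-to-microphone echo path from the far-end signal and subtracts the estimated echo from the microphone signal. The following is a minimal sketch using a normalized least mean squares (NLMS) update, which is an assumed choice for illustration and is not stated to be the algorithm of AEC 130A to 130n. The function name and the taps, mu, and eps parameters are hypothetical.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic.

    far_end: samples played by the near-end loudspeaker.
    mic:     samples captured by the near-end microphone (echo plus speech).
    Returns the echo-cancelled samples and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    cancelled = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # this is the signal sent to the far end
        power = sum(h * h for h in history) + eps  # eps avoids division by zero
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        cancelled.append(error)
    return cancelled, weights
```

The sketch adapts on every sample. A practical canceller would typically also pause adaptation while near-end speech is present, which is omitted here for brevity.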
The conferencing system illustrated in FIG. 1 is configured such that the beamforming function 120 receives audio signal information directly from each of the four microphones, uses this information to determine a direction of an audio source, and then selects and directs one of three (in this case) microphone beams in the direction of the audio source. The audio signal information captured by the selected microphone beam is then processed by one of the three (in this case) acoustic echo cancellers, AEC 130A to 130n, and the resulting echo cancelled audio signal can be sent to a far-end device to be played. The advantage in performing the beamforming prior to echo cancellation is that the audio signal being processed by the echo cancellers is of higher quality (i.e., higher signal to noise ratio), and the number of separate echo cancellers can be limited to the number of beams, regardless of the number of microphones in an array.
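The beamform-then-cancel ordering of FIG. 1 can be sketched as follows. This is an illustrative outline only: it assumes that the beam is chosen by comparing the energy of the current frame on each beam, which is a simple stand-in for the direction determination described above, and it represents each echo canceller as a callable taking a far-end frame and a beam frame. The names frame_energy, select_beam, and process_frame are hypothetical.

```python
def frame_energy(frame):
    """Sum of squared samples in one frame of audio."""
    return sum(s * s for s in frame)


def select_beam(beam_frames):
    """Index of the beam whose current frame carries the most energy."""
    energies = [frame_energy(f) for f in beam_frames]
    return max(range(len(energies)), key=energies.__getitem__)


def process_frame(beam_frames, cancellers, far_end_frame):
    """Pick one beam, then run only that beam's echo canceller.

    cancellers holds one callable per beam, so the number of echo
    cancellers tracks the number of beams, not the number of microphones.
    """
    index = select_beam(beam_frames)
    return index, cancellers[index](far_end_frame, beam_frames[index])
```

With three beams formed from four microphones, as in FIG. 1, cancellers would hold three entries in this sketch regardless of how many microphones feed the beamformer.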
While performing the beamforming operation first limits the number of echo cancellation functions to the number of beams, the beamforming function is influenced by acoustic signals received directly from any loudspeakers that are proximate to microphones in an array. In order to limit the influence of these speaker signals, the acoustic echo cancellation operation can be performed prior to the beamforming operation. Such a configuration is illustrated in FIG. 2.