Videoconferencing systems are used to allow real-time visual and voice communication between participants. For purposes of discussion, the different ends of a videoconference are referred to as near-end and far-end. The near-end is a local frame of reference, and the far-end is a remote frame of reference. Typically, the near-end and the far-end have respective video and audio equipment through which near-end and far-end participants communicate. Some videoconferencing devices are able to automatically detect who is actively speaking locally by analyzing captured video and audio data. Detecting the active speaker can enable a number of features, such as automatic panning and zooming (either physically or virtually), displaying information to help a viewer identify the active speaker, transcribing information about who said what during a videoconference, and others.
While an active speaker can be detected using only analysis of video data, active speaker detection can be improved by also using audio data. A videoconferencing device may be provided with a microphone array, and time-delay analysis can be used to calculate likely directions from which sound arrived at the microphone array (called sound source localization). However, videoconferencing devices also have one or more loudspeakers for playing sound received from the far-end. While the incoming far-end sound signal can be used to detect and cancel some of the far-end sound captured by the near-end microphone array, this echo cancellation is imperfect, and the audio data captured by the near-end microphone array may include significant levels of sound from the far-end (as played on the near-end loudspeakers). This leakage can cause a number of problems observed only by the present inventors. For example, it can make the sound source localization return false positives, which can cause automatic panning and zooming to pan/zoom to an inactive speaker or, worse, can make the sound source localization unavailable. The leakage can, of course, also create audible echo at the far-end.
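For illustration only, the time-delay analysis mentioned above can be sketched for a single pair of microphones. This is a minimal sketch under stated assumptions, not the method of any particular device: the source does not say which delay estimator is used, so plain cross-correlation is assumed here, along with a far-field source, a two-microphone geometry, and a speed of sound of 343 m/s. The names `estimate_delay` and `delay_to_angle` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def estimate_delay(sig_a, sig_b, sample_rate):
    """Estimate how long sig_b lags sig_a, in seconds (negative if it leads).

    Assumption: plain cross-correlation; the peak location gives the lag.
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    # In "full" mode, index len(sig_a) - 1 corresponds to zero lag.
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / sample_rate


def delay_to_angle(delay_s, mic_spacing_m):
    """Convert a time delay to an arrival angle in degrees from broadside.

    Assumption: far-field source, so delay = spacing * sin(angle) / c.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Clip so that measurement noise cannot push arcsin out of its domain.
    return float(np.degrees(np.arcsin(np.clip(ratio, -1.0, 1.0))))


if __name__ == "__main__":
    # Synthetic check: white noise that reaches the second microphone
    # three samples after it reaches the first.
    rng = np.random.default_rng(0)
    mic_a = rng.standard_normal(1024)
    mic_b = np.concatenate([np.zeros(3), mic_a[:-3]])
    delay = estimate_delay(mic_a, mic_b, sample_rate=16000)
    print(delay, delay_to_angle(delay, mic_spacing_m=0.1))
```

In this simplified model, far-end sound played by a near-end loudspeaker would also produce a correlation peak, at the delay corresponding to the loudspeaker's direction, which is one way to picture the false positives described above.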
Techniques discussed below relate to dealing with far-end sound in teleconferencing devices.