Localization of acoustic sources is required in many applications, such as teleconferencing, where the source position is used to steer a high-quality microphone beam toward the talker. In video conferencing systems, the source position may additionally be used to point a camera at the talker.
It is known in the art to use electronically steerable arrays of sensors in combination with location estimator algorithms to pinpoint the location of a talker in a room. In this regard, high-quality, complex beamformers have been used to measure the power at different candidate positions, and estimator algorithms locate the dominant audio source from the power information received from the beamformers. Attempts have been made to improve the performance of prior-art beamformers by enhancing acoustic audibility through filtering and similar techniques. These prior-art methodologies are described in N. Strobel, T. Meier, and R. Rabenstein, "Speaker Localization Using a Steered Filter-and-Sum Beamformer," presented at the Erlangen Workshop '99: Vision, Modeling, and Visualization, Nov. 17-19, 1999, Erlangen, Germany.
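A power-based search of the kind described above can be sketched as a steered-response-power scan. This is a minimal illustration under stated assumptions, not the method of the cited paper or of this disclosure: it substitutes a plain delay-and-sum beamformer (no filtering) for the filter-and-sum beamformer, rounds delays to whole samples, assumes free-field propagation at 343 m/s in a 2-D geometry, and scans a hypothetical list of candidate positions. The names `delays_in_samples`, `steered_power`, `locate`, and `simulate` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in meters
SPEED_OF_SOUND = 343.0       # m/s; assumed room-temperature value


def delays_in_samples(src: Point, mics: Sequence[Point], fs: float) -> List[int]:
    """Free-field propagation delay from src to each microphone, in whole samples."""
    return [round(math.dist(src, m) / SPEED_OF_SOUND * fs) for m in mics]


def steered_power(signals: Sequence[Sequence[float]], delays: Sequence[int]) -> float:
    """Mean output power of a delay-and-sum beam steered by the given delays.

    Each channel is advanced by its delay relative to the earliest-arriving
    channel, so a source at the hypothesized position adds coherently.
    """
    ref = min(delays)
    shifts = [d - ref for d in delays]
    n = min(len(s) - k for s, k in zip(signals, shifts))
    total = 0.0
    for t in range(n):
        y = sum(s[t + k] for s, k in zip(signals, shifts))
        total += y * y
    return total / n


def locate(signals: Sequence[Sequence[float]], mics: Sequence[Point],
           candidates: Sequence[Point], fs: float) -> Tuple[Point, List[float]]:
    """Return the candidate with the highest steered power, plus all powers."""
    powers = [steered_power(signals, delays_in_samples(c, mics, fs)) for c in candidates]
    best = max(range(len(candidates)), key=powers.__getitem__)
    return candidates[best], powers


def simulate(src: Point, mics: Sequence[Point], fs: float, n: int,
             seed: int = 0) -> List[List[float]]:
    """Synthetic test input: white noise from src, delayed per mic, no echoes."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [[0.0] * d + x[: n - d] for d in delays_in_samples(src, mics, fs)]
```

For example, with four microphones on a 0.5 m square, fs = 16000, and a source simulated at one of the candidates, `locate` should pick that candidate: with M microphones the aligned sum carries roughly M² times the per-channel power, against roughly M when the channels are misaligned. Because `simulate` models no reflections, the sketch does not exercise the virtual-image problem.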
Localization of acoustic sources is fraught with practical difficulties. First, reflecting walls (or other objects) generate virtual acoustic images of the source, which the location estimator algorithm can misidentify as real sources. Second, most known location estimator algorithms cannot distinguish between noise sources and talkers, especially in the presence of correlated noise and during speech pauses. Voice activity detectors have been used to freeze the localization during speech pauses, thereby reducing incorrect talker localization caused by echoes or noise.
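The freezing behavior can be sketched as a thin wrapper around any location estimator. This is an illustrative assumption, not a specific prior-art detector: the voice activity decision is a hypothetical fixed energy threshold on one reference channel, and `GatedLocalizer`, `frame_energy`, and `threshold` are hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple

Position = Tuple[float, float]
Frame = Sequence[Sequence[float]]  # one sequence of samples per microphone


def frame_energy(samples: Sequence[float]) -> float:
    """Mean-square energy of one channel of a frame."""
    return sum(v * v for v in samples) / len(samples)


class GatedLocalizer:
    """Wraps a location estimator and freezes its output during speech pauses.

    Voice activity is decided by a hypothetical fixed energy threshold on the
    first microphone channel; practical detectors are more elaborate.
    """

    def __init__(self, locate: Callable[[Frame], Position], threshold: float) -> None:
        self._locate = locate
        self._threshold = threshold
        self.position: Optional[Position] = None  # last accepted estimate

    def is_speech(self, frame: Frame) -> bool:
        return frame_energy(frame[0]) > self._threshold

    def update(self, frame: Frame) -> Optional[Position]:
        # Run the estimator only when speech is present; otherwise hold the
        # previous position so echoes or noise in the pause cannot move it.
        if self.is_speech(frame):
            self.position = self._locate(frame)
        return self.position
```

A processing loop would call `update` once per multichannel frame; during a pause it returns the last accepted position without invoking the estimator. Real detectors typically add hangover periods and adaptive noise-floor tracking, which are omitted here.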