Localization of audio sources is required in many applications, such as teleconferencing, where the audio source position is used to steer a high quality microphone towards the talker. In video conferencing systems, the audio source position may additionally be used to steer a video camera towards the talker.
It is known in the art to use electronically steerable arrays of microphones in combination with location estimator algorithms to pinpoint the location of a talker in a room. In this regard, complex, high-quality beamformers have been used to measure the power at different positions. In such systems, location estimator algorithms locate the dominant audio source using power information received from the beamformers. The foregoing prior art methodologies are described in "Speaker Localization Using a Steered Filter-and-Sum Beamformer", N. Strobel, T. Meier, R. Rabenstein, presented at the Erlangen Workshop '99, Vision, Modeling and Visualization, Nov. 17–19, 1999, Erlangen, Germany.
U.K. Patent Application No. 0016142 filed on Jun. 30, 2000 for an invention entitled “Method and Apparatus For Locating A Talker” discloses a talker localization system that includes an energy based direction of arrival (DOA) estimator. The DOA estimator estimates the audio source location based on the direction of maximum energy at the output of the beamformer over a specific time window. The estimates are filtered, analyzed and then combined with a voice activity detector to render a final position estimate of the audio source location.
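By way of illustration only, the energy-based selection step described above can be sketched as follows. The sketch assumes that the beamformer output for each steering direction is already available as a list of samples covering one time window; the names `frame_energy` and `estimate_doa` are hypothetical and do not appear in the referenced application, and the filtering, analysis and voice activity detection stages are omitted.

```python
from typing import Dict, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Return the energy (sum of squared samples) of one time window."""
    return sum(s * s for s in samples)


def estimate_doa(beam_outputs: Dict[float, Sequence[float]]) -> float:
    """Return the steering angle whose beam output has the most energy.

    beam_outputs maps a steering angle in degrees (an assumed
    representation) to the beamformer output samples for that direction
    over the same time window.
    """
    return max(beam_outputs, key=lambda angle: frame_energy(beam_outputs[angle]))


# Illustrative data only: three beams, with the strongest output at 90 degrees.
window = {
    0.0: [0.1, -0.1, 0.05],
    90.0: [0.5, -0.4, 0.3],
    180.0: [0.2, 0.1, -0.1],
}
print(estimate_doa(window))  # prints 90.0
```

A raw estimate of this kind follows whichever direction carries the most energy in the window, including the direction of a strong reflection.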
In highly reverberant environments, reflected acoustic signals can result in miscalculation of the direction of arrival of the audio signals generated by the talker. This is because the energy of the audio signals picked up by the beamformer can be stronger in the direction of the reverberation signals than in the direction of the direct path audio signals. The effects of reverberation have the most impact on audio source localization at the beginning and the end of a speech burst. Miscalculation of the direction of arrival of the audio signals at the beginning of a speech burst can be caused by a strong reverberation signal having a short delay path. As a result, the direct path audio signal may not have dominant energy for a long enough period of time before being masked by the reverberation signal. In this situation, the DOA estimator can miss the beginning of the speech burst and lock on to the reverberation signal. Miscalculation of the direction of arrival of the audio signals at the end of a speech burst can be caused by a reverberation signal that masks the decaying tail of the direct path audio signal, resulting in beam steering in the wrong direction until the next speech burst occurs.
In an attempt to deal with the effects of reverberation during talker localization, two approaches have been considered. One approach uses a priori knowledge of the room geometry and the reverberation (interference) and noise sources therein. Different space regions within the room are pre-classified as containing a reverberation or noise source. The response of the beamformer is then minimized at locations corresponding to the locations of the pre-classified reverberation and noise sources.
The second approach uses a computationally complex Crosspower Spectrum Phase (CPS) analysis to calculate Time Delay Estimates (TDE) between the microphones of the microphone array. Unfortunately, it is known that the performance of TDE methods degrades dramatically in highly reverberant conditions.
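For illustration, a time delay estimate between two microphone signals can be obtained from the phase of their cross-power spectrum roughly as sketched below. This is a minimal sketch under stated assumptions and not the implementation of any particular prior art system: the function names `dft` and `cps_delay` are hypothetical, a naive discrete Fourier transform is used for self-containment, and the delay is assumed to be a whole number of samples within a single frame.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (inverse includes 1/n scaling)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cps_delay(x1: Sequence[float], x2: Sequence[float]) -> int:
    """Estimate the delay, in samples, of x2 relative to x1.

    The cross-power spectrum is normalised to unit magnitude so that only
    its phase remains; the inverse transform then peaks at the delay.
    """
    n = len(x1)
    spec1, spec2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(spec1, spec2)]
    phase_only = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = [v.real for v in dft(phase_only, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Peaks in the upper half of the frame correspond to negative delays.
    return peak if peak <= n // 2 else peak - n


# Illustrative data only: mic2 receives the same short pulse 3 samples later.
mic1 = [0.0] * 16
mic1[2:5] = [1.0, 0.5, -0.3]
mic2 = [0.0] * 16
mic2[5:8] = [1.0, 0.5, -0.3]
print(cps_delay(mic1, mic2))  # prints 3
```

In this idealised example the second signal is an exact delayed copy of the first, so the correlation has a single sharp peak. Reflected copies of the signal add further peaks, which is consistent with the degradation in reverberant conditions noted above.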
As will be appreciated, the above-described approaches to dealing with the effects of reverberation suffer from disadvantages. Accordingly, a need exists for an improved method for talker localization in a reverberant environment. It is therefore an object of the present invention to provide a novel method and system for talker localization in a reverberant environment.