Estimating the bearing of a source is a problem in applications such as passive sonar (underwater antenna alignment) and video conferencing (speaker location). The use of sensor arrays to address this problem is well known in the art. In one approach, the array signals are combined to synthesize a beam that is aimed at various points in a domain of interest, and the source location is estimated as the point of maximum received energy.
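The maximum-energy search described above can be illustrated with a minimal delay-and-sum sketch. It assumes a far-field broadband source, a uniform linear array of ideal omnidirectional microphones in free field and a speed of sound of 343 m/s. The function names, the 8-microphone, 4 cm geometry and the 16 kHz sample rate are hypothetical choices for illustration and are not taken from any of the cited references. Delays are applied as circular phase shifts in the frequency domain, which is adequate for a simulation but not for a streaming implementation.

```python
import numpy as np

# Hypothetical helpers for illustration only; not taken from the cited references.
SPEED_OF_SOUND = 343.0  # m/s, assumed
FS = 16000              # sample rate in Hz, assumed


def ula_positions(num_mics, spacing):
    """Microphone x-coordinates (m) of a uniform linear array centred on the origin."""
    return (np.arange(num_mics) - (num_mics - 1) / 2) * spacing


def plane_wave_delays(positions, angle_deg):
    """Per-microphone delays (s) of a far-field wave arriving from angle_deg (0 = broadside)."""
    return positions * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND


def apply_delay(signal, delay_s):
    """Delay a signal by a possibly fractional time, as a linear phase shift (circular)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    spectrum = np.fft.rfft(signal) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spectrum, n=len(signal))


def steered_response_power(mic_signals, positions, look_angles_deg):
    """Delay-and-sum output power for each candidate look direction."""
    powers = []
    for angle in look_angles_deg:
        delays = plane_wave_delays(positions, angle)
        # Undo the delay each microphone would see if the source were at `angle`, then average.
        beam = np.mean([apply_delay(x, -tau) for x, tau in zip(mic_signals, delays)], axis=0)
        powers.append(np.mean(beam ** 2))
    return np.array(powers)


# Usage: simulate a white-noise source at 30 degrees and scan -90..90 degrees in 1-degree steps.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
positions = ula_positions(num_mics=8, spacing=0.04)
mic_signals = np.stack([apply_delay(source, tau) for tau in plane_wave_delays(positions, 30.0)])
look_angles = np.arange(-90, 91)
power = steered_response_power(mic_signals, positions, look_angles)
estimate = look_angles[np.argmax(power)]
print("estimated bearing:", estimate, "degrees")
```

In this noiseless simulation only the true look direction aligns all channels at every frequency, so the power peak falls on the simulated bearing; with real reflections the peak can move, which is the weakness discussed below.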
N. Strobel, S. Spors and R. Rabenstein, “Joint Audio-Video Signal Processing for Object Localization and Tracking”, Chapter 10 in Microphone Arrays, edited by Brandstein and Ward, Springer, 2001, discusses a number of sophisticated prior art methods for sound source location.
Guner Arslan and F. Ayhan Sakarya, “A Unified Neural-Network-Based Speaker Localization Technique”, IEEE Transactions on Neural Networks, Vol. 11, No. 4, July 2000, sets forth a neural network that generates a source location estimate based on a maximum energy search.
Kerem Harmanci, Joseph Tabrikian and Jeffrey L. Krolik, “Relationships Between Adaptive Minimum Variance Beamforming and Optimal Source Localization”, IEEE Transactions on Signal Processing, Vol. 48, No. 1, January 2000, discloses adaptive minimum variance beamforming for source localization.
Zakavauskas (U.S. Pat. No. 5,526,433) teaches the use of a plurality of microphones to aim a highly directional microphone, based on differences between the signals that the individual microphones receive from the selected source. As with the approaches set forth above, source location is based on a maximum energy search.
Beaucoup and Tetelbaum (U.S. Patent Application 2003/0051532 A1) also disclose a talker localization system based on maximum energy. To deal with reverberation, an energy history is retained to determine when a new signal starts, thereby permitting reliable detection of the direct path.
The prior art sound localization systems set forth above have two principal disadvantages. Firstly, it is very difficult to create a main beam with a very narrow beam angle, so achieving high resolution requires expensive solutions. Secondly, highly directional beam patterns are generally accompanied by significant side lobes, which can lead to erroneous look directions. Stated otherwise, a simple beamformer (in free field or on an obstacle) is characterized by multiple side lobes and by a beam width that is limited by the geometry of the array and the number of microphones. A source locator based on such a beamformer cannot efficiently discriminate against side reflections.
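Both disadvantages can be quantified for the simplest case, a uniformly weighted linear array at a single frequency. The sketch below computes the normalized power pattern, the half-power beam width and the highest side lobe. Its helper names are hypothetical, and half-wavelength spacing, broadside steering and a 1 kHz tone are assumptions made for illustration. Under these assumptions the beam width roughly halves each time the number of microphones doubles, while the highest side lobe stays within a few dB of -13 dB, so adding microphones narrows the main beam but does not suppress the side lobes.

```python
import numpy as np

# Hypothetical helpers for illustration only; narrowband, free-field, uniform weighting assumed.
SPEED_OF_SOUND = 343.0  # m/s, assumed


def beam_pattern(num_mics, spacing, wavelength, steer_deg, angles_deg):
    """Normalised power pattern |AF|^2 of a uniformly weighted linear array at one frequency."""
    k = 2 * np.pi / wavelength
    x = np.arange(num_mics) * spacing
    u = np.sin(np.deg2rad(angles_deg)) - np.sin(np.deg2rad(steer_deg))
    af = np.exp(1j * k * np.outer(u, x)).sum(axis=1) / num_mics
    return np.abs(af) ** 2


def half_power_beamwidth(pattern, angles_deg):
    """Width (degrees) of the region where the pattern is at least half its peak.

    Assumes no grating lobes and side lobes below -3 dB, so only the main lobe qualifies.
    """
    inside = angles_deg[pattern >= 0.5]
    return inside.max() - inside.min()


def peak_sidelobe_db(pattern, angles_deg, num_mics, spacing, wavelength, steer_deg):
    """Highest pattern level (dB) outside the main lobe, i.e. beyond the first nulls."""
    u = np.sin(np.deg2rad(angles_deg)) - np.sin(np.deg2rad(steer_deg))
    outside = np.abs(u) > wavelength / (num_mics * spacing)
    return 10 * np.log10(pattern[outside].max())


wavelength = SPEED_OF_SOUND / 1000.0  # 1 kHz tone
spacing = wavelength / 2              # half-wavelength spacing
angles = np.linspace(-90.0, 90.0, 18001)
results = {}
for m in (4, 8, 16):
    p = beam_pattern(m, spacing, wavelength, 0.0, angles)
    results[m] = (half_power_beamwidth(p, angles),
                  peak_sidelobe_db(p, angles, m, spacing, wavelength, 0.0))
    print(f"{m:2d} mics: beam width {results[m][0]:5.1f} deg, "
          f"peak side lobe {results[m][1]:6.1f} dB")
```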
Baker (U.S. Pat. No. 5,686,957) sets forth a multi-microphone system that detects the source based on the loudest microphone output. To be effective, the system requires a microphone for each participant. Because the audio detection mechanism is limited in its ability to detect sound, a special camera is used to enhance the peripheral portion of the field of view.
Another prior art approach to estimating source location involves measuring the time difference of arrival (TDOA) between sensor pairs. The major difficulties with such systems are that very accurate knowledge of the sensor positions is needed, and that the sensors must be spaced far apart to obtain a usable time delay from one sensor to another.
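For a single sensor pair and a far-field source, the TDOA τ maps to a bearing θ through sin θ = cτ/d, where d is the sensor spacing and c the speed of sound. The sketch below evaluates how far the bearing moves when the measured delay is off by a single sample, which shows why closely spaced sensors give poor estimates. The function names are hypothetical, and c = 343 m/s and a 16 kHz sample rate are assumptions. The same relation also shows the sensitivity to sensor position errors, since an error in d scales cτ/d directly.

```python
import math

# Hypothetical helpers for illustration only; far-field, two-sensor geometry assumed.
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 °C)


def bearing_from_tdoa(tdoa_s, spacing_m, c=SPEED_OF_SOUND):
    """Far-field bearing in degrees from broadside, from sin(theta) = c * tau / d."""
    ratio = c * tdoa_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clip: measurement errors can push |ratio| above 1
    return math.degrees(math.asin(ratio))


def bearing_error_per_sample(spacing_m, fs_hz, true_deg, c=SPEED_OF_SOUND):
    """Bearing shift (degrees) caused by a one-sample error in the measured delay."""
    tau = spacing_m * math.sin(math.radians(true_deg)) / c
    return bearing_from_tdoa(tau + 1.0 / fs_hz, spacing_m, c) - bearing_from_tdoa(tau, spacing_m, c)


errors = []
for d in (0.05, 0.2, 0.8):
    err = bearing_error_per_sample(d, 16000, 20.0)
    errors.append(err)
    print(f"spacing {d:4.2f} m: one-sample delay error shifts a 20 deg bearing by {err:5.1f} deg")
```

With these assumptions, a one-sample error at 20° shifts the estimate by roughly 30° for a 5 cm pair but by under 2° for a 0.8 m pair.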
Yiteng Huang, Jacob Benesty, G. Elko and Russell M. Mersereau, “Real-Time Passive Source Localization: A Practical Linear-Correction Least-Squares Approach”, IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 8, November 2001, provides a good survey of the state of the art in TDOA-based source localization. Huang et al. also disclose a system for speech that is fairly robust against errors but is large, spanning 0.8 m × 0.8 m.
Chu (U.S. Pat. No. 5,778,082) determines the location of a speaker by a cross-correlation method, which improves the source detection in a reverberant environment.
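The basic cross-correlation idea can be sketched as follows: the lag at which the cross-correlation of two microphone signals peaks is taken as their relative delay. This sketch uses plain, unweighted cross-correlation computed with zero-padded FFTs. It is not necessarily the specific method of the Chu patent, and weighted variants such as the phase transform (PHAT) are commonly used to sharpen the peak in reverberant rooms. The function name and the simulated 3-sample delay are hypothetical.

```python
import numpy as np

# Hypothetical helper for illustration only; plain (unweighted) cross-correlation.


def estimate_delay_samples(x_ref, x_other, max_lag):
    """Integer lag (samples) by which x_other lags x_ref, from the cross-correlation peak.

    max_lag must be at least 1; lags from -max_lag to +max_lag are searched.
    """
    n = len(x_ref) + len(x_other) - 1
    nfft = 1 << (n - 1).bit_length()      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x_ref, nfft)
    Y = np.fft.rfft(x_other, nfft)
    cc = np.fft.irfft(np.conj(X) * Y, nfft)  # cc[k] = sum_t x_ref[t] * x_other[t + k]
    # Reorder so that index i corresponds to lag i - max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


# Usage: the second microphone receives the same noise 3 samples later, plus sensor noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
true_delay = 3
y = np.concatenate((np.zeros(true_delay), x[:-true_delay]))
y += 0.1 * rng.standard_normal(y.size)
delay = estimate_delay_samples(x, y, max_lag=10)
print("estimated delay:", delay, "samples")
```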
Brandstein (U.S. Pat. No. 5,581,620) teaches a method for locating a speaker using the phase alignment of the input signals. However, this approach requires intensive computation and a large array.
In both Chu and Brandstein, the strongest signal is typically sought. Where strong echoes are present, an erroneous detection is possible. In addition, the beam width achievable with a limited number of transducers is restricted, since the directivity of a conventional array is proportional to the number of elements in the array. This problem is partly overcome with TDOA systems, but to obtain a good estimate the sensors must be separated by a significant fraction of a wavelength. These devices are also critically affected by phase and amplitude errors.
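The statement that directivity grows in proportion to the number of elements can be checked numerically for one textbook case: a uniformly weighted line array of isotropic elements, steered broadside, with half-wavelength spacing, for which the directivity equals the number of elements N. The sketch below integrates the power pattern over the sphere, using D = 2 / ∫|AF(u)|² du with u the cosine of the angle from the array axis. The function name is hypothetical, free-field narrowband conditions are assumed, and the exact equality D = N relies on the half-wavelength spacing used here.

```python
import numpy as np

# Hypothetical helper for illustration only; isotropic elements, free field, one frequency.


def directivity_ula(num_mics, spacing_over_wavelength, samples=4096):
    """Directivity of a broadside, uniformly weighted line array.

    D = 2 / integral_{-1}^{1} |AF(u)|^2 du, where u is the cosine of the angle from the
    array axis and AF is normalised to 1 at broadside (u = 0).
    """
    u = np.linspace(-1.0, 1.0, samples, endpoint=False)
    n = np.arange(num_mics)
    phase = 2 * np.pi * spacing_over_wavelength * np.outer(u, n)
    af = np.exp(1j * phase).sum(axis=1) / num_mics
    integral = 2.0 * np.mean(np.abs(af) ** 2)  # rectangle rule on a uniform grid over [-1, 1)
    return 2.0 / integral


directivities = {n: directivity_ula(n, 0.5) for n in (2, 4, 8, 16)}
for n, d in directivities.items():
    print(f"{n:2d} elements: directivity {d:.3f} ({10 * np.log10(d):.1f} dB)")
```

Doubling the number of elements therefore adds only about 3 dB of directivity, which is why a narrow beam requires many transducers.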