Sound source localization (SSL) using microphone arrays is employed in many important applications such as human-computer interaction and intelligent rooms. A large number of SSL algorithms have been proposed, with varying degrees of accuracy and computational complexity. For example, in broadband acoustic source localization applications such as teleconferencing, a number of SSL techniques are popular. These include steered-beamformer (SB), high-resolution spectral estimation, time delay of arrival (TDOA), and learning-based techniques.
In the TDOA approach, most existing algorithms take each pair of audio sensors in the microphone array and compute their cross-correlation function. To compensate for reverberation and noise in the environment, a weighting function is often applied ahead of the correlation. A number of weighting functions have been tried, among them the maximum likelihood (ML) weighting function.
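The pairwise step described above can be sketched as a weighted cross-correlation computed in the frequency domain. The sketch below is illustrative only and is not taken from any particular algorithm referenced here. It assumes NumPy, and it uses the phase transform (PHAT) as a stand-in weighting because the ML weighting depends on noise and reverberation estimates that this section does not specify. The function name `gcc_tdoa` and its parameters are hypothetical.

```python
import numpy as np

def gcc_tdoa(x1, x2, fs, weighting="phat", eps=1e-12):
    """Estimate the delay of x1 relative to x2, in seconds.

    Hypothetical helper: a weighted cross-correlation for one sensor pair.
    A positive result means the sound reached sensor 1 later than sensor 2.
    """
    n = len(x1) + len(x2)                # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)             # cross-power spectrum of the pair

    if weighting == "phat":
        # PHAT weighting (an assumption, not the ML weighting):
        # discard magnitude and keep only phase.
        cross = cross / (np.abs(cross) + eps)

    cc = np.fft.irfft(cross, n=n)        # weighted cross-correlation
    max_shift = n // 2
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

For example, if `x1` is a copy of `x2` delayed by a few samples, the returned value multiplied by `fs` should recover that sample delay. Passing any other value for `weighting` leaves the cross-power spectrum unweighted, which gives the plain cross-correlation.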
However, these existing TDOA algorithms are designed to find the optimal weight for individual pairs of audio sensors. When more than one pair of sensors exists in the microphone array, an assumption is made that the pairs are independent and that their likelihoods can be multiplied together. This approach is questionable because the sensor pairs are typically not truly independent. Thus, these existing TDOA algorithms do not represent true ML algorithms for microphone arrays having more than one pair of audio sensors.
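The independence assumption can be made concrete with a small sketch. Multiplying per-pair likelihoods is equivalent to summing per-pair log-likelihoods, so a combiner built on that assumption simply adds the pairwise scores for each candidate source location and picks the largest total. The code below assumes NumPy; the function name `combine_pairwise_loglik` and the input layout (one row per sensor pair, one column per candidate location) are hypothetical choices made for illustration.

```python
import numpy as np

def combine_pairwise_loglik(pair_logliks):
    """Combine per-pair log-likelihoods under the independence assumption.

    pair_logliks: array-like of shape (num_pairs, num_candidates), where each
    row holds one sensor pair's log-likelihood for every candidate location.
    Returns the index of the best candidate and the summed log-likelihoods.
    """
    stacked = np.asarray(pair_logliks, dtype=float)
    # Product of likelihoods == sum of log-likelihoods; valid only if the
    # pairs really are statistically independent.
    total = stacked.sum(axis=0)
    return int(np.argmax(total)), total
```

Note that in a three-microphone array the pairs (1, 2) and (1, 3) both use the signal from microphone 1, so noise at that microphone affects both rows. A plain sum like this one does not account for that shared noise, which is one way to see why the assumption is questionable.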