Sound source localization (SSL) is a growing area of technology, driven by the rise of consumer-level multi-microphone arrays. SSL is often used to determine the direction from which a sound originates by analyzing the sound as detected by multiple microphones having a known geometry. SSL techniques typically output either a probability distribution of potential arrival angles over a working angular space or a single estimate of an arrival angle (i.e., an SSL angle estimate) with a corresponding confidence metric.
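To make the link between microphone geometry, a single angle estimate, and its confidence metric concrete, the following is a minimal sketch for one microphone pair, not a method taken from any particular SSL system. It assumes a far-field source, two microphones a known distance apart, and a speed of sound of 343 m/s, and it uses the peak of the energy-normalized time-domain cross-correlation as the confidence metric. The function names, the sampling rate, the spacing, and the demonstration signal are hypothetical choices for illustration.

```python
import math
import random

# Illustrative assumption: approximate speed of sound in air at room temperature (m/s).
SPEED_OF_SOUND = 343.0


def cross_correlation(x, y, max_lag):
    """Return {lag: sum_n x[n] * y[n + lag]} for every lag in [-max_lag, max_lag]."""
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):
                total += xn * y[m]
        corr[lag] = total
    return corr


def tdoa_angle_estimate(x, y, fs, spacing):
    """Estimate (angle_deg, confidence) for one microphone pair (hypothetical helper).

    Assumes a far-field source and two microphones `spacing` metres apart, so the
    extra delay at microphone y is tau = spacing * sin(theta) / c. Positive angles
    mean the sound reached microphone x first. The confidence is the peak of the
    energy-normalized cross-correlation, clipped to [0, 1].
    """
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = math.ceil(spacing / SPEED_OF_SOUND * fs)
    corr = cross_correlation(x, y, max_lag)
    best_lag = max(corr, key=corr.get)

    # By Cauchy-Schwarz, the normalized correlation cannot exceed 1.
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    confidence = max(0.0, corr[best_lag] / norm) if norm > 0 else 0.0

    # Convert the best lag to a delay, then invert tau = d * sin(theta) / c.
    tau = best_lag / fs
    sin_theta = max(-1.0, min(1.0, tau * SPEED_OF_SOUND / spacing))
    return math.degrees(math.asin(sin_theta)), confidence


if __name__ == "__main__":
    # Hypothetical setup: 16 kHz sampling, 10 cm spacing, 3-sample inter-mic delay.
    fs, spacing, delay = 16000, 0.1, 3
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    y = [0.0] * delay + x[:-delay]  # y hears the same noise 3 samples later
    angle, conf = tdoa_angle_estimate(x, y, fs, spacing)
    print(f"angle = {angle:.1f} deg, confidence = {conf:.3f}")
```

Restricting the search to lags permitted by the known spacing is what ties the estimate to the array geometry; a real array with more than two microphones would combine several such pairs.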
The confidence metric is traditionally derived directly from the algorithm that implements the SSL technique, which typically uses a steered beamformer (SB) method or a time-difference-of-arrival (TDOA) method. SB-based methods steer a beamformer toward different angles and derive confidence from the energy in the strongest beam (i.e., the post-beamformer signal level), compared either to a long-run baseline energy or to the energy in beams steered toward other angles. TDOA-based methods derive confidence from the correlation between the signals recorded by the microphones. Traditionally, the SSL angle estimate, weighted by the confidence metric, is passed through a time-averaging filter to obtain a robust and stable estimate, which may indicate the location of the sound source(s) producing the sound. However, the time-averaging filter may give the SSL angle estimate a relatively long settling time, which is problematic when the active source switches among multiple sources (e.g., in a two-person conversation).
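The SB-style confidence and the confidence-weighted time averaging described above can be illustrated with a minimal sketch. It assumes the per-beam energies have already been produced by some steered beamformer (for example, delay-and-sum), scores the strongest beam against the other beams rather than against a long-run baseline, and models the time-averaging filter as an exponential forgetting factor applied to confidence-weighted unit vectors. The names `sb_confidence`, `ConfidenceWeightedAverager`, and `frames_to_settle`, the particular confidence normalization, and the 5-degree settling tolerance are hypothetical choices for illustration only.

```python
import math


def sb_confidence(beam_energies):
    """Pick the strongest steered beam and score how much it stands out (illustrative).

    Returns (best_index, confidence), where confidence = 1 - mean(other beams) / peak.
    It is 0 when all beams are equally strong and approaches 1 when one beam dominates.
    """
    best = max(range(len(beam_energies)), key=lambda i: beam_energies[i])
    peak = beam_energies[best]
    others = [e for i, e in enumerate(beam_energies) if i != best]
    if peak <= 0.0 or not others:
        return best, 0.0
    return best, 1.0 - (sum(others) / len(others)) / peak


class ConfidenceWeightedAverager:
    """Exponentially forgetting, confidence-weighted average of arrival angles.

    Angles are averaged as unit vectors so that, e.g., 350 deg and 10 deg average
    to 0 deg rather than 180 deg. A forgetting factor close to 1 gives a smoother
    but slower-settling estimate.
    """

    def __init__(self, forgetting=0.9):
        self.forgetting = forgetting
        self.sx = 0.0
        self.sy = 0.0

    def update(self, angle_deg, confidence):
        a = math.radians(angle_deg)
        # Older evidence decays; new evidence is weighted by its confidence.
        self.sx = self.forgetting * self.sx + confidence * math.cos(a)
        self.sy = self.forgetting * self.sy + confidence * math.sin(a)
        return math.degrees(math.atan2(self.sy, self.sx))


def frames_to_settle(forgetting, old_deg, new_deg, tol_deg=5.0, warmup=50):
    """Count frames until the smoothed estimate is within tol_deg of a new source."""
    avg = ConfidenceWeightedAverager(forgetting)
    for _ in range(warmup):  # the first talker holds the floor
        avg.update(old_deg, 1.0)
    for k in range(1, 10_000):  # the second talker starts speaking
        est = avg.update(new_deg, 1.0)
        # Wrapped angular error in [-180, 180).
        if abs((est - new_deg + 180.0) % 360.0 - 180.0) <= tol_deg:
            return k
    return None


if __name__ == "__main__":
    print(sb_confidence([1.0, 4.0, 1.0, 1.0]))
    for lam in (0.5, 0.9, 0.98):
        print(lam, frames_to_settle(lam, 0.0, 90.0))
```

Under these assumptions, a forgetting factor closer to 1 takes noticeably more frames to move from the first talker's direction to the second's, which is the settling-time trade-off that arises with switching sources.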
Moreover, although SB-based and TDOA-based methods can provide reasonably accurate SSL angle estimates and corresponding confidence metrics under some conditions, such methods may not be adequately robust under other conditions. For instance, during the tail end of a speech fragment, the sound arriving along a reflected path may be more correlated and/or have higher energy than the sound arriving along the direct path, which may cause the SSL technique to output a high confidence in the wrong direction (i.e., at an incorrect angle). Conventional SSL techniques typically rely on a single feature (e.g., beam strength or correlation) to determine confidence metrics, which may reduce the accuracy of those confidence metrics.