When a pair or an array of microphones is used for sound source localization, different cues can be exploited. Common cues include: (a) Time Delay of Arrival (TDOA), or Interaural Time Difference (ITD), the difference in the travelling time of the sound waves to the different microphones; (b) Interaural Intensity Difference (IID), the difference in the intensity of the sound signal at the different microphones, caused by barriers in the sound path that attenuate the signal (e.g. an artificial head); and (c) Interaural Envelope Difference (IED), the difference in travelling time as measured in the envelope signal of unresolved harmonics.
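To make the first two cues concrete, the following minimal sketch relates a time delay to a source direction and computes an intensity difference in decibels. It assumes a far-field source, two microphones in free field, and a speed of sound of 343 m/s; the function names, the angle convention (measured from broadside) and the decibel definition of the IID are illustrative assumptions, not taken from the text above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees Celsius


def tdoa_from_angle(angle_rad, mic_distance_m, c=SPEED_OF_SOUND):
    """Far-field time delay (seconds) for a source at angle_rad from broadside."""
    return mic_distance_m * math.sin(angle_rad) / c


def angle_from_tdoa(tdoa_s, mic_distance_m, c=SPEED_OF_SOUND):
    """Invert the far-field model; the argument is clamped to [-1, 1]."""
    s = max(-1.0, min(1.0, tdoa_s * c / mic_distance_m))
    return math.asin(s)


def iid_db(left, right):
    """Intensity difference in dB between two equally long signal frames."""
    energy_left = sum(v * v for v in left)
    energy_right = sum(v * v for v in right)
    return 10.0 * math.log10(energy_left / energy_right)
```

Under these assumptions, a microphone spacing of 0.343 m and a source at 90 degrees from broadside give the largest possible delay of about 1 ms.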
The above-mentioned cues are severely degraded by room reflections. In a reverberant room, the sound signal reaches the microphones via a direct path and, additionally, via indirect paths caused by reflections. The reflections arrive after the direct sound because they travel a longer path. As a consequence, measurements of the source location are severely impaired. In the case of temporal cues (e.g. ITD), the delay between the reverberations and the signal is measured instead of the time delay of the real signal. In the case of intensity-based cues, the reverberations add to the intensity of the direct signal.
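The later arrival of a reflection can be illustrated with a single reflecting surface. The sketch below assumes a simplified geometry that is not part of the text above: source and microphone at the same height above a flat, perfectly reflecting floor, with the reflected path obtained by mirroring the source at the floor. The function name and the speed of sound are likewise assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def reflection_lag(distance_m, height_m, c=SPEED_OF_SOUND):
    """Extra travel time (seconds) of a single floor reflection.

    Source and microphone are both height_m above the floor and
    distance_m apart; the reflected path equals the distance from the
    mirrored source (at -height_m) to the microphone.
    """
    direct = distance_m
    reflected = math.hypot(distance_m, 2.0 * height_m)
    return (reflected - direct) / c
```

For a distance of 3 m and a height of 2 m, the reflected path is 5 m long, so the reflection arrives roughly 5.8 ms after the direct sound.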
In common approaches, the temporal cues are determined by a cross-correlation between the microphone signals. One such approach evaluates the cues at all time instances and later selects the measurements based on a reliability criterion, e.g. the ratio between the main peak of the correlation and the additional peaks. Other approaches include weighting the correlation function.
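The cross-correlation approach with a peak-ratio reliability criterion can be sketched as follows. This is only an illustration using NumPy: the function name, the guard region used to separate the main peak from the additional peaks, and the handling of lags are assumptions, and the secondary peak is approximated by the largest correlation value outside the guard region.

```python
import numpy as np


def estimate_tdoa(x_left, x_right, fs, max_lag, guard=2):
    """Return (delay in seconds, peak ratio) for two equally long frames.

    A positive delay means the right signal lags the left one.
    """
    x_left = np.asarray(x_left, dtype=float)
    x_right = np.asarray(x_right, dtype=float)
    n = len(x_left)

    # Full cross-correlation; index n - 1 corresponds to lag 0.
    full = np.correlate(x_right, x_left, mode="full")
    window = full[n - 1 - max_lag:n - 1 + max_lag + 1]
    lags = np.arange(-max_lag, max_lag + 1)

    main = int(np.argmax(window))

    # Mask the main peak and its neighbours, then take the largest
    # remaining value as the strongest additional peak (assumption).
    rest = window.copy()
    rest[max(0, main - guard):main + guard + 1] = -np.inf
    second = max(float(np.max(rest)), 1e-12)

    ratio = float(window[main]) / second
    return float(lags[main]) / fs, ratio
```

A measurement would then be kept only if the returned ratio exceeds a chosen threshold; the threshold value itself has to be tuned and is not specified here.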
An approach more strongly motivated by psycho-acoustical findings evaluates the cues solely at the onsets of the signal and suppresses all following measurements for a fixed span of time. The justification for this approach is the aforementioned fact that the direct path, without reverberations, impinges on the microphones before the reverberations. Hence the early part of the signal, directly after the onset, contains no echoes. This method can also be applied to the intensity-based cues with the same rationale. The aforementioned measures can be applied either to the full-band signal or in sub-bands.
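The onset-based selection with a fixed suppression time might look like the following sketch. The onset detector used here (a frame-to-frame level rise above a threshold in dB) is an assumption chosen for simplicity, as are the function name and the default values of rise_db and hold_s; the text above does not prescribe a particular detector.

```python
import numpy as np


def onset_gate(signal, fs, frame_len, rise_db=6.0, hold_s=0.05):
    """Return start indices of frames at which cues should be measured.

    A frame counts as an onset when its level exceeds the previous
    frame's level by at least rise_db. After each onset, all frames
    within the following hold_s seconds are suppressed.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Frame level in dB; the small constant avoids log of zero.
    level_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)

    hold_frames = int(np.ceil(hold_s * fs / frame_len))
    selected = []
    blocked_until = 0
    for i in range(1, n_frames):
        if i < blocked_until:
            continue  # still inside the suppression span
        if level_db[i] - level_db[i - 1] >= rise_db:
            selected.append(i * frame_len)
            blocked_until = i + 1 + hold_frames
    return selected
```

The returned indices mark the frames in which ITD or IID would be evaluated. For a sub-band variant, the same function could be applied to each band-pass filtered signal separately.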