1. Technical Field
The invention is related to estimating the time delay of arrival (TDOA) between a pair of audio sensors of a microphone array, and more particularly to a system and process for estimating the TDOA using a generalized cross-correlation (GCC) technique that employs provisions making it more robust to correlated ambient noise and reverberation noise.
2. Background Art
Using microphone arrays to locate a sound source has been an active research topic since the early 1990's [2]. It has many important applications including video conferencing [1, 5, 10], video surveillance, and speech recognition [8]. In general, there are three categories of techniques for sound source localization (SSL), i.e. steered-beamformer based, high-resolution spectral estimation based, and time delay of arrival (TDOA) based [2].
The steered-beamformer-based technique steers the array to various locations and searches for a peak in output power. This technique can be tracked back to early 1970s. The two major shortcomings of this technique are that it can easily become stuck in a local maxima and it exhibits a high computational cost. The high-resolution spectral-estimation-based technique representing the second category uses a spatial-spectral correlation matrix derived from the signals received at the microphone array sensors. Specifically, it is designed for far-field plane waves projecting onto a linear array. In addition, it is more suited for narrowband signals, because while it can be extended to wide band signals such as human speech, the amount of computation required increases significantly. The third category involving the aforementioned TDOA-based SSL technique is somewhat different from the first two since the measure in question is not the acoustic data received by the microphone array sensors, but rather the time delays between each sensor. So far, the most studied and widely used technique is the TDOA based approach. Various TDOA algorithms have been developed at Brown University [2], PictureTel Corporation [10], Rutgers University [6], University of Maryland [12], USC [3], UCSD [4], and UIUC [8]. This is by no means a complete list. Instead, it is used to illustrate how much effort researchers have put into this problem.
While researchers are making good progress on various aspects of TDOA, there is still no good solution in real-life environment where two destructive noise sources exist—namely, spatially correlated noise (e.g., computer fans) and room reverberation. With a few exceptions, most of the existing algorithms either assume uncorrelated noise or ignore room reverberation. It has been found that testing on data with uncorrelated noise and no reverberation will almost always give perfect results. But the algorithm will not work well in real-world situations. Thus, there needs to be a more vigorous exploration of the various noise removal techniques to handle the spatially correlated noise issue for real-world situations, along with different weighting functions to deal with the room reverberation issue. This is the focus of the present invention. It is noted, however, that the present invention is directed at providing more accurate “single-frame” estimates. Multiple-frame techniques, e.g., temporal filtering [11], are outside the scope of this invention, but can always be used to further improve the “single-frame” results. On the other hand, better single frame estimates should also improve algorithms based on multiple frames.
It is further noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.