1. Technical Field
The invention is related to object localization and tracking within a prescribed search area, and in particular, to a system and process for improving the precision of localization estimates generated by use of a receiving array, such as, for example, microphone arrays, directional antenna arrays, radar receiver arrays, etc., by providing cluster-based statistical post-processing of initial localization measurements or estimates.
2. Background Art
Localization and tracking of objects within prescribed regions is an important element of many systems. For example, a number of conventional audio conferencing applications use microphone arrays with conventional sound source localization (SSL) processing (e.g., time-delay estimates, beamsteering, etc.) to enable the speech or sound of particular individuals to be effectively isolated and processed as desired. Similar techniques have used arrays of directional antennas for locating radio sources for a number of applications, such as, for example, determining which node or nodes are to be used by particular subscribers in a wireless computer network. Still other similar techniques have been used for tracking objects using radar or laser receiver arrays. In general, such techniques are well known to those skilled in the art.
For example, conventional microphone arrays typically include an arrangement of microphones in some predetermined layout. These microphones are generally used to capture sound waves arriving from various directions and originating from different points in space. One of a number of conventional techniques is then used to perform SSL. In general, these SSL techniques fall into two categories: those based on time delay estimates (TDE), and those based on beamsteering. Finding the direction to a sound source plays an important role in spatial filtering, i.e., pointing a beam at the sound source and suppressing any noise coming from other directions. In some cases, the direction to the sound source is used for speaker tracking and post-processing of recorded audio signals. In the context of a video conferencing system, speaker tracking is often used for dynamically directing a video camera toward the person speaking.
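By way of illustration only, a time delay estimate for a single pair of microphones might be sketched as follows. This is a minimal sketch of a generic cross-correlation approach, not a technique described in this background. The function names, the far-field assumption, the speed-of-sound constant, and the convention that the angle is measured from the broadside of the microphone pair are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive result means sig_b is a delayed copy of sig_a.
    Hypothetical helper: brute-force cross-correlation over a lag window.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(sig_a)):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(delay_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay to a far-field angle in degrees.

    The angle is measured from broadside (perpendicular to the line
    joining the two microphones); this convention is an assumption.
    """
    tau = delay_samples / sample_rate
    # Clamp so rounding or noise never pushes asin outside its domain.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(s))
```

For instance, a pulse that reaches the second microphone three samples after the first, at a 16 kHz sample rate and 0.1 m spacing, would map to an angle of roughly 40 degrees under these assumptions.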
In general, most sound source localization systems process the signals from a microphone array by first preprocessing each signal from each microphone of the array. This preprocessing typically includes packaging the signal in frames, performing noise suppression, and classifying individual frames for determining whether particular frames will be processed or rejected for the purposes of determining the location of the sound source.
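The framing and frame classification steps described above can be sketched as follows. This is only an illustrative sketch: the energy threshold rule, the `ratio` parameter, and all function names are assumptions made for the example, and the noise suppression step is omitted for brevity.

```python
def split_into_frames(samples, frame_len, hop):
    """Package a signal into fixed-length frames, advancing by hop samples."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def select_frames(frames, noise_floor, ratio=4.0):
    """Return indices of frames accepted for localization.

    Assumed rule: a frame is processed only if its energy exceeds
    ratio * noise_floor; all other frames are rejected.
    """
    return [i for i, f in enumerate(frames)
            if frame_energy(f) > ratio * noise_floor]
```

With such a rule, frames containing only background noise would be rejected, so that only frames likely to contain the sound source are passed on to the localization stage.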
Once the preprocessing is complete, the actual sound source localization typically involves the use of conventional SSL techniques including, for example, TDE or beamsteering techniques, to provide either initial direction estimates or a probability distribution function (PDF) for indicating where a sound source is located. This location can be defined in terms of one-dimensional localization (i.e., the angle along which the sound source is located in a plane), two-dimensional localization (i.e., two angles, direction and elevation, for defining a vector representing the direction of the sound source in a three dimensional space), and full three-dimensional localization (i.e., direction, elevation and distance, for locating a point in a three-dimensional space at which the sound source is located). In general, whichever SSL technique is used, the goal is typically to provide robustness to reverberation, the ability to distinguish multiple sound sources, and high location precision in potentially noisy environments.
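As an illustration of how the two-dimensional and three-dimensional forms of localization relate, the following sketch converts a direction and elevation into a unit vector, and a direction, elevation and distance into a point. The coordinate convention (direction measured in the horizontal plane from the x axis, elevation measured upward from that plane) and the function names are assumptions made for the example.

```python
import math


def direction_vector(direction_deg, elevation_deg):
    """Two-dimensional localization: two angles give a unit direction vector."""
    az = math.radians(direction_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def source_point(direction_deg, elevation_deg, distance):
    """Full three-dimensional localization: scale the unit vector by distance."""
    ux, uy, uz = direction_vector(direction_deg, elevation_deg)
    return (distance * ux, distance * uy, distance * uz)
```

One-dimensional localization corresponds to the special case in which only the direction angle is known and the elevation is taken to be zero.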
Once an indicator of the sound source location has been computed, a post-processing phase is often implemented. In general, this post-processing combines the results of several localization measurements to increase precision, to follow the movements of the sound source, or to track multiple sound sources. Various conventional techniques used for SSL post-processing include simple averaging, statistical processing, Kalman filtering, particle filtering, etc. Such techniques are typically application dependent, but are generally directed at removing localizations caused by reverberated waves and strong reflections, and at improving sound source localization precision. In general, as the precision of the localization estimates or measurements increases, any further processing of the audio signal (such as, for example, accurate sound source tracking) is enhanced.
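As a simple illustration of combining several localization measurements, the following sketch averages a set of direction angles. Because angles wrap around at 360 degrees, the sketch uses a circular mean rather than a plain arithmetic mean; this choice, and the function name, are assumptions made for the example rather than a technique stated in this background.

```python
import math


def circular_mean_deg(angles_deg):
    """Average direction angles (degrees) by summing their unit vectors.

    Returns a value in the range [0, 360).
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0
```

For measurements of 350 and 10 degrees, a plain arithmetic mean would give 180 degrees, pointing away from the source, whereas the circular mean lies at the wrap-around point between the two measurements. Note that simple averaging of this kind does not by itself reject measurements caused by reflections.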
Signal source or object localization with respect to other signal types, including radio signals, radar waves, etc., is often accomplished using pre- and post-processing techniques similar to those described above for the case of sound waves captured via a microphone array. In general, such localization techniques often include beamsteering techniques adapted for different signal and receiver array types (e.g., directional antenna arrays, radar or laser receiver arrays, etc.). As with audio signals, localization of other signal types is typically based on analysis of the propagation of signals (e.g., sound waves, radio waves, radar wave reflections, etc.).
With all such localization systems, regardless of signal or array type, one primary goal is to provide fast and precise localization estimates or measurements, even in the presence of noise and other effects, such as diffraction, interference, reflection, etc., which tend to decrease localization precision and reliability.
As noted above, post-processing of localization estimates is generally designed to increase the precision of those estimates. Therefore, what is needed is a system and process for providing fast and reliable post-processing of localization data for improving the precision of localization estimates. Further, such a system and process should be operable with and adaptable for existing localization techniques.