1. Technical Field
The invention relates to finding the direction to a sound source in a prescribed search area from a microphone array using a beamsteering approach, and more particularly to a system and process that provides improved beamsteering with less drain on system resources while still producing accurate, real-time results.
2. Background Art
Localization of sound sources plays an important role in many audio systems having microphone arrays. Different techniques have been developed to perform this sound source localization (SSL). In general, these techniques fall into two categories, namely those based on time delay estimates (TDE) and those based on beamsteering. Finding the direction to a sound source plays an important role in spatial filtering, i.e., pointing a beam at the sound source and suppressing noise coming from other directions. In some cases the direction to the sound source is used for speaker tracking and for post-processing of recorded audio signals. In the context of a videoconferencing system, speaker tracking is often used to direct a video camera toward the person speaking.
In general, most sound source localization systems process the signals from the microphone array as follows. First, the signal from each microphone of the array is pre-processed. This includes packaging the signal into frames, performing noise suppression, and performing a classification that decides whether a frame will be processed or rejected for the purpose of determining the location of a sound source. In addition, a frame may be converted into the frequency domain, depending on the type of analysis to be performed. Once pre-processing is complete, the actual sound source localization typically uses one of the aforementioned techniques, namely time delay estimation or beamsteering. This stage ends with a direction estimate or a probability distribution function (PDF), either of which indicates where a sound source is located. This location can be defined in terms of one angle (localization in one dimension), two angles (direction and elevation, i.e., localization in 2D), or a full 3D localization (i.e., direction, elevation and distance). The major problems the various existing SSL approaches try to solve are robustness to reverberation, the ability to distinguish multiple sound sources, and high precision in a noisy environment. Once an indicator of the sound source location has been computed, a post-processing phase can be implemented. Essentially, in post-processing, the results of several localization measurements are combined to increase the precision, to follow the sound source movements, or to track multiple sound sources. Techniques used for this range from simple averaging to more complicated statistical processing, Kalman filtering, particle filtering [2], and the like.
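The framing, frame-rejection and simple-averaging steps described above can be sketched in Python as follows. This is a minimal, single-channel illustration under stated assumptions, not the method of any referenced system: the energy-threshold classifier stands in for whatever classification a real system uses, and the names `split_frames`, `keep_frame` and `combine_directions` (and the threshold parameter) are hypothetical.

```python
import math

def split_frames(signal, frame_len):
    """Package samples into non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)

def keep_frame(frame, threshold):
    """Hypothetical classifier: accept a frame for localization only if
    its mean energy exceeds an assumed threshold."""
    return frame_energy(frame) > threshold

def combine_directions(angles_deg):
    """Post-processing by simple averaging: circular mean of several
    per-frame direction estimates (degrees), returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0
```

The averaging is done on unit vectors rather than on the raw angles because directions wrap around: the arithmetic mean of 350° and 10° is 180°, the opposite direction, whereas the circular mean is 0°.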
In regard to the group of SSL processes based on TDE techniques, the processing generally involves analyzing the signals coming from pairs of microphones in the array. An M-element microphone array can use up to M(M-1)/2 distinct pairs. These processes usually find the direction to the sound source in two phases. During the first phase, the delay is calculated for each microphone pair by estimating a correlation function, with modifications for better robustness to reverberated waves and noise. In the second phase, all time delay estimates are combined to compute the final direction to the sound source. Besides increasing the precision and the robustness to reverberation and noise, the second phase has to resolve a degree of ambiguity introduced by the TDE method itself. More particularly, for each microphone pair there are many directions in the working volume that yield the same time delay (together they form a hyperbolic surface). To overcome this major disadvantage, the microphone arrays of TDE-based sound source localizers are positioned so that the working volume lies in just one half of the space. Another disadvantage of this group of methods is that the amount of computation grows with the square of the number of microphones in the array.
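The first TDE phase can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular reference: it uses plain time-domain cross-correlation (practical systems add weightings such as PHAT for robustness to reverberation), whole-sample delays, and a far-field model; the function names and the 343 m/s speed of sound are illustrative assumptions. It also exhibits the two drawbacks noted above: the number of pairs grows as M(M-1)/2, and one pair's delay fixes only the angle to that pair's axis, so every direction on a cone around the axis (the far-field limit of the hyperbolic surface) is equally consistent.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C

def microphone_pairs(num_mics):
    """All distinct microphone pairs; there are num_mics*(num_mics-1)/2 of them."""
    return list(itertools.combinations(range(num_mics), 2))

def estimate_delay(x, y, max_lag):
    """Lag (in samples) maximizing sum_i x[i + lag] * y[i], i.e. the lag for
    which x[n] best matches y[n - lag]. Positive: x lags y. Assumes equal lengths."""
    n = len(x)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[i + lag] * y[i] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def pairwise_delays(signals, max_lag):
    """First phase: one delay estimate per microphone pair (quadratic in M)."""
    return {(a, b): estimate_delay(signals[a], signals[b], max_lag)
            for a, b in microphone_pairs(len(signals))}

def delay_to_angle(lag, fs, spacing):
    """Far-field angle in degrees between the pair axis (first mic toward
    second) and the source direction; 90 means broadside. Every direction on
    a cone around the axis gives the same lag, hence the ambiguity."""
    cos_theta = SPEED_OF_SOUND * lag / (fs * spacing)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp rounding/noise overshoot
    return math.degrees(math.acos(cos_theta))
```

A physically meaningful `max_lag` is about `fs * spacing / SPEED_OF_SOUND` samples, since no real delay between two microphones can exceed the acoustic travel time across their spacing.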
The beamsteering approach, on the other hand, is based on a well-known technique used to capture sound with microphone arrays, namely beamforming. Beamforming is the ability to make the microphone array “listen” in a given direction and to suppress sounds coming from other directions. Processes for sound source localization with beamsteering form a searching beam and scan the working space by moving the direction in which the searching beam points. The energy of the signal coming from each direction is calculated, and the sound source is taken to lie in the direction of maximal energy. This approach amounts to finding the maximum of a surface in the coordinate system of direction, elevation, and energy. In most cases this surface is multimodal, i.e., it has multiple extrema due to multiple sound sources and reverberated waves. Additional difficulties are caused by the shape of the searching beam. The simplest beamforming process, the delay-and-sum technique, introduces so-called side lobes that differ from frequency to frequency; these are directions other than the steering direction with increased sensitivity. The search procedure is also critical for quick localization. Examples of existing search procedures are the coarse-to-fine search described in reference [1] and the tracking of the sound source using the particle filter technique described in reference [2]. The main advantages of the beamsteering approach to SSL are that it does not introduce the ambiguity inherent in the TDE approach and that, because it uses the signals from all microphones to estimate the energy for each direction, it is more robust to noise and reverberation.
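The scan described above can be sketched in Python with a plain delay-and-sum beam. This is a minimal illustration under stated assumptions, not the search procedure of reference [1] or [2]: it assumes a hypothetical uniform linear array, a far-field source, whole-sample delays, and an exhaustive one-degree grid over a single angle; the names `steering_shifts`, `beam_energy` and `scan` and the 343 m/s speed of sound are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C

def steering_shifts(num_mics, spacing, fs, angle_deg):
    """Whole-sample delay for each element of a hypothetical uniform linear
    array steered to angle_deg (0 and 180 = endfire, 90 = broadside)."""
    tau = spacing * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND  # s between elements
    return [round(k * tau * fs) for k in range(num_mics)]

def beam_energy(signals, shifts):
    """Energy of the delay-and-sum output y[n] = sum_k x_k[n - shift_k]."""
    n_samples = len(signals[0])
    energy = 0.0
    for n in range(n_samples):
        y = 0.0
        for x, s in zip(signals, shifts):
            idx = n - s
            if 0 <= idx < n_samples:  # samples outside the frame count as zero
                y += x[idx]
        energy += y * y
    return energy

def scan(signals, spacing, fs, step_deg=1):
    """Steer the searching beam over 0..180 degrees and return the angle of
    maximal energy together with the whole energy profile."""
    energies = {}
    for angle in range(0, 181, step_deg):
        shifts = steering_shifts(len(signals), spacing, fs, angle)
        energies[angle] = beam_energy(signals, shifts)
    best = max(energies, key=energies.get)
    return best, energies
```

Every candidate direction uses all microphones, which is the source of the robustness noted above, but the cost is one full beamforming pass per grid point; this is why the search procedure matters. Because delays are rounded to whole samples, several neighbouring angles can share an identical energy, so the resolution of this sketch is limited by the sampling rate and array spacing.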
However, SSL computations based on beamsteering are generally considered slower and less precise, though more robust to reverberation and noise, than SSL computations based on TDE. The present invention resolves these shortcomings of the beamsteering approach to provide accurate, real-time SSL computations while retaining the robustness for which such techniques are known.
It is noted that in the preceding paragraphs the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.