Systems exist that detect the direction of sound sources, a task known as sound source localization (SSL). Until recently, such systems were highly specialized and bulky because they relied on large, expensive hardware. As a consequence, they were dedicated solely to military applications such as sonar.
Thanks to advances in computing power and hardware, SSL is now an emerging technology with a growing number of general civilian applications, such as surveillance, video conferencing, mobile phones, video games and robotics.
As shown in FIG. 1, the physical principle of SSL is based on arrays of more than two microphones, which provide signals that depend on the direction of the sources.
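The direction dependence described above can be made concrete under a far-field (plane-wave) assumption: a wave arriving from direction (θ, φ) reaches each microphone with a delay proportional to the projection of the microphone position onto the direction vector. The following is a minimal, hypothetical sketch; the speed of sound (343 m/s), the coordinate convention (x forward, y left, z up, azimuth measured in the horizontal plane from the x axis) and the function names are assumptions for illustration and may differ from the convention of FIG. 1.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def direction_vector(theta, phi):
    """Unit vector pointing towards a source at azimuth theta, elevation phi (radians).

    Assumed convention: x forward, y left, z up.
    """
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def relative_delays(mic_positions, theta, phi, c=SPEED_OF_SOUND):
    """Far-field arrival delay (s) of each microphone relative to the array origin.

    mic_positions: array of shape (n_mics, 3) in metres. A plane wave from
    direction u reaches a microphone at position p earlier than the origin
    by p . u / c, hence the minus sign.
    """
    u = direction_vector(theta, phi)
    return -(mic_positions @ u) / c
```

For two microphones 20 cm apart on the x axis and a source straight ahead (θ = φ = 0), the microphone nearer the source receives the wave about 0.29 ms before the array origin and the other about 0.29 ms after it.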
Early detection techniques relied on simple methods such as triangulation. However, these methods were usable only with a single sound source, with large microphone arrays and at high frequencies, and were therefore applicable only to sonar.
More recent applications use much more sophisticated signal processing methods, which localize the source direction with complex algorithms based on the delay and amplitude differences between the microphone signals shown in FIG. 1. A source direction (also known as the direction of arrival, or DOA, of the sound) relative to an array of microphones is generally expressed by two angles (θ, φ) of an angular space, where θ is the azimuth, i.e. the angle with the frontal plane, and φ is the elevation angle.
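To illustrate how a delay difference between two microphone signals can be measured and turned into an angle, here is a minimal sketch using GCC-PHAT (generalized cross-correlation with phase transform), one common delay estimator chosen here for illustration. It assumes a far-field source, a two-microphone pair with known spacing and a speed of sound of 343 m/s; the function names are hypothetical. Only delay information is used; amplitude differences are ignored.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_tau=None):
    """Estimate the delay (s) by which signal x lags signal y, using GCC-PHAT."""
    n = len(x) + len(y)  # zero-pad so the circular correlation equals the linear one
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        # restrict the search to physically possible lags (at least one sample)
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # reorder so that index k corresponds to lag k - max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs

def pair_angle(tau, spacing, c=343.0):
    """Far-field arrival angle (rad) from the broadside of a two-microphone pair."""
    # clip guards against |c * tau| slightly exceeding the spacing because of noise
    return np.arcsin(np.clip(c * tau / spacing, -1.0, 1.0))
```

A positive delay means x lags y; the sign of the resulting angle therefore depends on which microphone is passed as x. Passing `max_tau = spacing / c` limits the search to lags that a real source could produce.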
In particular, SSL algorithms fall into two main families: beamforming methods, which require large microphone arrays, and angular-spectra methods, which are compatible with small microphone arrays.
Well-known angular-spectra methods are MUSIC, SRP-PHAT and MVDR.
Generally, such an angular-spectra method consists of generating an angular spectrum m = F(θ, φ) from the audio signals acquired by the array of microphones. In such an angular spectrum, each point (θ, φ), corresponding to a direction in space, is associated with a respective magnitude m calculated by the method, as shown in FIG. 2.
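As an illustration of how an angular spectrum m = F(θ, φ) can be generated, the following sketch follows the SRP-PHAT idea named above: for every candidate direction, the PHAT-weighted cross-spectra of all microphone pairs are phase-aligned for the delay that direction would produce, and their real parts are summed. It is an unoptimized sketch under assumed conditions (free field, plane waves, 343 m/s, the coordinate convention x forward, y left, z up); the function names are hypothetical and it does not represent any particular product's implementation.

```python
import itertools
import numpy as np

C = 343.0  # assumed speed of sound in air (m/s)

def unit_vector(theta, phi):
    # Assumed convention: x forward, y left, z up; theta = azimuth, phi = elevation.
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def srp_phat_spectrum(signals, mic_pos, fs, thetas, phis):
    """Return m[i, j] = F(thetas[i], phis[j]), an SRP-PHAT angular spectrum.

    signals: array (n_mics, n_samples); mic_pos: array (n_mics, 3) in metres.
    """
    n_mics, n_samples = signals.shape
    n_fft = 2 * n_samples  # zero-padding avoids circular wrap-around
    spectra = np.fft.rfft(signals, n=n_fft, axis=1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    m = np.zeros((len(thetas), len(phis)))
    for a, b in itertools.combinations(range(n_mics), 2):
        cross = spectra[a] * np.conj(spectra[b])
        cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
        for i, th in enumerate(thetas):
            for j, ph in enumerate(phis):
                u = unit_vector(th, ph)
                # delay of mic a relative to mic b for a plane wave arriving from u
                tau = (mic_pos[b] - mic_pos[a]) @ u / C
                # GCC-PHAT of the pair evaluated at lag tau (positive frequencies only)
                m[i, j] += np.real(np.sum(cross * np.exp(2j * np.pi * freqs * tau)))
    return m
```

The grid of candidate directions (`thetas`, `phis`) sets both the angular resolution and the cost, which grows with the number of microphone pairs times the number of grid points.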
Local maxima LM (i.e. points whose magnitudes are local maxima) are then identified in the angular spectrum and form a set of possible direction solutions. Some of these possible direction solutions are artifacts or phantom sources (i.e. false solutions).
To remove the artifacts, the possible direction solutions LM are ranked according to their magnitude m. The best-ranked maximum or maxima BM are then identified as the true solutions corresponding to the real sources RS, as shown in FIG. 2.
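The local-maximum search and ranking steps described above can be sketched as follows. The sketch assumes that the angular spectrum is sampled on a regular grid whose rows are azimuths (periodic, so they wrap around) and whose columns are elevations, and that the number of sources to keep is known in advance; the grid layout and the function names are assumptions for illustration.

```python
import numpy as np

def local_maxima(m):
    """Return (i, j, magnitude) for every local maximum of an angular spectrum m[i, j].

    Rows index azimuth and wrap around (azimuth is periodic); columns index
    elevation and do not wrap. A point is a local maximum if it is >= all of its
    neighbours and strictly greater than at least one of them.
    """
    n_az, n_el = m.shape
    peaks = []
    for i in range(n_az):
        for j in range(n_el):
            neigh = [m[(i + di) % n_az, j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di, dj) != (0, 0) and 0 <= j + dj < n_el]
            centre = m[i, j]
            if all(centre >= v for v in neigh) and any(centre > v for v in neigh):
                peaks.append((i, j, float(centre)))
    return peaks

def best_maxima(m, n_sources):
    """Rank the local maxima LM by magnitude and keep the n_sources best ones (BM)."""
    return sorted(local_maxima(m), key=lambda p: p[2], reverse=True)[:n_sources]
```

Keeping a fixed number of peaks is only one possible ranking rule; a threshold relative to the strongest peak is another. Flat plateaus can yield several adjacent maxima of equal magnitude, which a real implementation might merge.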
However, in noisy or reverberant environments, or in the presence of multiple sources, current angular-spectra methods prove to lack accuracy and/or robustness.
US Patent Application 2010/0217590 discloses a system and method for performing speaker localization. The system uses several SSL algorithms to achieve better separation between a source and the noise of a noisy environment in devices with small microphone arrays, such as mobile phones.
In the method of US 2010/0217590, the algorithm that best suits the environmental conditions is selected, and the sound direction solution provided by the selected algorithm is output as the direction of the source.
This method has several drawbacks.
First, although robustness is slightly improved, accuracy may be unsatisfactory in situations where the selected SSL algorithm localizes the source poorly.
Second, the method is restricted to the localization of a single sound source and is therefore unsuitable when multiple sources are present.