Multi-party speaker localization is a very active research area, playing a key role in many applications involving distant-speech recognition, scene analysis, hands-free videoconferencing, gaming and surveillance. Despite the considerable effort devoted to the issues that arise in real-world applications, developing systems that localize a speaker amid the acoustic clutter of unknown competing sound sources remains a demanding challenge.
Various methods and apparatuses for the determination of the location of a plurality of speech sources are known in the art.
For example, one method known in the art for determining the locations of a plurality of sources proposes an approximation framework for distributed target localization in sensor networks. In accordance with the method, the unknown target positions are represented on a location grid as a sparse vector, whose support encodes the multiple target locations. The location vector is linearly related to multiple sensor measurements through a sensing matrix, which can be locally estimated at each sensor. The multiple target locations are determined by using linear dimensionality-reducing projections of the sensor measurements. The overall communication bandwidth requirement per sensor is logarithmic in the number of grid points and linear in the number of targets, which reduces the communication burden.
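By way of illustration only, the grid model described above can be sketched as follows. The sketch shows a sparse location vector whose support marks the occupied grid cells, and a linear measurement taken through a sensing matrix with far fewer rows than grid points. The function names, the grid size, and the choice of a random Gaussian sensing matrix are assumptions made for this example; the referenced method estimates its sensing matrix locally at each sensor, which is not reproduced here.

```python
import random


def location_vector(num_grid_points, target_cells):
    """Sparse vector over the grid: 1.0 at each occupied cell, 0.0 elsewhere."""
    x = [0.0] * num_grid_points
    for cell in target_cells:
        x[cell] = 1.0
    return x


def support(x, tol=1e-12):
    """Indices of the nonzero entries, i.e. the encoded target locations."""
    return [i for i, v in enumerate(x) if abs(v) > tol]


def gaussian_sensing_matrix(num_measurements, num_grid_points, seed=0):
    """Hypothetical stand-in for the sensing matrix (i.i.d. Gaussian entries).

    Assumption for illustration: the source does not specify the matrix.
    """
    rng = random.Random(seed)
    scale = num_measurements ** -0.5
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(num_grid_points)]
            for _ in range(num_measurements)]


def measure(sensing_matrix, x):
    """Linear measurement y = Phi x, one entry per row of the sensing matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in sensing_matrix]


if __name__ == "__main__":
    x = location_vector(64, [5, 40])        # two targets on a 64-cell grid
    phi = gaussian_sensing_matrix(12, 64)   # 12 measurements, 64 unknowns
    y = measure(phi, x)
    print("support:", support(x))
    print("measurements:", len(y), "grid points:", len(x))
```

Because only the occupied cells contribute, each measurement is simply the sum of the sensing-matrix columns at the target cells; recovering the support from such a reduced set of measurements is the localization problem proper.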
In accordance with another reference known in the art, a multiple target localization approach is proposed that exploits compressive sensing theory, which indicates that sparse or compressible signals can be recovered from far fewer samples than required by the Nyquist sampling theorem. In accordance with the method, multiple target locations are formulated as a sparse matrix in the discrete spatial domain. The proposed algorithm uses the received signal strengths (RSSs) to find the location of targets. Instead of recording all RSSs over the spatial grid to construct a radio map from targets, far fewer RSS measurements are collected, and a data pre-processing procedure is introduced. The target locations can then be recovered from these noisy measurements solely through an l1-minimization program. The proposed approach reduces the number of measurements in a logarithmic sense while achieving a high level of localization accuracy.
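As a minimal sketch of what recovery through an l1-minimization program can look like, the example below solves the l1-regularized least-squares problem, minimizing 0.5*||A x - y||^2 + lam*||x||_1, with the iterative soft-thresholding algorithm (ISTA). This solver is substituted here for illustration; the reference does not state which l1 program or solver it uses. The names `ista` and `lasso_objective`, the toy 2-by-4 matrix, and the value of `lam` are all assumptions, and the pre-processing of RSS measurements is not modeled.

```python
def matvec(a, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    return [sum(a[i][j] * r[i] for i in range(len(a)))
            for j in range(len(a[0]))]


def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0.0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_objective(a, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    residual = [p - q for p, q in zip(matvec(a, x), y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def ista(a, y, lam=0.05, num_iters=3000):
    """Iterative soft thresholding for the l1-regularized least-squares problem."""
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / lipschitz is a safe (conservative) step size.
    lipschitz = sum(v * v for row in a for v in row)
    step = 1.0 / lipschitz
    x = [0.0] * len(a[0])
    for _ in range(num_iters):
        residual = [p - q for p, q in zip(matvec(a, x), y)]
        grad = rmatvec(a, residual)
        x = [soft_threshold(xj - step * gj, step * lam)
             for xj, gj in zip(x, grad)]
    return x


if __name__ == "__main__":
    # Toy example (hypothetical values): 2 measurements, 4 grid cells,
    # a single target in cell 2, so y equals column 2 of A.
    a = [[1.0, 0.0, 0.6, 0.8],
         [0.0, 1.0, 0.8, -0.6]]
    y = [0.6, 0.8]
    x_hat = ista(a, y)
    print("estimated location vector:", [round(v, 3) for v in x_hat])
    print("estimated cell:", max(range(len(x_hat)), key=lambda j: abs(x_hat[j])))
```

The step size derived from the Frobenius norm trades speed for simplicity: it guarantees that the objective does not increase from one iteration to the next, at the cost of more iterations than a tighter bound would need.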
In accordance with yet another method known in the art, source localization is realized using resampling within a sparse representation framework. In particular, the amplitude and phase information of the sparse solution are considered holistically to estimate the direction-of-arrival (DOA), and a resampling technique is developed to determine which information will give a more precise estimate.
The earlier methods rely on complex mathematical models that place a large computational burden on the systems implementing them, and they appear to operate best at identifying the direction of the sound signal from a source rather than the actual location of the source.
Therefore, what are needed are methods, apparatuses and computer program products capable of accurately providing the location of a sound source, even in complex environments where competing speech sources are present.