Advanced processing of audio signals has become increasingly important in many areas, including telecommunication and content distribution. For example, in some applications, such as hands-free communication and voice control systems, complex processing of inputs from a plurality of microphones has been used to provide a configurable directional sensitivity for a microphone array comprising the microphones. As another example, a tele-conferencing application may use audio beam steering to select and isolate speakers. Specifically, the processing of signals from a microphone array can generate an audio beam with a direction that can be changed simply by changing the characteristics of the combination of the individual microphone signals.
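As an illustration of how changing the combination of microphone signals changes the beam direction, the following is a minimal delay-and-sum sketch. It is not taken from the text above: the uniform linear array, the far-field plane-wave model, the integer-sample delays, the speed of sound value and all function names are assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_times(mic_positions, angle_deg):
    """Relative arrival time (s) of a far-field plane wave at each microphone.

    Assumption: microphones lie on a line at the given positions (m) and the
    angle is measured from broadside, so a larger position hears a source at
    a positive angle earlier.
    """
    theta = math.radians(angle_deg)
    return [-x * math.sin(theta) / SPEED_OF_SOUND for x in mic_positions]


def steering_delays(mic_positions, angle_deg, sample_rate):
    """Integer-sample delays that time-align a plane wave from angle_deg."""
    arrivals = arrival_times(mic_positions, angle_deg)
    latest = max(arrivals)
    # Delay every channel so that it lines up with the latest arrival.
    return [round((latest - t) * sample_rate) for t in arrivals]


def delay_and_sum(signals, delays):
    """Delay each microphone signal by its steering delay and average them."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for i in range(d, n):
            out[i] += sig[i - d]
    return [v / len(signals) for v in out]


def simulate_plane_wave(mic_positions, angle_deg, sample_rate, freq_hz, num_samples):
    """Synthetic test input: one sinusoidal source seen by every microphone."""
    arrivals = arrival_times(mic_positions, angle_deg)
    return [
        [math.sin(2 * math.pi * freq_hz * (k / sample_rate - t))
         for k in range(num_samples)]
        for t in arrivals
    ]


def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(v * v for v in samples) / len(samples)
```

In this sketch the beam direction is changed only by passing a different angle to `steering_delays`; the microphone signals themselves are untouched. For a simulated source at 40 degrees, steering towards 40 degrees yields a clearly higher output power than steering towards -40 degrees.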
An increasingly important function in advanced audio processing applications is the estimation of the positions of sound sources. Indeed, as audio processing is used in increasingly complex audio environments, it is often desirable to be able to estimate the directions of two simultaneous sound sources. For example, in a tele-conferencing scenario, two speakers may be simultaneously active. Such direction estimates may for example be used to direct audio beams in the desired directions or to provide notches in directions corresponding to interfering sound sources. In some scenarios sound source separation may be important and may be based on the estimated directions of the two sound sources.
However, it is typically substantially more difficult to estimate directions for two simultaneous sound sources than to estimate a direction for a single dominant sound source. A critical problem in such applications is how to separate the contributions from the different sound sources in the different microphone signals. Conventional solutions tend to differentiate between the signals based on differences in the time or frequency characteristics of the two signals. For example, if it is known that one of the two sound sources will be dominant in certain time intervals, the direction estimate for this sound source may be generated only during such time intervals. Another approach is to exploit frequency differences between the two sound sources. For example, a Fast Fourier Transform (FFT) may be applied to the signals and it may be assumed that one of the sound sources will be dominant in each subband. Accordingly, a single direction estimate may be generated for each subband, and the direction estimate for each sound source may be generated by averaging the estimates of the subbands belonging to that sound source.
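The subband approach described above can be sketched as follows. This is only an illustration under stated assumptions, not a described implementation: it assumes two microphones with known spacing, a far-field model, a direction derived from the inter-microphone phase difference in each frequency bin, and, importantly, that the assignment of bins to sound sources is already known and passed in by the caller. All function and parameter names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def dft_bin(samples, k):
    """Single DFT coefficient for bin k (naive sum, enough for a sketch)."""
    n = len(samples)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(samples))


def subband_angle(x1, x2, k, sample_rate, spacing):
    """Direction estimate (degrees from broadside) for one frequency bin.

    Assumes a single source dominates the bin and that microphone 2 hears it
    later than microphone 1 by spacing * sin(angle) / c.
    """
    freq = k * sample_rate / len(x1)
    cross = dft_bin(x1, k) * dft_bin(x2, k).conjugate()
    tau = cmath.phase(cross) / (2 * math.pi * freq)  # delay of mic 2 vs mic 1
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / spacing))
    return math.degrees(math.asin(s))


def estimate_directions(x1, x2, sample_rate, spacing, bins_by_source):
    """Average the per-bin estimates over the bins assigned to each source."""
    result = {}
    for name, bins in bins_by_source.items():
        angles = [subband_angle(x1, x2, k, sample_rate, spacing) for k in bins]
        result[name] = sum(angles) / len(angles)
    return result


def simulate_pair(sources, sample_rate, spacing, num_samples):
    """Synthetic input: each source is (angle_deg, [bins]) of pure tones."""
    x1 = [0.0] * num_samples
    x2 = [0.0] * num_samples
    for angle_deg, bins in sources:
        tau = spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
        for k in bins:
            w = 2 * math.pi * k / num_samples  # radians per sample
            for i in range(num_samples):
                x1[i] += math.sin(w * i)
                x2[i] += math.sin(w * (i - tau * sample_rate))
    return x1, x2
```

The sketch only works when the phase difference does not wrap, that is, for frequencies below roughly the speed of sound divided by twice the microphone spacing. It also sidesteps the hard part discussed in the text, namely deciding which source is dominant in each subband.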
However, such approaches tend to be suboptimal or unreliable in many scenarios. In particular, the approaches rely on the audio signals of the two sound sources having significant temporal or frequency differences and therefore tend to break down for signals that have similar characteristics. Even for relatively different audio signals, a significant degradation may occur, as it may be difficult to determine which audio signal is dominant in each frequency and/or time interval. For example, even for different audio signals, the assumption of one sound source being dominant in each subband may be appropriate for only a small proportion of the subbands. Furthermore, conventional sound source localization approaches tend to be complex and resource-demanding.
Hence, an improved approach for audio source localization would be advantageous, and in particular an approach allowing improved accuracy, reduced sensitivity to similar characteristics of audio signals, increased flexibility, facilitated implementation, reduced resource consumption and/or improved performance across different operating scenarios would be advantageous.