The localization of one or more speakers (communication parties) is important in many electronically mediated communication situations in which multiple microphones, e.g., microphone arrays or distributed microphones, are utilized. For example, the intelligibility of speech signals that represent utterances of users of hands-free sets and are transmitted to a remote party depends heavily on an accurate localization of the speaker. If accurate localization of a near-end speaker fails, the transmitted speech signal exhibits a low signal-to-noise ratio (SNR) and may even be dominated by an undesired perturbation caused by a noise source located in the vicinity of the speaker or in the same room in which the speaker uses the hands-free set.
Audio and video conferences are further examples in which accurate localization of the speaker(s) is mandatory for successful communication between near and remote parties. The quality of sound captured by an audio conferencing system, i.e., the ability to pick up voices and other relevant audio signals with great clarity while eliminating irrelevant background noise (e.g., an air conditioning system or localized perturbation sources), can be improved by a directionality of the voice pick-up means.
In the context of speech recognition and speech control, the localization of a speaker is important in order to provide the speech recognition means with speech signals exhibiting a high signal-to-noise ratio, since otherwise the recognition results are not sufficiently reliable.
Acoustic localization of a speaker is usually based on the detection of transit time differences of sound waves representing the speaker's utterances by means of multiple (at least two) microphones. However, methods known in the art for the localization of a speaker are error-prone in rooms that exhibit significant reverberation and, in particular, in communication systems that provide audio output via loudspeakers. In order to avoid erroneous speaker localization due to acoustic loudspeaker outputs, echo compensation filtering means are usually employed to pre-process the microphone signals used for the speaker localization.
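By way of illustration only, the transit-time-difference principle mentioned above can be sketched for two microphones as follows. The sketch assumes a far-field source, a known microphone spacing, a speed of sound of 343 m/s, and a plain time-domain cross-correlation as the delay estimator; the function names `estimate_delay_samples` and `angle_of_arrival` are hypothetical and are not taken from any particular system in the art.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_delay_samples(x1, x2, max_lag):
    """Return the lag (in samples) by which x2 appears delayed relative to x1.

    The lag is chosen as the maximum of the cross-correlation of the two
    microphone signals, searched over -max_lag .. +max_lag.
    """
    n = min(len(x1), len(x2))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x1[i] * x2[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def angle_of_arrival(delay_samples, sample_rate, mic_distance):
    """Convert a delay into an angle (degrees) relative to broadside.

    Far-field assumption: path difference = mic_distance * sin(angle).
    """
    tau = delay_samples / sample_rate  # transit time difference in seconds
    ratio = SPEED_OF_SOUND * tau / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(ratio))
```

In a reverberant room, or when a loudspeaker is active, the cross-correlation may show additional peaks caused by reflections and loudspeaker output, which is one way in which such a simple estimator becomes error-prone.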
Echo compensation filtering means allow for the reduction of echo components, in particular those due to loudspeaker outputs, by estimating the echo components of the impulse response and adapting filter coefficients in order to suppress them. However, echo suppression by multi-channel echo compensating filters and, particularly, the control of the adaptation of the respective filter coefficients demand relatively powerful computer resources and result in heavy processor load. Moreover, ineffective echo compensation still results in erroneous speaker localization. Therefore, there is a need for a method for a more reliable localization of a speaker without the demand for powerful computer resources.
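For illustration of the kind of adaptive echo compensation filtering discussed above, a minimal single-channel sketch is given below. It assumes a normalized least mean squares (NLMS) update, which is one common choice and is not stated in the foregoing; the function name `nlms_echo_canceller` and the parameters `num_taps`, `step` and `eps` are hypothetical.

```python
def nlms_echo_canceller(loudspeaker, microphone, num_taps=8, step=0.5, eps=1e-8):
    """Suppress the loudspeaker echo contained in a microphone signal.

    Returns the echo-compensated (residual) signal and the final filter
    coefficients, which approximate the loudspeaker-to-microphone
    impulse response.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(loudspeaker, microphone):
        history = [x] + history[:-1]
        # Estimate the echo as the filtered loudspeaker signal.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # microphone signal with echo removed
        # Normalized update: step size scaled by the input signal power.
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

Each input sample in this sketch costs on the order of `num_taps` multiplications for the echo estimate and again for the coefficient update, and a multi-channel arrangement would repeat this for every loudspeaker and microphone pair, which gives an indication of the processor load referred to above.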