The invention concerns a technical solution for systems that include audio communication and/or recording functionality, such as, but not limited to, video conference systems, conference phones, speakerphones, infotainment systems, and audio recording devices. The solution controls the combination of two or more microphone signals into a single output signal.
The main problem in this type of setup is that the microphones pick up background noise and reverberation in addition to the speech, which reduces the audio quality in terms of both speech intelligibility and listener comfort. Reverberation consists of multiple reflected sound waves with different delays. Background noise sources may include, e.g., computer fans or ventilation. Further, the signal-to-noise ratio (SNR), i.e. the ratio between the speech and the noise (background noise and reverberation), is likely to differ between microphones, as the microphones are likely to be at different locations, e.g. within a conference room. The invention is intended to adaptively combine the microphone signals in such a way that the perceived audio quality is improved.
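As an illustration of the SNR definition above, the following minimal sketch computes the SNR for two microphones. Expressing the ratio in decibels, the function name snr_db, and the power values are assumptions made for this example only; the noise power is taken to include both background noise and reverberation.

```python
import math


def snr_db(speech_power: float, noise_power: float) -> float:
    """Return the speech-to-noise power ratio in decibels (assumed unit)."""
    return 10.0 * math.log10(speech_power / noise_power)


# Hypothetical (speech power, noise power) pairs, one per microphone.
# The second microphone is assumed to be farther from the speaker, so it
# receives less speech power while the noise power is the same.
mic_powers = [(1.0, 0.1), (0.25, 0.1)]

# One SNR value per microphone; the values differ because the speech
# power differs between the two positions.
snrs = [snr_db(speech, noise) for speech, noise in mic_powers]
```

With these assumed values, the microphone closer to the speaker has the higher SNR, which is the situation the adaptive combination is meant to exploit.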
To reduce background noise and reverberation in setups with multiple microphones, beamforming-based approaches have been suggested; see e.g. M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications. Springer, 2001. However, as beamforming is non-trivial in practice and generally requires significant computational complexity and/or specific spatial microphone configurations, microphone combining (or switching/selection) has been used extensively in practice; see e.g. P. Chu and W. Barton, “Microphone system for teleconferencing system,” U.S. Pat. No. 5,787,183, Jul. 28, 1998; D. Bowen and J. G. Ciurpita, “Microphone selection process for use in a multiple microphone voice actuated switching system,” U.S. Pat. No. 5,625,697, Apr. 29, 1997; and B. Lee and J. J. F. Lynch, “Voice-actuated switching system,” U.S. Pat. No. 4,449,238, May 15, 1984. In the microphone selection/combining approach, the idea is to use, at each time instant, the signal from the microphone(s) located closest to the current speaker, i.e. the microphone signal(s) with the highest SNR, as output from the device.
Known microphone selection/combination methods are based on measuring the microphone energy and selecting either the microphone that has the largest input energy at each time instant or the microphone that first experiences a significant increase in energy. The drawback of this approach is that in highly reverberant or noisy environments, the reverberation or noise can cause a non-optimal microphone to be selected, degrading the audio quality. There is thus a need for alternative solutions for controlling the microphone selection/combination.
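The known energy-based selection described above can be sketched as follows. This is a minimal illustration and not the method of any of the cited patents; the frame-based processing, the mean-square energy measure, and all function names are assumptions made for this example.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples (assumed energy measure)."""
    return sum(sample * sample for sample in frame) / len(frame)


def select_microphone(frames: Sequence[Sequence[float]]) -> int:
    """Return the index of the microphone whose frame has the largest energy."""
    energies = [frame_energy(frame) for frame in frames]
    return max(range(len(energies)), key=energies.__getitem__)


def select_over_time(signals: Sequence[Sequence[float]], frame_len: int) -> List[int]:
    """Select one microphone per frame; `signals` holds one sample list per microphone."""
    n_frames = min(len(signal) for signal in signals) // frame_len
    choices = []
    for k in range(n_frames):
        start = k * frame_len
        frames = [signal[start:start + frame_len] for signal in signals]
        choices.append(select_microphone(frames))
    return choices


# Hypothetical input: microphone 0 is quiet in the first frame and loud in
# the second; microphone 1 has a constant moderate level.
mic0 = [0.1] * 4 + [1.0] * 4
mic1 = [0.5] * 8
choices = select_over_time([mic0, mic1], frame_len=4)
```

Because the selection depends only on the total energy, a loud noise source or strong reverberation at one microphone raises that microphone's energy just as speech does, so the sketch would select it even when another microphone carries the cleaner speech signal. This is the drawback noted above.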