A conferencing endpoint relies on the known response of electro-acoustic transducers (speakers and microphones) to provide an effective communication experience. However, manufacturing tolerances and degradation over time mean that the response of individual transducers can vary significantly. For example, a variation of ±5 dB across the pass band is common for an electret microphone of the kind used in a conference phone. The response of these devices is particularly important when the endpoint is attempting to conduct spatial capture of the audio through the use of multiple microphones. In this case, the signal processing algorithms make assumptions about the relationships between the transfer functions of the transducers, e.g., using rough estimates of the transfer functions derived from known locations and/or orientations of the microphones together with a sound propagation model. If these assumptions do not hold, the algorithmic performance is compromised.
It is therefore desirable to perform calibration and tuning to ensure optimal performance of the device. In a conferencing endpoint, multiple directional microphones may be calibrated using the device speakers to generate a test stimulus. This approach does not presuppose uniform speaker-to-microphone coupling, which is typically influenced by the spatial arrangement of the transducers or by speaker and output gain stage tolerances.
In traditional methods for such a calibration, the device speaker is driven with a test stimulus and the corresponding input is recorded at the microphones. The impulse response for each microphone may then be determined by deconvolving the recorded input signal with the output (stimulus) signal. In other cases, the response may be determined as the difference between a captured and an expected spectral response for a known stimulus. An appropriate equalization filter is then derived from the measured impulse response or captured spectra and used as a correction filter to modify the response towards a target. The impulse response is typically dominated, at least initially, by the direct path between the speaker and microphone.
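By way of illustration only, the traditional measurement described above may be sketched as follows. The sketch assumes NumPy, a regularized frequency-domain division as the deconvolution step, and a simple magnitude-only correction; the function names, the regularization constant eps and the gain limit max_gain_db are hypothetical choices made for the example and are not taken from any particular implementation.

```python
import numpy as np

def estimate_impulse_response(stimulus, recording, n_taps, eps=1e-8):
    """Estimate h such that recording is approximately stimulus convolved with h.

    Assumption: regularized spectral division is used as the deconvolution;
    eps keeps the division stable where the stimulus has little energy.
    """
    # Zero-pad so that the circular FFT convolution equals linear convolution.
    n_fft = len(stimulus) + len(recording)
    S = np.fft.rfft(stimulus, n_fft)
    R = np.fft.rfft(recording, n_fft)
    H = R * np.conj(S) / (np.abs(S) ** 2 + eps)
    # Keep only the first n_taps samples of the estimated response.
    return np.fft.irfft(H, n_fft)[:n_taps]

def correction_gains_db(measured_ir, target_ir, n_fft=512, max_gain_db=12.0):
    """Per-bin gain (dB) moving the measured magnitude towards the target.

    Assumption: magnitude-only correction, limited to +/- max_gain_db.
    """
    measured = np.abs(np.fft.rfft(measured_ir, n_fft))
    target = np.abs(np.fft.rfft(target_ir, n_fft))
    tiny = 1e-12  # guards against log of zero and division by zero
    gains = 20.0 * np.log10((target + tiny) / (measured + tiny))
    return np.clip(gains, -max_gain_db, max_gain_db)
```

For instance, convolving a noise stimulus with a short known response and passing both signals to estimate_impulse_response should recover that response to within the regularization error. Note that this sketch inherits the limitation of the traditional approach: the estimated response includes the speaker-to-microphone coupling and the room, not only the microphone.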
A problem associated with this approach is the necessity to make assumptions about microphone directionality, speaker-to-microphone coupling and speaker consistency, and to have knowledge of the actual room acoustics or access to a suitable testing room (e.g., semi-anechoic), before deriving the equalization filter. An arrangement depicted in FIG. 1 illustrates a case where a measuring system includes three directional microphones 111, 112, 113, each having cardioid spatial sensitivity as depicted and all three being in a substantially constant spatial relationship during operation. The impulse response of each microphone is likely to be dominated by the direct path (solid line) from each speaker 121, 122. Accordingly, the first and third microphones 111, 113 are likely to receive similar levels from the first speaker 121, while the second microphone 112 will receive a significantly higher level, as it is closer to the speaker 121 and has higher spatial sensitivity in the direction of the speaker 121. These characteristics are an important part of the design of the microphone/speaker arrangement shown in FIG. 1.
If the three microphones were equalized uniformly based on this measurement, the described differences in response would be removed, which would corrupt the performance of this microphone array. In traditional methods, it would therefore be necessary to estimate the impact of these factors (microphone/speaker distance, spatial response) and account for them in the equalization process with the aim of preserving the spatial capture of the arrangement.
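The effect may be illustrated with a simplified numerical sketch. The distances, angles and capsule errors below are hypothetical values chosen for the example and are not read from FIG. 1; the model further assumes an ideal cardioid pattern and free-field 1/r level decay, and considers broadband level only.

```python
import math

def cardioid(theta_rad):
    """Ideal cardioid sensitivity: 1.0 on axis, 0.0 directly behind."""
    return 0.5 * (1.0 + math.cos(theta_rad))

def expected_direct_level_db(distance_m, theta_deg):
    """Direct-path level in dB, assuming cardioid pickup and 1/r decay."""
    level = cardioid(math.radians(theta_deg)) / distance_m
    return 20.0 * math.log10(level)

# Hypothetical geometry relative to speaker 121: (distance in m, off-axis angle in degrees).
GEOMETRY = {"111": (0.10, 120.0), "112": (0.05, 0.0), "113": (0.10, -120.0)}
# Hypothetical capsule sensitivity errors in dB (what calibration should correct).
CAPSULE_ERROR_DB = {"111": 0.0, "112": 0.0, "113": 2.0}

def measured_level_db(mic):
    """Level seen in the measurement: designed geometry plus capsule error."""
    return expected_direct_level_db(*GEOMETRY[mic]) + CAPSULE_ERROR_DB[mic]

def naive_gain_db(mic, target_db=0.0):
    """Uniform equalization: drive every microphone to the same target level."""
    return target_db - measured_level_db(mic)

def geometry_aware_gain_db(mic):
    """Correct only the deviation from the level expected by design."""
    return expected_direct_level_db(*GEOMETRY[mic]) - measured_level_db(mic)

for mic in GEOMETRY:
    print(mic, round(naive_gain_db(mic), 2), round(geometry_aware_gain_db(mic), 2))
```

Under these assumptions, the uniform correction attenuates microphone 112 by roughly 18 dB more than microphone 111 even though both capsules are nominal, thereby removing a level difference that is part of the design. The geometry-aware correction leaves microphones 111 and 112 unchanged and adjusts only microphone 113, but it is only as good as the estimates of distance and spatial response on which it relies.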
All the figures are schematic and generally only show parts which are necessary in order to elucidate the invention, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.