1. Field of the Invention
The present invention generally relates to systems that perform acoustic beamforming based on audio input received via an array of microphones.
2. Background
As used herein, the term acoustic beamforming, or simply beamforming, refers to a method for spatially filtering sound waves received by an array of microphones via processing of the audio signals produced by the array. Beamforming may be used to generate an audio signal in which components attributable to sound waves arriving at the array from a particular direction or directions are attenuated relative to components attributable to sound waves arriving from another direction or directions. If the position of a desired audio source (e.g., a talker) relative to the microphone array is known and/or the position of an undesired audio source (e.g., a source of noise or interference) relative to the microphone array is known, then beamforming can advantageously be used to attenuate the undesired audio source relative to the desired audio source. Logic that performs beamforming may be referred to as a beamformer.
Beamformers operate by selectively weighting audio signals produced by the microphone array such that the level of the response of the array is dependent upon the sound wave direction of arrival. The relationship between the sound wave direction of arrival and the response level of the microphone array is often graphically represented as a “beam pattern.” A beam pattern may have one or more lobes, or areas of relatively strong response, as well as one or more nulls, or areas of relatively weak response. The lobe providing the maximum level of response is often referred to as the main lobe. A main lobe of a beam pattern may be referred to simply as a “beam.” The direction in which a beam is pointed may be referred to as the “look direction” of the beam.
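By way of illustration only, the relationship between the applied weights and the resulting beam pattern can be sketched as follows. The sketch assumes a narrowband, far-field model, a uniform linear array, and simple delay-and-sum weights; none of these choices, nor the function names, array geometry, or analysis frequency, are taken from the description above, and all are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def steering_vector(num_mics, spacing_m, freq_hz, angle_deg):
    """Per-microphone phase of a far-field plane wave arriving at angle_deg
    (measured from broadside) on a uniform linear array."""
    delay_s = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_s) for m in range(num_mics)]


def delay_and_sum_weights(num_mics, spacing_m, freq_hz, look_deg):
    """Weights that phase-align a wave from look_deg and average the channels."""
    aligned = steering_vector(num_mics, spacing_m, freq_hz, look_deg)
    return [a / num_mics for a in aligned]


def array_response(weights, spacing_m, freq_hz, angle_deg):
    """Magnitude of the weighted array output for a unit plane wave from angle_deg."""
    arriving = steering_vector(len(weights), spacing_m, freq_hz, angle_deg)
    return abs(sum(w.conjugate() * a for w, a in zip(weights, arriving)))


def beam_pattern(weights, spacing_m, freq_hz, step_deg=5):
    """Response level sampled from -90 to +90 degrees: a tabulated beam pattern."""
    return {angle: array_response(weights, spacing_m, freq_hz, angle)
            for angle in range(-90, 91, step_deg)}


# Hypothetical four-microphone array with 5 cm spacing, analyzed at 2 kHz.
weights = delay_and_sum_weights(num_mics=4, spacing_m=0.05, freq_hz=2000.0, look_deg=0.0)
pattern = beam_pattern(weights, spacing_m=0.05, freq_hz=2000.0)
look_direction = max(pattern, key=pattern.get)  # angle at which the main lobe peaks
```

In this sketch the largest entry of `pattern` corresponds to the main lobe, and angles at which the response approaches zero correspond to nulls.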
A beamformer may utilize a fixed or adaptive beamforming algorithm to produce a particular beam pattern. In fixed beamforming, the weights applied to the audio signals generated by the microphone array are pre-computed and held fixed during deployment. The weights are independent of observed target and/or interference signals and depend only on an assumed source and/or interference location. In contrast, in adaptive beamforming, the weights applied to the audio signals generated by the microphone array may be modified during deployment based on observed signals to take into account a changing source and/or interference location. Adaptive beamforming may be used, for example, to steer spatial nulls in the direction of discrete interference sources. An audio source localization technique may be used to estimate the current source and/or interference location.
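The distinction between fixed and adaptive weights can be illustrated with a minimal sketch. It assumes a two-microphone array, a narrowband far-field model, and the well-known minimum variance distortionless response (MVDR) criterion as one example of an adaptive algorithm; the description above does not name a specific algorithm, and the constants, function names, and synthetic interference signal below are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air
FREQ_HZ = 2000.0            # assumed narrowband analysis frequency
SPACING_M = 0.05            # assumed spacing of a two-microphone array


def steering(angle_deg):
    """Far-field steering vector of the two-microphone array (angle from broadside)."""
    phase = 2 * math.pi * FREQ_HZ * SPACING_M * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
    return (1 + 0j, cmath.exp(-1j * phase))


def fixed_weights(look_deg):
    """Pre-computed delay-and-sum weights; they depend only on the assumed look direction."""
    a = steering(look_deg)
    return (a[0] / 2, a[1] / 2)


def sample_covariance(snapshots, loading=1e-3):
    """2x2 covariance estimated from observed snapshots, with diagonal loading."""
    n = len(snapshots)
    r00 = sum(abs(x0) ** 2 for x0, _ in snapshots) / n + loading
    r11 = sum(abs(x1) ** 2 for _, x1 in snapshots) / n + loading
    r01 = sum(x0 * x1.conjugate() for x0, x1 in snapshots) / n
    return ((r00, r01), (r01.conjugate(), r11))


def adaptive_weights(cov, look_deg):
    """MVDR weights: inverse covariance applied to the look-direction steering
    vector, normalized to unit response in the look direction."""
    a = steering(look_deg)
    (r00, r01), (r10, r11) = cov
    det = r00 * r11 - r01 * r10
    ra0 = (r11 * a[0] - r01 * a[1]) / det
    ra1 = (-r10 * a[0] + r00 * a[1]) / det
    gain = a[0].conjugate() * ra0 + a[1].conjugate() * ra1
    return (ra0 / gain, ra1 / gain)


def response(weights, angle_deg):
    """Magnitude of the beamformer output for a unit plane wave from angle_deg."""
    a = steering(angle_deg)
    return abs(weights[0].conjugate() * a[0] + weights[1].conjugate() * a[1])


# Synthetic observation: only an interferer at 60 degrees is active.
interferer = steering(60.0)
snapshots = [tuple(cmath.exp(0.7j * k) * v for v in interferer) for k in range(200)]

fixed = fixed_weights(0.0)
adaptive = adaptive_weights(sample_covariance(snapshots), 0.0)
```

Under these assumptions, both sets of weights preserve a source at the 0 degree look direction, but only the adaptive weights, which are derived from the observed snapshots, place a spatial null toward the interferer; `response(adaptive, 60.0)` is far smaller than `response(fixed, 60.0)`.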
Beamforming may be used in a variety of applications. For example, beamforming may be used in speakerphones, audio teleconferencing and audio/video teleconferencing systems to direct a beam in the direction of a near-end talker, thereby improving the quality of a near-end speech signal obtained for transmission to a far-end listener. However, there are various issues associated with speakerphones and teleconferencing systems that use beamforming that can lead to distortion of the near-end speech signal. One issue arises when the near-end talker is outside of the “normal” spatial range to which beams are directed. To address this issue, the normal spatial range covered by the beams may be expanded. However, this comes at the cost of high computational complexity. Another possible way to address this issue is to allow a user to manually disable the beamforming functionality and revert to the use of a primary microphone. This approach is disadvantageous in that it requires manual intervention by the user and also requires a far-end listener to provide feedback regarding the quality of the transmitted speech signal.
Another issue that can lead to distortion of the near-end speech signal is that a talker localization algorithm used to identify an optimal look direction for acoustic beamforming may select the wrong look direction. For example, the talker localization algorithm may select the wrong look direction because it is operating in a highly reverberant environment with strong reflections. A further issue that can lead to the distortion of the near-end speech signal is the placement of a speakerphone/teleconferencing system in an environment that deviates from the assumed acoustic model used to design the beamformer.
Still another issue that can lead to the distortion of the near-end speech signal is that there may be a gain and/or phase mismatch between two or more microphones in the microphone array used to perform beamforming. Factory calibration may be performed to address this issue. However, factory calibration may be expensive and does not address mismatch that arises after manufacture due to environmental damage or gradual drift. On-the-fly auto-calibration features may be built into the speakerphone/teleconferencing system. However, such features are difficult to use without precise knowledge of the spatial properties of the calibration signal and/or the acoustic environment.
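The sensitivity of a beamformer to microphone mismatch can be illustrated with a small worked example. The sketch assumes a hypothetical two-microphone, narrowband beamformer whose weights were designed to place a perfect null in one direction under the assumption of identical microphones; the function name and the mismatch values are illustrative only.

```python
import cmath
import math


def null_depth_db(gain_mismatch_db, phase_mismatch_deg):
    """Residual response, in dB, in the intended null direction of a
    two-microphone null-steering beamformer when the second microphone
    deviates in gain and phase from the first.

    With matched microphones the two weighted channels cancel exactly; with
    mismatch the output for a unit wave from the null direction has magnitude
    |1 - g * exp(j * delta)|, where g and delta are the relative gain and phase."""
    g = 10 ** (gain_mismatch_db / 20.0)
    delta = math.radians(phase_mismatch_deg)
    residual = abs(1 - g * cmath.exp(1j * delta))
    if residual == 0.0:
        return float("-inf")  # perfect cancellation
    return 20 * math.log10(residual)


matched = null_depth_db(0.0, 0.0)
gain_only = null_depth_db(1.0, 0.0)   # 1 dB gain mismatch
phase_only = null_depth_db(0.0, 5.0)  # 5 degree phase mismatch
```

Under these assumptions, a gain mismatch of only 1 dB limits the null to roughly 18 dB of attenuation, which suggests why even modest mismatch can noticeably degrade interference rejection.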
When beamforming is working effectively, it can significantly increase the quality of the near-end speech signal by attenuating undesired audio sources as described above. However, as also described above, when beamforming is not working effectively, the near-end speech signal may be distorted, thereby impairing the ability of the far-end listener to perceive and/or understand the signal. What is needed, then, is a system and method for handling variations in the level of performance of a beamformer in a manner that addresses one or more of the aforementioned shortcomings associated with prior art solutions.