The present invention relates to adaptive beamformers, and more particularly to the generation of calibration signals for using an adaptive beamformer in an acoustic echo canceler.
Adaptive beamformers are used in a number of disciplines, such as in antennas and in acoustics. A common use of beamformers in these various disciplines is for forming some sort of spatial beam towards a target that represents the wanted signal. Another common use of beamformers is to form the opposite of a beam, namely a notch, in the direction of an unwanted signal, referred to herein as a “jammer.” These two functions are not mutually exclusive; beamformers can be designed to form both a beam and a notch simultaneously.
One particular application for a beamformer is in a hands-free communication environment, in which an external loudspeaker and microphone replace the built-in earphone and microphone of a typical telephone handset. Conventional speaker phones as well as hands-free mobile telephones are both examples. Hands-free mobile telephones are often employed in an automotive environment because a driver's safety can be improved by permitting him to leave his hands free for controlling the automobile instead of the telephone.
One problem with a hands-free telephone set is that the sound emitted by the loudspeaker is picked up by the microphone, causing it to be heard as an echo by the user on the other end of the connection. This echo is, at the very least, annoying, and when very prominent, can be so distracting as to prevent a normal conversation from taking place. Therefore, it is highly desirable to provide a mechanism for suppressing this acoustic echo.
It is known to use an adaptive beamforming arrangement to suppress an acoustic echo. One known technique, which has been described with reference to a car cabin environment, utilizes a plurality of microphones. The essential idea is to use the beamformer to eliminate sounds emanating from the direction of the loudspeaker, while emphasizing sounds that come from the direction of the human voice. Before the beamformer can operate effectively, it must be calibrated, which is a two-step process in the prior art.
The prior art two-step calibration process will now be described with reference to FIGS. 1, 2 and 3. In an exemplary embodiment, first and second microphones 101, 103, as well as a hands-free loudspeaker 105 are arranged in an environment, such as a car cabin. For the sake of simplicity, only two microphones are illustrated and discussed here. However, the techniques can readily be applied to accommodate more than two microphones. Because of their physical proximity, the first and second microphones 101, 103 pick up sounds 107 that emanate from the loudspeaker 105. Therefore, the loudspeaker 105 is considered the jammer source in this application. Referring first to FIG. 1, the first step of the prior art calibration process includes exciting the jammer source (i.e., the hands-free loudspeaker 105) to generate sounds 107. This excitation can be derived from a pseudo noise (PN) signal or a voice signal. These sounds 107 are picked up by each of the first and second microphones 101, 103, which each generate signals that are sampled and stored by respective first and second jammer memories 109, 111. The two stored signals, then, represent the unwanted jammer signal received from each of the respective first and second microphones 101, 103.
Referring now to FIG. 2, the hardware involved in the second step of the prior art calibration process is shown. The first microphone 101 is connected to supply its signal to a first input of a first adder 113. The first jammer memory 109 supplies its output to a second input of the first adder 113, and the resultant output of the first adder 113 is supplied to one input of the beamformer 117. Similarly, the second microphone 103 is connected to supply its signal to a first input of a second adder 115. The second jammer memory 111 supplies its output to a second input of the second adder 115, and the resultant output of the second adder 115 is supplied to a second input of the beamformer 117.
In the second step of the prior art calibration process, the loudspeaker 105 is kept silent. Instead, the target source 114 (e.g., the person doing the talking, such as the driver of the automobile) is activated (e.g., the person begins talking). This enables a “clean” voice signal to be provided to a negating input of the adder 119. The stored jammer signals from the first and second jammer memories 109, 111 are combined with respective signals from the first and second microphones 101, 103, and it is these combined signals that are supplied to the beamformer 117. During this step, the beamformer 117 is adapted so as to minimize the difference between the output of the beamformer 117 and the wanted signal (i.e., the signal that comes from the microphone 101). The result of this is that the target-to-jammer ratio is maximized (i.e., the jammer signal is minimized while the target signal is maximized). Essentially, a spatial notch is formed in the direction of the jammer, and a spatial beam is formed in the direction of the target. It is noted that the arrangement in FIG. 2 depicts the signal from the first microphone 101 being supplied to the negating input of the adder 119. However, this could instead have been the signal from the second microphone 103. The selection should be made on the basis of which microphone is closest to the target source 114.
After the two calibration steps have been performed, the arrangement, as illustrated in FIG. 3, is ready to use.
The prior art configuration as described above has several problems. One is an implementation problem associated with the fact that the jammer memories 109, 111 need to be rather large in order to have enough statistical information available to describe the spatial properties of the jammer location to the adaptation arrangement. The necessary sample length is typically around one second per microphone, which corresponds to several kilobytes of expensive RAM per microphone. This is an important issue because the jammer memories 109, 111 are only used during the calibration process. This means that expensive hardware must be installed that will never be used during the normal operational use of the acoustic echo canceler.
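The storage cost noted above can be illustrated with a rough calculation. The following is a minimal sketch only: the 8 kHz sampling rate and 16-bit sample width are assumptions chosen as typical telephony values and are not stated in this description, and the function name is hypothetical.

```python
def jammer_memory_bytes(duration_s, sample_rate_hz, bytes_per_sample):
    """Bytes needed to store one microphone's recorded jammer signal."""
    return int(duration_s * sample_rate_hz * bytes_per_sample)

# Assumed values (not from the description): 8 kHz sampling, 16-bit samples.
per_microphone = jammer_memory_bytes(1.0, 8000, 2)
two_microphones = 2 * per_microphone

print(per_microphone, two_microphones)
```

Under these assumed values, one second of recording occupies 16,000 bytes per microphone, and the total grows linearly with the number of microphones.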
Another problem with the prior art configuration relates to interference susceptibility during recording. More specifically, the prior art solution relies on the jammer sounds 107 being the only sounds present during the jammer recording phase. However, if other interfering sounds and background noise are present, then the adaptive arrangement will try to cancel these interfering sounds as well, which may result in poor adaptation if the interference is a diffuse noise field. The adaptive arrangement may even fail completely if the target source 114 is excited during jammer recording (i.e., if the target person speaks when he/she is not supposed to). In this case, the target is treated in part as a jammer and in part as a target, with the result being degraded performance.
It is therefore an object of the present invention to provide apparatuses and methods for calibrating a beamformer that do not require a large memory resource.
It is a further object of the present invention to provide improved echo cancellation in a hands-free communications environment.
The foregoing and other objects are achieved in methods and apparatuses for calibrating a beamformer for use as an acoustic echo canceler in a hands-free communications environment having a loudspeaker and a plurality of microphones. In accordance with one aspect of the invention, the beamformer calibration is performed by providing a plurality of adaptive filters in correspondence with each of the microphones, and training each of the adaptive filters to model echo properties of the hands-free communications environment as experienced by the corresponding one of the microphones. A target source is activated, thereby generating an acoustic signal that is received by the microphones. The trained adaptive filters are then operated to generate jammer signals. Pseudo noise signals may be supplied to the inputs of the adaptive filters for this purpose. Respective ones of the jammer signals are then combined with corresponding signals supplied by the microphones, thereby generating combination signals. The combination signals are then used to adapt the beamformer to cancel the jammer signals.
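The calibration flow just summarized can be sketched as follows. This is an illustrative simulation under stated assumptions, not the implementation of the invention: the echo models and target gains are hypothetical single-tap values, Gaussian noise stands in for the target's speech, the beamformer is assumed to be an LMS-adapted filter-and-sum structure, and the clean signal from the first microphone is assumed to serve as the adaptation reference, as in the prior-art arrangement described earlier.

```python
import random

def fir(weights, signal):
    """Filter `signal` with FIR `weights`, assuming zero initial state."""
    return [sum(w * signal[n - k] for k, w in enumerate(weights) if n - k >= 0)
            for n in range(len(signal))]

def adapt_beamformer(channels, desired, num_taps, mu):
    """LMS filter-and-sum beamformer: one FIR filter per channel."""
    num_ch = len(channels)
    w = [[0.0] * num_taps for _ in range(num_ch)]
    buf = [[0.0] * num_taps for _ in range(num_ch)]
    errors = []
    for n in range(len(desired)):
        for c in range(num_ch):
            buf[c] = [channels[c][n]] + buf[c][:-1]
        y = sum(w[c][k] * buf[c][k]
                for c in range(num_ch) for k in range(num_taps))
        e = desired[n] - y  # difference from the clean target signal
        for c in range(num_ch):
            for k in range(num_taps):
                w[c][k] += mu * e * buf[c][k]
        errors.append(e)
    return w, errors

rng = random.Random(1)
n = 5000
pn = [rng.choice((-1.0, 1.0)) for _ in range(n)]     # pseudo noise input
speech = [rng.gauss(0.0, 1.0) for _ in range(n)]     # stand-in for the target

echo_models = [[0.5], [1.0]]   # hypothetical trained adaptive-filter weights
target_gains = [1.0, 0.5]      # hypothetical target-to-microphone gains

# Loudspeaker silent, target active: microphones pick up the target only.
mics = [[g * s for s in speech] for g in target_gains]
# Trained filters driven by pseudo noise generate the jammer signals.
jammers = [fir(h, pn) for h in echo_models]
# Combination signals supplied to the beamformer.
combined = [[m + j for m, j in zip(mic, jam)]
            for mic, jam in zip(mics, jammers)]

weights, errors = adapt_beamformer(combined, mics[0], num_taps=4, mu=0.01)

residual_power = sum(e * e for e in errors[-500:]) / 500
jammer_power = sum(j * j for j in jammers[0][-500:]) / 500
print(residual_power, jammer_power)
```

In this sketch no recorded jammer signal is stored; only the filter weights are held, and the jammer signals are regenerated sample by sample while the target is active.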
In another aspect of the invention, the step of training each of the adaptive filters to model echo properties of the hands-free communications environment as experienced by the corresponding one of the microphones includes the steps of supplying pseudo noise signals to the loudspeaker, thereby causing the loudspeaker to generate acoustic signals and using each of the microphones to generate a microphone signal. The pseudo noise signals are also supplied to each of the adaptive filters, which generate echo estimate signals therefrom. Each of the echo estimate signals is combined with a corresponding one of the microphone signals, thereby generating a plurality of combined signals. Each of the adaptive filters is then adapted so that the corresponding combined signal is minimized. A least mean squared algorithm may be used for this purpose.
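The training step described above can be sketched for a single microphone as follows. This is a minimal simulation under stated assumptions: the three-tap echo path, the filter length, the step size, and the function name `lms_train` are all hypothetical, and the "combining" of the echo estimate with the microphone signal is assumed to be a subtraction.

```python
import random

def lms_train(reference, mic, num_taps, mu):
    """Adapt an FIR filter so that (mic - echo estimate) is minimized."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps           # most recent reference samples
    residuals = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        estimate = sum(wk * xk for wk, xk in zip(w, buf))
        e = d - estimate             # combined signal to be minimized
        w = [wk + mu * e * xk for wk, xk in zip(w, buf)]  # LMS update
        residuals.append(e)
    return w, residuals

rng = random.Random(0)
pn = [rng.choice((-1.0, 1.0)) for _ in range(2000)]   # pseudo noise signal

# Hypothetical loudspeaker-to-microphone echo path (unknown to the filter).
true_path = [0.6, 0.3, -0.1]
mic = [sum(h * pn[n - k] for k, h in enumerate(true_path) if n - k >= 0)
       for n in range(len(pn))]

weights, residuals = lms_train(pn, mic, num_taps=4, mu=0.05)
print([round(w, 3) for w in weights])
```

With a plurality of microphones, the same routine would be run once per microphone on the same pseudo noise signal, yielding one set of weights for each.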
In another aspect of the invention, the adaptive filters used for calibration of the beamformer are further utilized during normal operation of the now-calibrated beamformer. In particular, an echo generated in a hands-free communications environment having a loudspeaker and a plurality of microphones may be canceled by providing a plurality of adaptive filters in correspondence with each of the microphones and training each of the adaptive filters to model echo properties of the hands-free communications environment as experienced by the corresponding one of the microphones. A beamformer is also provided that has been calibrated for use as an acoustic echo canceler in the hands-free communications environment. In an advantageous embodiment, the beamformer is calibrated in accordance with the techniques described above.
During normal operation, each one of the adaptive filters is used to generate an estimate of an echo signal as experienced by a corresponding one of the microphones. Each of the estimated echo signals is combined with a corresponding microphone signal, thereby generating a plurality of combined signals having reduced echo components. Then, the beamformer is used to generate an output signal from the plurality of combined signals, wherein the output signal has further reduced echo components.
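Normal operation as described above can be sketched as a two-stage chain: per-microphone echo subtraction followed by the beamformer. The sketch below is illustrative only. It assumes fixed FIR weights for both the echo models and the beamformer, takes "combining" to mean subtraction, and uses hypothetical signal values and names chosen so that the result is easy to check.

```python
def fir(weights, signal):
    """Filter `signal` with FIR `weights`, assuming zero initial state."""
    return [sum(w * signal[n - k] for k, w in enumerate(weights) if n - k >= 0)
            for n in range(len(signal))]

def cancel_and_beamform(far_end, mic_signals, echo_models, beam_weights):
    """Subtract each microphone's echo estimate, then filter and sum."""
    cleaned = []
    for mic, model in zip(mic_signals, echo_models):
        echo_estimate = fir(model, far_end)
        cleaned.append([m - e for m, e in zip(mic, echo_estimate)])
    filtered = [fir(w, ch) for w, ch in zip(beam_weights, cleaned)]
    return [sum(samples) for samples in zip(*filtered)]

# Hypothetical test signals: far-end (loudspeaker) and near-end (target).
far_end = [1.0, -1.0, 1.0, 1.0]
near_end = [0.2, 0.1, -0.3, 0.0]

# Hypothetical echo paths; the echo models are assumed to match them exactly.
echo_paths = [[0.5, 0.25], [0.3]]
mic1 = [s + e for s, e in zip(near_end, fir(echo_paths[0], far_end))]
mic2 = [0.5 * s + e for s, e in zip(near_end, fir(echo_paths[1], far_end))]

# Hypothetical beamformer weights that sum the two cleaned channels.
output = cancel_and_beamform(far_end, [mic1, mic2], echo_paths,
                             [[0.5], [1.0]])
print(output)
```

In practice the echo models would not match the true paths exactly, so each cleaned channel would retain some residual echo; that residual is what the calibrated beamformer is relied upon to reduce further.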