The invention relates generally to signal processing in a hands-free loudspeaking system, referred to herein as a “speakerphone.”
The performance of a speakerphone is judged by its ability to approach the full-duplex ideal. An ideal full-duplex speakerphone allows parties at opposite ends of a telephone connection to talk simultaneously without significant modification to either the far-end signal or the near-end signal. At the same time, no audible echo should be allowed.
Echo is caused by the coupling of loudspeaker sound into a microphone transducer. Echo can be controlled by either suppression or cancellation. FIG. 1 shows a typical structure of an acoustic echo suppression system, while FIG. 2 shows an acoustic echo cancellation system with supplemental echo suppression. Acoustic echo suppression requires that either or both of the receive path (received far-end signal Rin) and the send path (send near-end signal Sout) be attenuated sufficiently that no echo is perceived. On the other hand, acoustic echo cancellation uses a linear model to predict the acoustic echo signal, so that the prediction may be used to remove the echo signal component.
In the echo suppression system of FIG. 1, the received far-end signal Rin is directed to a first measurement component 10 and a received signal attenuator 12. The output Rout of the attenuator passes through a digital-to-analog converter 14 and an amplifier 16 to a speaker 18 that projects the sound. When the near-end talker is speaking, the voice information is converted to an electrical signal by a microphone 20. The signal passes through an amplifier 22 and an analog-to-digital converter 24. In addition to the desired voice information within the signal Sin, there may be undesired acoustic echo information. The Sin signal is directed to both a second measurement component 26 and a send signal attenuator 28. The signal Sout that is the output of the attenuator 28 is the signal which is transmitted to the far-end party of the telephone call. As shown in FIG. 1, the measurements of the measurement components 10 and 26 are received at an activity detection and control component 30. It is this component that utilizes the measurements to generate separate control signals (shown as dashed connections) for the two attenuators 12 and 28. In the echo suppression system of FIG. 1, suppression occurs along both the receive path and the send path.
Referring now to the cancellation/suppression system of FIG. 2, many of the components are duplicated from FIG. 1. As noted, acoustic echo cancellation uses a linear model to predict the acoustic echo signal. Echo cancellation requires a reference signal that consists of the signal Rout to the speaker 18. In total, four measurement components 32, 34, 36 and 38 are employed, with each measurement being directed to an activity detection and control component 40. Using the different components, the reference signal is convolved using a linear acoustic echo model to produce an echo prediction signal Sep that is subtracted from the microphone signal Sin at a summing device 42 in order to cancel echo. In principle, the acoustic echo model can be accurately determined using an adaptive filter 44, wherein the loudspeaker signal Rout is the reference signal and the echo-cancelled signal from the summing device 42 is used as error feedback to drive the adaptation of the adaptive filter.
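The cancellation loop described for FIG. 2 can be illustrated with a minimal sketch. The text does not name an adaptation rule, so the normalized least-mean-squares (NLMS) update used here is an assumption, as are the function name, the tap count, and the step size `mu`. The sketch convolves the reference signal Rout with the current coefficients to form the echo prediction Sep, subtracts Sep from Sin, and uses the difference as error feedback.

```python
def nlms_echo_canceller(r_out, s_in, taps=8, mu=0.5, eps=1e-8):
    """Hypothetical NLMS sketch of adaptive filter 44 and summing device 42.

    r_out: loudspeaker reference samples (Rout)
    s_in:  microphone samples (Sin)
    Returns (echo-cancelled samples, final filter coefficients).
    """
    w = [0.0] * taps         # adaptive estimate of the linear echo model
    history = [0.0] * taps   # most recent Rout samples, newest first
    cancelled = []
    for r, s in zip(r_out, s_in):
        history = [r] + history[:-1]
        # Echo prediction Sep: reference convolved with the current model.
        sep = sum(wi * xi for wi, xi in zip(w, history))
        # Summing device: subtract the prediction from the microphone signal.
        err = s - sep
        # Error feedback drives the coefficient update (normalized step).
        norm = sum(xi * xi for xi in history) + eps
        step = mu * err / norm
        w = [wi + step * xi for wi, xi in zip(w, history)]
        cancelled.append(err)
    return cancelled, w
```

In this sketch the update runs on every sample. A practical system would gate the update on the activity decision, since near-end talker energy in the error signal can pull the coefficients away from the true echo path.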
The difficult task with all full-duplex speakerphones is that it is necessary to determine how much of the echo-cancelled signal is composed of residual echo and how much is valid near-end talker energy. If this composition is known, effective activity decisions can be determined. If there is substantial residual echo and little near-end talker energy, the adaptive filter 44 of FIG. 2 should be enabled to rapidly adapt its coefficients. Conversely, if there is substantial near-end talker energy, the coefficient adaptation process should be disabled, because the near-end talker interference may cause divergence of the adaptive filter. In addition, only a minimal amount of suppression should be applied in such a situation, so that echo is not audible. Echo will be audible if it is of sufficient level that the echo cannot be masked by either background noise or by the valid near-talker signal. When echo is audible, suppression is required to eliminate the echo. Thus, the system of FIG. 2 includes the attenuators 12 and 28 that provide echo suppression in addition to the system's echo cancellation. However, in practice, differentiating between residual echo and valid near-end talker energy is problematic.
The process of estimating the composition of the echo-cancelled signal into residual echo and valid near-talker components is important to maximizing full-duplex speakerphone performance. With accurate knowledge of the composition, the adaptation processing can be optimized, and full-duplex conversation with minimal suppression is possible, even if the residual echo is substantial.
Prior art in the field of adaptive echo cancellation addresses the estimation of the composition of the echo-cancelled signal. Conventionally, several measurements are made in determining the composition of the echo-cancelled signal. As shown in FIG. 2, there are four measurement components 32, 34, 36 and 38. Measurements may consist of a sophisticated spectrum analysis or may rely on a correlation analysis between the signals at the measurement points.
There is also known prior art in the field of managing the signal level and signal spectrum to improve performance in utilizing loudspeakers to enhance intelligibility, to minimize the power required by the loudspeaker, to control the signal level so as to prevent the loudspeaker from being overdriven, and to modify the spectrum based on the signal power in order to make the signal appear more natural to human ears. U.S. Pat. No. 5,515,432 to Rasmusson, U.S. Pat. No. 5,636,272 to Rasmusson, U.S. Pat. No. 5,790,671 to Cooper, and U.S. Pat. No. 5,907,823 to Sjöberg et al. relate to enhancing the intelligibility of the loudspeaker sound, while controlling the required power and preventing the loudspeakers from being overdriven. It is known that the intelligible signal of human speech is carried primarily by frequencies above approximately 1000 Hz, while approximately ninety percent of normal speech power is in the frequencies below 1000 Hz. Innovations in the art are often centered on the means for enhancing performance, given these facts.
It is also known that humans perceive loudness of signals differently, depending on the pitch of signals. Thus, when music is played at a low volume, the low-frequency voices and instruments are perceived by humans as being more “natural” if they are amplified to a greater extent than the high-frequency voices and instruments. Consequently, many audio systems have a manual control to enable a filter to boost the low frequencies. As the music is increased in volume, the low frequencies seem unnaturally strong. In this case, the low frequencies can be played at normal level or may even be suppressed. Thus, a system may gradually use a loudness filter characteristic on the basis of either (1) the volume level selection or (2) the measured signal level. The above-identified patent to Cooper controls the filter properties based on a combination of the volume control selection and the measured ambient noise level. When high ambient noise is present, the relative gain of the higher audio frequencies is increased at the expense of low frequency response. Thus, a degree of “naturalness” is traded for the higher intelligibility provided by increased high frequency gain.
The performance of a speakerphone system is enhanced by dynamically filtering a received far-talker signal in a manner which is preferential to passing high audio frequencies and adjustable with regard to passing audio frequencies within a low frequency band. Filtering characteristics for attenuating low frequency components of the far-talker signal are based upon controlling a level of echo within a near-talker signal. Activity at a remote site from which the far-talker signal is generated and activity at the local site from which a near-talker signal is generated are both considered in determining the filter characteristics. In one embodiment, the filter characteristics are adjusted by varying a cutoff frequency along the low audio frequency band, with a tendency toward a higher cutoff frequency as one or both of far-talker activity and near-talker activity increase.
The far-talker signal is received along an input signal path that extends to a first transducer (e.g., a loudspeaker) of the speakerphone system. The input signal path includes a dynamic highpass filter having generally flat filter characteristics for high frequency components and having the dynamic filter characteristics for the low frequency components. Optionally, the signal path may include a signal attenuator.
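As an illustration of a highpass filter whose low-frequency behavior can be changed while it runs, the following sketch uses a first-order discrete RC highpass. The filter order, the class name `DynamicHighpass`, and the 8000 Hz sample rate in the usage note are assumptions made for illustration; the text does not specify the filter topology.

```python
import math


class DynamicHighpass:
    """Hypothetical first-order highpass with an adjustable cutoff frequency."""

    def __init__(self, sample_rate_hz, cutoff_hz):
        self.dt = 1.0 / sample_rate_hz
        self.prev_in = 0.0
        self.prev_out = 0.0
        self.set_cutoff(cutoff_hz)

    def set_cutoff(self, cutoff_hz):
        # Discrete RC highpass coefficient; a higher cutoff gives a smaller
        # alpha and therefore stronger attenuation of low frequencies.
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)
        self.alpha = rc / (rc + self.dt)

    def process(self, x):
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = self.alpha * (self.prev_out + x - self.prev_in)
        self.prev_in = x
        self.prev_out = y
        return y
```

A caller would construct the filter once, for example `DynamicHighpass(8000.0, 100.0)`, call `process` on each far-talker sample, and call `set_cutoff` whenever the control logic selects new characteristics. The filter state is retained across cutoff changes, so the output stays continuous.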
The dynamic highpass filter functions as a first echo controller by manipulating the far-talker signal to reduce the likelihood that the audible output of the first transducer will be coupled to a second transducer (e.g., a microphone) which is used to generate a near-talker signal for transmission to the remote site at which the far-talker signal was generated. The output signal path from the second transducer includes a second echo controller, which may provide conventional echo suppression, conventional echo cancellation, or both.
A near-talker subband analyzer has inputs from both the input and output signal paths. The analyzer is configured to compare the current strengths (i.e., presently measured powers) for each of a number of different audio frequency subbands. Based upon the comparisons, the analyzer calculates a prediction of the level of echo within the near-talker signal. The output of the analyzer may be indicative of the prediction of the echo level, but could be indicative of the valid near-talker level, since the two levels may be considered to be inverses of each other.
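One way to picture the subband comparison is sketched below. The text does not give the analyzer's arithmetic, so this is only an assumed scheme: each loudspeaker subband power is scaled by a hypothetical per-band coupling gain to estimate the echo power in that band, the estimate is capped at the measured microphone power, and the totals yield an echo fraction between 0 and 1. The function and parameter names are illustrative.

```python
def predict_echo_fraction(spk_band_power, mic_band_power, coupling_gain):
    """Hypothetical estimate of the fraction of microphone power that is echo.

    spk_band_power: per-subband power on the input (loudspeaker) path
    mic_band_power: per-subband power on the output (microphone) path
    coupling_gain:  assumed loudspeaker-to-microphone power gain per subband
    """
    echo_total = 0.0
    mic_total = 0.0
    for p_spk, p_mic, gain in zip(spk_band_power, mic_band_power, coupling_gain):
        # Echo in a band cannot exceed what the microphone actually measured.
        echo_total += min(p_mic, gain * p_spk)
        mic_total += p_mic
    if mic_total <= 0.0:
        return 0.0  # silence on the microphone: no echo to report
    return echo_total / mic_total


def valid_near_talker_fraction(echo_fraction):
    # The two levels are treated as complements of each other.
    return 1.0 - echo_fraction
```

Reporting the complement reflects the observation that the output may indicate either the echo level or the valid near-talker level.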
A filter control algorithm component receives the prediction from the near-talker subband analyzer and receives a recent power measurement of the far-talker signal. Based upon these two inputs, the filter control algorithm component selects the filter characteristics for the dynamic highpass filter. In one embodiment, as the power of the far-talker signal increases, a cutoff frequency of the filter is increased, with the cutoff frequency being manipulated in the range of 20 Hz to 1000 Hz and being defined as the point at which filter attenuation is 3 dB relative to the flat portion of the frequency response at the upper audio frequencies. However, other embodiments are contemplated. This tendency to increase the cutoff frequency with an increase in far-talker signal power reduces the strengths of the low frequency components, which are more likely to induce loudspeaker-to-microphone coupling (i.e., echo). Similarly, the cutoff frequency tends to increase with detected increases in valid near-talker activity. In order to more accurately determine valid near-talker activity, the system may include a noise floor estimator that is designed to differentiate true activity from recurring background noise, so that the noise component may be disregarded.
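The cutoff selection can be sketched as a mapping from the two inputs to a frequency in the stated 20 Hz to 1000 Hz range. The mapping below is an assumption, not the disclosed algorithm: the dB thresholds, the normalized `near_activity` input, the use of the larger of the two drives, and the logarithmic interpolation are all illustrative choices.

```python
F_MIN_HZ = 20.0     # lower end of the cutoff range given in the text
F_MAX_HZ = 1000.0   # upper end of the cutoff range given in the text


def select_cutoff_hz(far_power_db, near_activity,
                     far_lo_db=-50.0, far_hi_db=-10.0):
    """Hypothetical cutoff selection for the dynamic highpass filter.

    far_power_db:  recent far-talker power measurement in dB (assumed scale)
    near_activity: valid near-talker activity, normalized to 0.0..1.0 (assumed)
    """
    # Normalize far-talker power to 0..1 between the two assumed thresholds.
    far_drive = (far_power_db - far_lo_db) / (far_hi_db - far_lo_db)
    far_drive = min(1.0, max(0.0, far_drive))
    near_drive = min(1.0, max(0.0, near_activity))
    # Either kind of activity alone is enough to raise the cutoff.
    drive = max(far_drive, near_drive)
    # Interpolate on a logarithmic frequency scale between the range limits.
    return F_MIN_HZ * (F_MAX_HZ / F_MIN_HZ) ** drive
```

With both inputs low the cutoff stays at 20 Hz, and it rises toward 1000 Hz as either input grows, matching the tendency described above. A noise floor estimate could be subtracted from the near-talker measurement before it is normalized.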
An advantage of the invention is that the low frequency components of the far-talker signal are dynamically controlled to reduce echo power within the near-talker signal. Another advantage is that the system reduces the likelihood that the loudspeaker will be driven into its high-volume, non-linear region. Less power is required to produce an intelligible signal. Moreover, the invention reduces the difficulty in distinguishing valid near-talker activity from echo.