Hands-free speaker phones continue to grow in popularity. Advantageously, a speaker phone leaves users' hands free and allows them to move about in the proximity of the speaker phone. Speaker phones employ a loudspeaker and a microphone to establish a bi-directional voice communication link between a local user in a near-end location and a remote user in a far-end location. The loudspeaker transmits the speech of the remote user, and the microphone detects the speech of the local user.
Typically, the near-end location is an enclosure, such as a room or an automobile. The speech of the remote user is emitted from the loudspeaker, echoes throughout the enclosure, is received by the microphone, and is transmitted back to the remote user in the far-end location. The echoes create unacceptably disruptive feedback for the far-end user.
A solution is to incorporate an echo canceler to cancel echoes received by the microphone. Echo cancelers synthesize an echo signal that matches the echoes of the voice signal of the remote user that are received by the microphone. If the synthesized echo signal perfectly matches the actual echo received by the microphone, then a signal without echo is returned to the far-end location. Typically, the mechanism used to create the synthetic echo signal is a filter implemented in the time domain, in the frequency domain, or in frequency subbands. The input to the filter is the signal from the far end (the same signal that is emitted from the loudspeaker). The filter output is the synthetic echo signal.
The echo canceler preferably uses an adaptive filter so that the filter's parameters (tap coefficients for time-domain implementations or bin weights for frequency-domain implementations) are modifiable to improve the match of the synthesized echo signal to the actual echo in the microphone signal. The closeness of the match between the actual and synthesized echo is typically measured by the power, or some other second-order statistic, of the echo-canceled signal. A limitation of this measure is that it is accurate only when the remote user is speaking and the local user is not.
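The adaptive filtering described above can be illustrated with a minimal time-domain sketch. The normalized least-mean-squares (NLMS) update is used here only as one common choice of adaptation rule; it is an assumption of this example and is not named in the text. The echo path, the step size, the number of taps, and all function and variable names are likewise hypothetical.

```python
import random

def nlms_echo_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Return (echo-canceled signal, final tap coefficients).

    NLMS is an assumed adaptation rule, chosen for illustration only.
    """
    taps = [0.0] * num_taps      # adaptive filter parameters (tap coefficients)
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        synthetic_echo = sum(w * h for w, h in zip(taps, history))
        e = d - synthetic_echo   # echo-canceled sample
        norm = sum(h * h for h in history) + eps
        # Move the taps in the direction that reduces the power of e.
        taps = [w + step * e * h / norm for w, h in zip(taps, history)]
        out.append(e)
    return out, taps

def power(signal):
    """Mean power, the second-order statistic used to judge the match."""
    return sum(s * s for s in signal) / len(signal)

# Far-end talk only: the microphone picks up nothing but echo.
random.seed(0)
echo_path = [0.6, -0.3, 0.1]     # hypothetical enclosure response
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

residual, taps = nlms_echo_cancel(far_end, mic)
print("echo power:    ", power(mic[-200:]))
print("residual power:", power(residual[-200:]))
```

In this far-end-talk-only setting, the power of the echo-canceled signal is a meaningful measure of the match, because any remaining power is uncanceled echo.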
In conversations between people, the conversation can be in one of four possible states. The event when the remote user is speaking but the local user is not is called “far-end talk.” Conversely, the “near-end talk” event is when the local user is speaking but the remote user is silent. The “double-talk” event occurs when both users speak simultaneously. When neither person speaks, the event is called “silence.” In a telephone conversation, people usually take turns speaking. Therefore, in the absence of any other sources of sound, the most common events are “far-end talk” and “near-end talk.”
The three non-silence events can also arise due to background noises or other sources of sound on either end of the communication link. For example, if there is a radio operating in the near-end, then the conversation can be in either the near-end talk event (if no sound is coming from the far-end) or the double-talk event (if there is sound coming from the far-end). However, the conversation cannot be in the silence event or the far-end talk event, since these events require silence in the near-end. Music from a radio is an example of a persistent near-end acoustic source.
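The four events, and the effect of a persistent near-end source, can be summarized in a short sketch. The function name and the two boolean activity flags are hypothetical; how activity at each end would be detected is outside the scope of this example.

```python
def conversation_state(far_end_active, near_end_active):
    """Map sound activity at each end to one of the four events."""
    if far_end_active and near_end_active:
        return "double-talk"
    if far_end_active:
        return "far-end talk"
    if near_end_active:
        return "near-end talk"
    return "silence"

# A persistent near-end source, such as a radio, keeps near_end_active True,
# so only two of the four events remain reachable.
reachable = {conversation_state(far, True) for far in (False, True)}
print(sorted(reachable))
```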
For the purpose of echo cancellation, it is important to distinguish between the four types of events. The echo canceler cannot distinguish speech from any other type of acoustic signal, such as music from a radio, the noise of a dishwasher, or a dog barking. Therefore, from the echo canceler's perspective, double-talk occurs whenever the loudspeaker is broadcasting sound simultaneously with sound being produced in the near-end room, regardless of the original source of those sounds. Due to background noises, double-talk may be the most common condition in a hands-free telephone conversation using speaker phones.
During periods of silence and near-end talk, no far-end signal is emitted from the loudspeaker. Therefore, there are no echoes to be canceled and the echo canceler is turned off. When far-end talk is detected, the echo canceler adjusts the parameters of the adaptive filter to synthesize an echo signal that matches the echo signal arriving at the microphone. Typical echo cancelers can operate effectively only during far-end talk. When double-talk occurs, the microphone signal consists of the sum of a near-end signal and echoes of the far-end signal. The presence of the near-end signal in the microphone signal hinders proper echo synthesis, which produces audible echoes in the signal sent back to the far-end. To prevent the feedback of echoes, the echo canceler suspends modification of the adaptive filter. Typically, an echo canceler includes a double-talk detector to determine the presence of a double-talk event and signal the adaptive filter accordingly. During double-talk, the adaptive filter still synthesizes an echo signal that can be used for cancellation. Only the time-varying adjustment of the adaptive filter parameters is suspended.
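The suspension of adaptation during double-talk can be sketched as follows. The example assumes an NLMS update and an idealized double-talk flag supplied by a hypothetical detector; neither is specified in the text. It compares a filter that is frozen during double-talk with one that keeps adapting, using the distance between the taps and a hypothetical echo path as the figure of merit.

```python
import random

def run_canceler(far_end, mic, double_talk, num_taps=4, step=0.5, eps=1e-8):
    """NLMS canceler (assumed rule) whose adaptation is gated by a flag."""
    taps = [0.0] * num_taps
    history = [0.0] * num_taps
    out = []
    for x, d, dt in zip(far_end, mic, double_talk):
        history = [x] + history[:-1]
        # The synthetic echo is always subtracted, even during double-talk.
        e = d - sum(w * h for w, h in zip(taps, history))
        out.append(e)
        if not dt:
            # Only the parameter update is suspended during double-talk.
            norm = sum(h * h for h in history) + eps
            taps = [w + step * e * h / norm for w, h in zip(taps, history)]
    return out, taps

def tap_error(taps, path):
    """Squared distance between the taps and the zero-padded echo path."""
    padded = list(path) + [0.0] * (len(taps) - len(path))
    return sum((w - p) ** 2 for w, p in zip(taps, padded))

random.seed(1)
n_samples, dt_start = 3000, 1500
echo_path = [0.5, -0.2]          # hypothetical enclosure response
far_end = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
# The near-end source is silent at first, then active (double-talk).
near_end = [0.0 if n < dt_start else random.uniform(-1.0, 1.0)
            for n in range(n_samples)]
mic = [
    near_end[n]
    + sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(n_samples)
]

ideal_flags = [n >= dt_start for n in range(n_samples)]  # idealized detector
never_flags = [False] * n_samples                        # no detector at all

_, frozen_taps = run_canceler(far_end, mic, ideal_flags)
_, adapting_taps = run_canceler(far_end, mic, never_flags)
err_frozen = tap_error(frozen_taps, echo_path)
err_adapt = tap_error(adapting_taps, echo_path)
print("tap error, frozen during double-talk:  ", err_frozen)
print("tap error, adapting during double-talk:", err_adapt)
```

In this sketch the frozen filter retains the taps it learned during far-end talk, whereas the filter that keeps adapting is driven away from the echo path by the near-end signal.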
While the adaptive filter modification is suspended, the echo paths of the enclosure may change as people move and interact with objects. Changes in the enclosure response cause changes in the echoes of the far-end signal that arrive at the microphone. Because the adaptive filter modifications have been suspended, the synthetic echo produced by the adaptive filter still matches the old echo but not the new. When the old synthetic echo is subtracted from the microphone signal, which now contains echoes due to the new enclosure response, the echoes are not canceled. The near-end signal, along with the uncanceled portion of the far-end echo, is returned to the far-end. Even small changes in the echo paths of the enclosure can lead to loud echoes in the signal returned to the far-end. To avoid sending loud echoes to the far-end, the echo canceler switches into a half-duplex mode of operation and the far-end signal is set to zero. Half-duplex communication is unnatural and hinders communication.
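The mismatch that arises when the enclosure changes while adaptation is suspended can be illustrated numerically. In the sketch below, the old and new echo paths are hypothetical values chosen for illustration, the frozen taps are assumed to match the old path exactly, and the near-end signal is omitted so that the residual contains only uncanceled echo. The ratio of microphone power to residual power, expressed in decibels, is used as the measure of cancellation.

```python
import math
import random

def convolve(path, signal):
    """Causal convolution of a short path response with a signal."""
    return [
        sum(path[k] * signal[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(signal))
    ]

def power(signal):
    return sum(s * s for s in signal) / len(signal)

random.seed(2)
far_end = [random.uniform(-1.0, 1.0) for _ in range(5000)]

old_path = [0.6, -0.3, 0.1, 0.0]     # hypothetical response before the change
new_path = [0.6, -0.25, 0.1, 0.05]   # hypothetical response after the change
frozen_taps = old_path               # adaptation suspended: taps match old path

mic = convolve(new_path, far_end)            # echo through the new enclosure
synthetic = convolve(frozen_taps, far_end)   # echo synthesized from old taps
residual = [d - y for d, y in zip(mic, synthetic)]

cancellation_db = 10.0 * math.log10(power(mic) / power(residual))
print("cancellation with stale taps (dB):", round(cancellation_db, 1))
```

Before the change the same taps would cancel the echo exactly, so any finite value here reflects echo that is returned to the far-end solely because the filter could not follow the new enclosure response.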
The bulk of the research and development in the field of echo cancellation has focused on two problems. First, the adaptive filters in echo cancelers must have very long responses to accurately match the real enclosure response. This presents a significant problem in its own right, and a great deal of research has attempted to find practical implementations of very long filters that converge quickly to the enclosure response during far-end talk events. Second, research has aimed to improve the ability of double-talk detectors to determine the instant that double-talk begins. To date, relatively little attention has been paid to the possibility that the adaptive filter may be adaptable to cancel echoes during the double-talk event. Existing attempts at adaptive filtering during double-talk have not provided an adequate solution for echo cancellation.
One approach has been to use a blind deconvolution technique for adaptive filtering during double-talk. Blind deconvolution is a technique for separating a convolutive mixture, such as a mixture that takes place over space and time. A straightforward application of blind deconvolution techniques produces, at best, only a filtered version of the near-end signal, which gives the signal an unnatural quality.
An improvement to the blind deconvolution technique is to provide a short-term whitening, learn the echo path response on the whitened signal by blind deconvolution, and apply the adapted filter to the original unwhitened signal. The resulting gradient descent technique adapts too slowly for real-time applications. Furthermore, this technique can cancel only long-delay echoes and not echoes occurring within the span of the short-term whitening process.
Thus, a need exists to provide an improved echo canceler system that modifies the adaptive filter parameters during double-talk and eliminates the need for half-duplex operation. Such an invention is disclosed herein.