Many types of communication devices allow hands-free communication between two parties in separate rooms. Such devices include speakerphones, public address systems for auditoriums or meeting rooms, and audio/visual equipment for video classrooms. Furthermore, new technology is being rapidly developed that will make communication devices for audio/visual teleconferencing practical.
The rooms used for this type of communication are typically plagued by acoustical echoes (i.e., acoustical reverberations). These acoustical echoes arise when the far-end communication device provides the near-end communication device with a far-end output audio signal. This signal is then converted to sound by the audio system of the near-end communication device. In response, an acoustical echo is produced within the room. The echo, along with the near-end user's speech, is converted to a near-end audio signal by the near-end audio system. The near-end audio signal is then transmitted to the far-end communication device as the near-end output audio signal. When this signal is converted to sound by the audio system of the far-end communication device, the far-end user will have difficulty sorting out the near-end speech from the acoustical echo.
A current approach to eliminating the acoustical echo is to use a discrete-time linear adaptive filter. Such an adaptive filter is used to estimate the overall impulse response of the combined system formed by the room and the near-end audio system. From this estimate, the adaptive filter generates an estimation signal, which estimates the component of the near-end audio signal that corresponds to the acoustical echo in the room. The estimation signal is then subtracted from the near-end audio signal to produce the near-end output audio signal.
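The adaptive-filter approach described above can be illustrated with a minimal sketch. The text does not name a particular adaptation rule, so the normalized least-mean-squares (NLMS) update used here is an assumption, as are the function names, the tap count, the step size, and the short synthetic room response. The sketch estimates the impulse response from the far-end signal and subtracts the resulting echo estimate from the microphone signal.

```python
import random

def simulate_echo(far_end, room_ir):
    """Convolve the far-end signal with a (hypothetical) room impulse response."""
    out = []
    for n in range(len(far_end)):
        out.append(sum(room_ir[k] * far_end[n - k]
                       for k in range(len(room_ir)) if n - k >= 0))
    return out

def nlms_echo_cancel(far_end, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Assumed NLMS echo canceler: returns (residual signal, final weights)."""
    weights = [0.0] * num_taps      # running estimate of the impulse response
    history = [0.0] * num_taps      # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Estimation signal: predicted echo component of the microphone signal.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Subtract the estimate to obtain the signal sent to the far end.
        e = d - echo_estimate
        residual.append(e)
        # Normalized LMS weight update (step size scaled by input energy).
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
    return residual, weights

rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # synthetic far-end audio
room_ir = [0.6, -0.3, 0.15, 0.05]                        # hypothetical room response
mic = simulate_echo(far_end, room_ir)                    # echo only, no near-end speech
residual, weights = nlms_echo_cancel(far_end, mic)
```

In this idealized setting (a fixed room response, no near-end speech, no noise) the weights approach the simulated response and the residual decays toward zero. Real rooms have far longer responses, which is one reason convergence can be slow.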
A major problem associated with this approach is that the convergence time for estimating the overall impulse response of the room and audio system together may be much longer than the stationary period of the overall impulse response. As a result, changes in the room characteristics will lead to serious degradation of the performance of the adaptive filter because it cannot adapt rapidly enough. Such changes may include doors being opened or closed, movement of furniture or people, or changes in the direction of the microphone of the audio system.
Another flaw associated with this approach is that the presence of near-end speech is not readily handled by the adaptive filter. When near-end speech is added to the return path, it is suppressed by the adaptive filter. In order to alleviate this problem, conventional adaptive filter echo cancelers employ near-end speech detectors. These detectors are used to detect large near-end speech energy so that the adaptive filter computations can be suspended during the time interval of the near-end speech. This means that echo canceling is suspended during near-end speech. One undesirable result of this is simplex or one-way conversations. A second undesirable result is the inability of the adaptive filter to adapt to room changes during the time interval of the near-end speech.
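The role of a near-end speech detector can be sketched as follows. The text does not specify how such detectors work. The magnitude comparison used here follows the well-known Geigel double-talk detector and is an assumption, as are the function name, the window length, the threshold, and the synthetic signals. A sample is flagged as near-end speech when the microphone magnitude exceeds a fraction of the recent far-end peak.

```python
def detect_near_end_speech(far_end, mic, window=8, threshold=0.5):
    """Assumed Geigel-style detector: one boolean flag per sample."""
    flags = []
    for n, d in enumerate(mic):
        # Peak magnitude of the far-end signal over the most recent samples.
        recent = far_end[max(0, n - window + 1): n + 1]
        far_peak = max(abs(x) for x in recent)
        # A microphone sample much larger than the expected echo level
        # is taken to indicate near-end speech.
        flags.append(abs(d) > threshold * far_peak)
    return flags

# Synthetic example: an echo at 0.3 times the far-end level, plus
# near-end speech of amplitude 0.9 during samples 10 through 14.
far_end = [1.0, -1.0] * 10
near_speech = [0.9 if 10 <= n < 15 else 0.0 for n in range(20)]
mic = [0.3 * x + s for x, s in zip(far_end, near_speech)]
flags = detect_near_end_speech(far_end, mic)
```

In a conventional echo canceler of the kind described above, the adaptive filter computations would be suspended for every sample whose flag is set. This is what prevents the filter from tracking room changes during near-end speech.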