Echo cancellation is the term typically used in telephony for the process of removing echo from a voice communication in order to improve voice quality on a telephone call. Adaptive filters are used to model the echo and remove it from the signal. In telepresence video conferencing systems, the audio signal has a high sampling rate, and the filter length increases in proportion to the sampling rate. Managing the computational complexity of echo cancellation for high-definition audio is a challenging task.
Echo cancellation has been used extensively in telecommunications, cellular telephony, and video conferencing. The search for mathematical algorithms to perform echo cancellation has produced many different approaches with varying degrees of complexity, cost, and performance.
In some applications, for example the cancellation of acoustic speech echoes, the echo duration can be extremely long, on the order of 100 msec to 500 msec. A traditional approach to echo cancellation uses an adaptive transversal filter of length L, where L equals the number of samples necessary to extend just beyond the duration of the echo. The computational requirement is proportional to 2L for the popular LMS class of algorithms, and proportional to L² or higher for algorithms such as RLS. The more robust algorithms (RLS being one example) have improved convergence characteristics, but the computational load increases dramatically with L. It is also fair to say that the convergence time increases exponentially with the size of L for most algorithms. Fast convergence is important, especially in acoustic speech echo cancellation, because the echo path may be continually changing as people and objects move within the environment. An echo canceller that can deal with an echo length of 500 msec or more has problems with computational complexity as well as convergence speed.
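The adaptive transversal filter described above can be sketched as follows. This is a minimal illustration of a plain LMS update, not the implementation of any particular system; the function name `lms_echo_canceller`, the 4-tap echo path, the step size, and the white-noise reference signal are all assumptions chosen to keep the example short.

```python
import random


def lms_echo_canceller(reference, microphone, num_taps, step_size):
    """Plain LMS sketch: returns (residual signal, final tap weights)."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        # Filtering: L multiply-adds to estimate the echo.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Adaptation: another L multiply-adds, hence roughly 2L per sample.
        weights = [w + step_size * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Hypothetical, noise-free scenario: a short fixed echo path and a
# white-noise far-end (reference) signal.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
reference = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
microphone = [
    sum(echo_path[k] * reference[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(reference))
]

residual, weights = lms_echo_canceller(reference, microphone, num_taps=4, step_size=0.1)
```

In this idealized setting the tap weights converge to the assumed echo path and the residual decays toward zero. With real speech and echo tails of hundreds of milliseconds, L is thousands of taps, which is where the cost and convergence problems described above arise.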
In the recent application of telepresence systems, the high-definition quality of an audio signal with a 48 kHz sampling rate further increases the computational complexity (or MIPS requirement) of echo cancellation. A 256 ms echo tail means a filter length of 12288 samples, and adaptation has to be done 48000 times per second. A simple LMS approach requires roughly 1200 MIPS. To reduce the computational burden, one commonly known approach, sub-band processing, involves separating the speech signal into frequency bands and processing each band separately. This has some inherent advantages, most notably reduced computational complexity and increased convergence speed. Such a system is described in Q. Jin, K. M. Wong and Z. Q. Luo, “Optimum Filter Banks for Signal Decomposition and Its Application in Adaptive Echo Cancellation”, IEEE Trans. on SP, Vol. 44, No. 7, 1996, pp. 1669-1680, and U.S. Pat. No. 5,937,009, the contents of which are herein incorporated by reference.
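The figures quoted above can be checked with a short calculation. The sketch below assumes the 2L operations-per-sample cost of LMS stated earlier and counts one instruction per operation, which is a simplification; the variable names are illustrative only.

```python
sample_rate_hz = 48_000
echo_tail_ms = 256

# Number of taps needed to span the echo tail at this sampling rate.
filter_length = sample_rate_hz * echo_tail_ms // 1000

# LMS costs about 2L operations per input sample (filtering plus adaptation).
ops_per_sample = 2 * filter_length

# Assumption: one instruction per operation, so ops per second / 1e6 = MIPS.
mips = ops_per_sample * sample_rate_hz / 1e6

print(filter_length)  # 12288
print(mips)           # 1179.648
```

The result, about 1180 MIPS, is consistent with the rounded figure of 1200 MIPS.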
Sub-band processing is an attractive approach because it reduces computational complexity. Dividing the signal into M sub-bands means that there are M adaptive filters to implement instead of only one, but the sub-band signals can be down-sampled by a factor of M, so the filter outputs need only be calculated 1/M as often. Additionally, the length of each filter is reduced from L to L/M. The overall effect is to reduce the computational complexity (not including the filter banks) to something on the order of 2L/M for LMS-type adaptive filters, and convergence behavior also improves because the LMS filters are shorter. When L is large, the reduction in computational load is significant, making the overhead of the filter banks insignificant.
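The 2L/M estimate follows from multiplying the three factors named above: M filters, each of length L/M, each updated once every M input samples. The sketch below works this through; the helper `lms_ops_per_input_sample` and the choice of M = 32 bands are hypothetical, and filter-bank overhead is deliberately excluded, as in the text.

```python
def lms_ops_per_input_sample(filter_length, num_bands=1):
    """Approximate LMS cost per full-rate input sample, excluding filter banks."""
    filters = num_bands                        # M adaptive filters
    taps_per_filter = filter_length / num_bands    # each of length L/M
    updates_per_input_sample = 1 / num_bands   # each run at 1/M of the rate
    return filters * 2 * taps_per_filter * updates_per_input_sample


filter_length = 12288   # 256 ms echo tail at 48 kHz
num_bands = 32          # hypothetical number of sub-bands

full_band_ops = lms_ops_per_input_sample(filter_length)
sub_band_ops = lms_ops_per_input_sample(filter_length, num_bands)
reduction = full_band_ops / sub_band_ops
```

Under these assumptions the full-band cost is 2L = 24576 operations per input sample and the sub-band cost is 2L/M = 768, a reduction by a factor of M.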
A typical prior echo cancellation technique using a sub-band filter bank is shown in FIG. 1. Both the echo signal and the reference signal are decomposed into sub-bands, and the adaptive algorithm is implemented in each individual band. Finally, the echo-reduced signal is reconstructed with a bank of synthesis filters.
The problem with the sub-band filter bank approach is that the transition between bands makes it impossible to perfectly isolate each band from the adjacent ones without the use of “ideal” band-pass filters. “Ideal” in this context means filters with an infinitely sharp cut-off. There is a trade-off among the amount of echo cancellation possible, the filter roll-off, the filter group delay distortion, and the ability to reconstruct the original input signal from the sub-bands without distortion. The quadrature mirror filter (QMF) is one filter bank design that has been used in the past to help overcome these problems.
The main concern with echo cancellation using sub-band decomposition is that the down-sampling process creates distortion in each band due to aliasing. This effect causes the echo channel to be time-varying, which violates an underlying assumption required in order to apply known adaptive filtering methods to voice echo cancellation: the echo channel must be both linear and time-invariant. Any processing done on the signal decomposition invalidates this property and results in signal distortion. This limits the overall echo cancellation achievable using the method of sub-band decomposition and reconstruction.
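The aliasing mechanism can be illustrated with a single tone. The sketch below takes the extreme case of down-sampling with no band-limiting filter at all; a practical analysis filter attenuates out-of-band content but, lacking an infinitely sharp cut-off, still leaks some of it. The sampling rate, decimation factor, and tone frequency are assumptions chosen for the illustration.

```python
import math

sample_rate_hz = 48_000
decimation = 4
band_rate_hz = sample_rate_hz // decimation   # 12 kHz after down-sampling
tone_hz = 7_000                               # above the new 6 kHz Nyquist limit
alias_hz = band_rate_hz - tone_hz             # folds down to 5 kHz

num_samples = 64

# Keep every 4th sample of a 7 kHz tone sampled at 48 kHz.
decimated = [
    math.cos(2 * math.pi * tone_hz * (decimation * k) / sample_rate_hz)
    for k in range(num_samples)
]

# A 5 kHz tone sampled directly at 12 kHz.
alias_tone = [
    math.cos(2 * math.pi * alias_hz * k / band_rate_hz)
    for k in range(num_samples)
]

max_difference = max(abs(a - b) for a, b in zip(decimated, alias_tone))
```

The two sequences agree to within floating-point rounding, so after down-sampling the 7 kHz component is indistinguishable from a 5 kHz component. Energy that leaks past a non-ideal band edge therefore appears inside the band at the wrong frequency, which is the distortion described above.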
One previous approach to fixing the aliasing problem is the cross-band echo cancellation described in U.S. Pat. No. 5,937,009. It uses adjacent bands to cancel the aliased echo component when the sub-band filter is not a brick-wall filter. The problem with such an approach is that the computational complexity of the LMS filter increases by a factor of three.