This invention relates to echo cancellation in telephones and, in particular, to estimating bulk delay for adjusting an adaptive filter in echo cancelling circuitry. As used herein, “telephone” includes cellular telephones and land lines.
There are two kinds of echoes in telephones: an acoustic echo from the path between an earphone or a speaker and a microphone, and a line echo generated in the switched network for routing a call between stations. Acoustic echo is typically not much of a problem in a wired telephone with a handset. For speaker phones and cell phones, acoustic feedback is much more of a problem. In a speaker phone, a room and its contents become part of the audio system and provide an acoustic path from speaker to microphone. In a cellular telephone, the enclosure provides an acoustic path from speaker to microphone.
There are several potential sources for line echoes. Hybrid devices (two-wire to four-wire converters) located at terminal exchanges or in remote subscriber stages of a fixed network are the principal sources of line echo. Apparatus for removing or minimizing echoes include echo suppressers, echo cancellers, and adaptive filters; see Digital Signal Processing in Telecommunications by Kishan Shenoi, Prentice-Hall, 1995, Chapter 6 (pages 334-385). “Suppression” is attenuation. Echo cancelling involves subtracting a local replica of the echo from the signal to eliminate an echo. The local replica is created by filtering the signal with an adaptive filter. The adaptive filter models either the near-end (speaker to microphone) or the far end (line out to line in) transfer function, which is assumed to be linear and time invariant; Shenoi, pg. 348. Unfortunately, the assumption is somewhat optimistic.
The impulse response of a typical echo path is shown in FIG. 1. This echo path is typically modeled by a finite impulse response (FIR) filter. As seen in FIG. 1, the echo path has a bulk delay t_bd. This bulk delay is caused by the delays inherent in telephone networks, which can vary from network to network. For example, the bulk delay can vary from 100 ms to 500 ms. Hence, in order to cancel the network echo, one needs a long (many taps) adaptive filter. For example, to cancel a network echo with a bulk delay t_bd of 448 ms and an echo tail t_d of 64 ms (see FIG. 1), the filter must span 512 ms, which requires 4,096 taps in an FIR filter in a system with an 8 kHz sampling rate.
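The tap count in the example above follows from multiplying the time span the filter must cover by the sampling rate. The following is a minimal sketch of that arithmetic; the function name `fir_taps` is hypothetical and is used only for illustration.

```python
def fir_taps(span_ms: float, sample_rate_hz: int) -> int:
    """Number of FIR taps needed to span `span_ms` milliseconds of echo path."""
    return round(span_ms * sample_rate_hz / 1000)


bulk_delay_ms = 448   # bulk delay from the example in the text
echo_tail_ms = 64     # echo tail from the example in the text
sample_rate_hz = 8000

# The filter must cover the bulk delay plus the echo tail.
taps = fir_taps(bulk_delay_ms + echo_tail_ms, sample_rate_hz)
print(taps)  # 4096
```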
Long adaptive filters suffer from inherent problems, such as slow convergence rate and large residual echo, and from implementation issues such as the need for very high rates of executing instructions (MIPS—millions of instructions per second) and the need for large amounts of memory.
If one can estimate the bulk delay, then it is possible to cancel network echo with a short adaptive filter. This can be achieved by appropriate buffering of data samples. For example, in a system sampling at 8 kHz, to cancel network echo with a bulk delay of 448 ms and echo tail equal to 64 ms, only 512 taps are needed in an FIR filter if the bulk delay is known a priori. Thus, estimating bulk delay is essential for efficient network echo cancellation.
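The buffering idea can be sketched as follows: the far-end reference is delayed by the known bulk delay, and a short adaptive filter then models only the echo tail. This is an illustrative sketch and not the method of the invention. The normalized LMS update, the step size, the synthetic echo path, and all function names (`delay_signal`, `nlms_cancel`, `simulate`) are assumptions introduced for the example.

```python
import random


def delay_signal(x, delay):
    """Delay x by `delay` samples, zero-padding the start; output length equals input length."""
    kept = max(len(x) - delay, 0)
    return ([0.0] * delay + list(x[:kept]))[:len(x)]


def nlms_cancel(reference, received, num_taps, mu=0.5, eps=1e-8):
    """Short normalized-LMS canceller (an assumed algorithm). Returns (residual, weights)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps          # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, received):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # local replica of the echo
        e = d - y                                    # residual after cancellation
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


def simulate(bulk_delay=40, tail=(0.5, -0.3, 0.2, 0.1), n=2000, seed=0):
    """Synthetic, noise-free echo path: pure delay followed by a short tail."""
    rng = random.Random(seed)
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo = [0.0] * n
    for i in range(n):
        for k, h in enumerate(tail):
            j = i - bulk_delay - k
            if j >= 0:
                echo[i] += h * far[j]
    # Buffer the reference by the known bulk delay, then adapt only len(tail) taps.
    aligned = delay_signal(far, bulk_delay)
    residual, w = nlms_cancel(aligned, echo, num_taps=len(tail))
    return echo, residual, w


echo, residual, weights = simulate()
```

In this sketch the adaptive filter has four taps instead of the forty-four that would be needed to span the delay and the tail together, which mirrors the reduction from 4,096 taps to 512 taps in the example above.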
Most of the adaptive filters used in echo cancellers are implemented using least mean square (LMS) or fast affine projection algorithms. These algorithms are widely used in echo cancellers due to their computational simplicity, even though their performance is poor when compared with the high performance recursive least squares (RLS) algorithm. Many bulk delay estimation methods are mentioned in the literature. Most of these bulk delay estimation methods are based on adaptive filters. These algorithms estimate the bulk delay by explicitly computing the impulse response of the echo path. Once the impulse response of the echo path is known, the bulk delay can be calculated by finding the centroid of the impulse response. Specifically, if h_e(n) is the impulse response of the echo path, then the bulk delay estimate is given by the following equation:

$$\mathrm{BD} = \frac{T_s \sum_{n=1}^{N} n \, h_e^2(n)}{\sum_{n=1}^{N} h_e^2(n)}, \qquad n = 1, 2, \ldots, N$$

where T_s is the sampling period and N is the order of the LMS filter. The value of N depends on the maximum possible bulk delay and the echo tail. In particular, the value of N is directly proportional to the maximum possible bulk delay.
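The centroid calculation can be illustrated with a short sketch. The function name `centroid_bulk_delay` and the test impulse response are hypothetical; the sample index n is taken to run from 1 to N, as in the equation, and T_s is assumed to be the sampling period in seconds.

```python
def centroid_bulk_delay(h, sample_period_s):
    """Energy-weighted centroid of impulse response h, scaled to seconds.

    h[0] corresponds to n = 1, h[1] to n = 2, and so on up to n = N.
    """
    energy = sum(v * v for v in h)
    if energy == 0.0:
        raise ValueError("impulse response has no energy")
    weighted = sum(n * v * v for n, v in enumerate(h, start=1))
    return sample_period_s * weighted / energy


# Hypothetical response: 100 zero samples followed by a single unit impulse at n = 101.
h_e = [0.0] * 100 + [1.0]
bd = centroid_bulk_delay(h_e, 1.0 / 8000.0)   # 101 sample periods at 8 kHz
```

For a response whose energy is spread over an echo tail, the centroid falls inside the tail rather than at its leading edge, so the result is an estimate of where the echo energy is concentrated.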
If the value of N is high, the performance of the LMS filter degrades because the convergence time of the LMS filter is long and the residual error of the echo is high. The result is a poor estimate of the bulk delay. As noted above, there are also computational and memory problems due to the large number of taps used in an FIR implementation of an LMS filter. Therefore, LMS filters are not feasible when the bulk delay is long (e.g., greater than 100 ms).
Due to these problems with the adaptive filters, other estimation methods were developed; e.g. U.S. Pat. No. 4,582,963 (Danstrom) and U.S. Pat. No. 6,078,567 (Traill et al.). The Danstrom patent discloses an edge detection method. Bulk delay is estimated by detecting an edge in the transmit direction and detecting an edge in the receive direction. Edge detection is performed by comparing the signal level with some threshold. Finally, the bulk delay estimate is obtained using the time difference between the transmit and receive detected edges.
A problem with this method is that the receive detected edge often does not correspond to the transmit detected edge. The receive detected edge may instead correspond to far end speech (double talk condition), noise, or spikes. Under these conditions, the bulk delay estimate is poor. Moreover, this method requires that there be a period of quiet before the transmit edge is detected. The patent discloses that the duration of this quiet period should be equal to the maximum possible bulk delay. In many applications, the bulk delay is at least 100 ms and may be closer to 500 ms. In a typical telephone conversation, it is rare to have such a long quiet time preceding near end speech. Hence, the bulk delay estimate obtained using this method is unreliable in most real-life telephone conversations.
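The threshold-based edge detection described above, and its sensitivity to spikes, can be illustrated with a simplified sketch. This is not the implementation disclosed in the Danstrom patent. The function names, the fixed threshold, and the test signals are assumptions made for the example.

```python
def first_edge(signal, threshold, start=0):
    """Index of the first sample at or after `start` whose magnitude reaches `threshold`."""
    for i in range(start, len(signal)):
        if abs(signal[i]) >= threshold:
            return i
    return None


def edge_delay_estimate(tx, rx, threshold, sample_rate_hz):
    """Delay in seconds between the transmit edge and the next receive edge, or None."""
    t_edge = first_edge(tx, threshold)
    if t_edge is None:
        return None
    r_edge = first_edge(rx, threshold, start=t_edge)
    if r_edge is None:
        return None
    return (r_edge - t_edge) / sample_rate_hz


fs = 8000
tx = [0.0] * 10 + [1.0] * 5 + [0.0] * 100        # transmit burst starts at sample 10
rx_clean = [0.0] * 50 + [0.5] * 5 + [0.0] * 60   # echo arrives at sample 50
rx_spike = list(rx_clean)
rx_spike[20] = 0.9                               # unrelated spike before the echo

clean_estimate = edge_delay_estimate(tx, rx_clean, 0.25, fs)   # 40 samples: correct
spike_estimate = edge_delay_estimate(tx, rx_spike, 0.25, fs)   # 10 samples: wrong
```

The spike case shows the failure mode: the first receive edge after the transmit edge is taken to be the echo, whether or not it is.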
The Traill et al. patent discloses a cross-correlation method. Theoretically, cross-correlation is the best method for measuring the similarity between any given set of signals. A problem with cross-correlation is that it is necessary to find the correlation between the two signals for all possible time delays in order to estimate the delay between the two signals. In particular, assuming that there are thirty-two samples, then it requires thirty-two multiplication and addition operations to perform the cross-correlation for a single time delay. There are thirty-one possible time delays, resulting in nine hundred ninety-two multiplication and addition operations. Thus, cross-correlation is computationally intensive and undesirable.
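The operation count in this example can be checked with a brute-force sketch. This is a generic exhaustive cross-correlation search, not the method of the Traill et al. patent. The function name, the lag range of 0 through 30, and the synthetic signals are assumptions made for the example.

```python
import random


def cross_correlation_delay(x, y, num_lags):
    """Exhaustive search over lags 0..num_lags-1.

    x is a window of reference samples; y must hold at least
    len(x) + num_lags - 1 samples. Returns (best_lag, mac_count), where
    mac_count is the number of multiply-and-add operations performed.
    """
    best_lag, best_val, macs = 0, float("-inf"), 0
    for lag in range(num_lags):
        acc = 0.0
        for i in range(len(x)):
            acc += x[i] * y[i + lag]   # one multiplication and one addition
            macs += 1
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag, macs


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(32)]   # thirty-two reference samples
true_lag = 17
y = [0.0] * 63                                    # 32 + 31 samples cover every lag tried
for i, v in enumerate(x):
    y[i + true_lag] = 0.5 * v                     # attenuated, delayed copy of x

lag, macs = cross_correlation_delay(x, y, num_lags=31)
print(lag, macs)  # 17 992
```

The count grows with the product of the window length and the number of candidate delays, which is why the search becomes expensive when the range of possible bulk delays is large.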
In view of the foregoing, it is therefore an object of the invention to provide an improved method and apparatus for estimating bulk delay.
Another object of the invention is to provide a method for estimating bulk delay that is not computationally intensive, i.e. does not require a high MIPS processor.
A further object of the invention is to provide a method for estimating bulk delay that does not require large amounts of memory.
Another object of the invention is to provide a method for estimating bulk delay that works well in noisy or in double-talk conditions.
A further object of the invention is to provide a method for estimating bulk delay that can be repeated during a telephone call, enabling the telephone to adapt to changing conditions during a call; e.g., cell phone handoffs.