The present invention relates generally to signal processing, and more specifically to techniques for canceling acoustic echo using an adaptive filter with adaptive step size and stability control.
Full-duplex hands-free communication systems are commonly used in many applications, such as speakerphones, hands-free car kits, teleconferencing systems, cellular phones, and so on. For each of these systems, one or more microphones in the system pick up an acoustic signal emitted by a speaker and its reflections from the borders of an enclosure, such as a room or a car compartment. The propagation path for the reflections may change due to various factors such as, for example, movements of the microphone, loudspeaker, and/or speaker/user, volume change on the loudspeaker, and environment changes. As a result, the electro-acoustic circuit in the system may become unstable and produce howling. In the case of a telecommunication system, users are also annoyed by listening to their own voice, which is delayed by the path of the system. This acoustic disturbance is referred to as echo.
Echo cancellation is required in many communication systems to suppress or eliminate echo as well as to remove the howling effect. For example, echo cancellation is typically used in a hands-free full-duplex environment, such as a vehicle or a room, where the speaker and microphone may be located some distance away from a user. Conventionally, echo cancellation is achieved by a circuit that employs an adaptive filter.
The adaptive filter performs echo cancellation by deriving an estimate of the echo based on a reference signal, which may be a line input from a communication or telematics device such as a cellular phone or some other device. This reference signal is filtered based on a set of filter coefficients or weights to derive the estimate of the echo, which is then subtracted from a near-end signal that includes the echo to be suppressed. The filter coefficients are typically “trained” (or adapted or updated) based on a least mean square (LMS) algorithm or a normalized least mean square (NLMS) algorithm. If the filter coefficients are effectively trained, then a more accurate estimate of the echo may be obtained and improved echo suppression can be achieved.
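The filtering, subtraction, and coefficient training described above can be illustrated with a minimal Python sketch of an NLMS echo canceller. It is an illustration under stated assumptions, not the circuit of any particular system: the function names, the tap count, the step size, and the three-tap echo path used to synthesize the microphone signal are all hypothetical, and the far-end signal is modeled as white noise.

```python
import random


def nlms_echo_canceller(reference, near_end, num_taps=8, step_size=0.5, eps=1e-8):
    """Cancel echo from near_end using an NLMS-adapted FIR filter.

    Returns (error, weights): the echo-suppressed output and the final weights.
    """
    weights = [0.0] * num_taps   # filter coefficients, all zero before training
    history = [0.0] * num_taps   # latest reference samples, newest first
    error = []
    for x, d in zip(reference, near_end):
        history = [x] + history[:-1]
        # Filter the reference signal to derive the estimate of the echo.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Subtract the estimate from the near-end signal.
        e = d - echo_estimate
        # NLMS update: the step is normalized by the reference energy.
        energy = sum(h * h for h in history) + eps
        weights = [w + step_size * e * h / energy
                   for w, h in zip(weights, history)]
        error.append(e)
    return error, weights


def simulate_echo(reference, echo_path):
    """Hypothetical echo model: the reference convolved with a short FIR path."""
    return [
        sum(g * reference[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
        for n in range(len(reference))
    ]


rng = random.Random(0)
far_end = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # white-noise stand-in
echo_path = [0.6, -0.3, 0.1]             # assumed path, for illustration only
mic = simulate_echo(far_end, echo_path)  # far-end talk only, no near-end speech
residual, learned = nlms_echo_canceller(far_end, mic)
print([round(w, 3) for w in learned[:3]])
```

In this noise-free setting the learned weights approach the assumed echo path and the residual decays toward zero, which corresponds to the case in which the coefficients are effectively trained.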
The training of the filter coefficients is typically controlled based on four different situations: only near-end user talking, only far-end user talking, both near-end and far-end users talking simultaneously, and both near-end and far-end users silent. The third situation is often referred to as double-talk. The near-end user is the one located near a microphone or a speakerphone at one end of the communication system, and the far-end user is the one located at the other end of the system and remote from the near-end user. The echo suppression is performed so that the far-end user hears only the speech from the near-end user and not the echo resulting from reflections of the far-end user's speech back to the microphone.
Typically, the filter coefficients are trained when far-end talk is present and are otherwise held fixed (i.e., not adapted). After the training, the coefficients are used to estimate and cancel echo from the far-end user's speech. The rate at which the coefficients are adjusted during training is determined by a step size parameter. A small step size results in a slow convergence rate for the adaptive filter, which may be unacceptable for fast-changing environments (e.g., fast changes in the echo path). Conversely, a large step size results in a faster convergence rate for the adaptive filter when only far-end talk is present. However, the filter coefficients may become unstable or diverge if the selected step size is too large for adapting when near-end talk is present and/or the near-end noise is strong. A common method for avoiding this divergence problem is to pause the training of the filter coefficients whenever double-talk or only near-end talk is detected.
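The effect of pausing the training can be illustrated with a small Python simulation. This is a sketch under strong assumptions rather than a practical design: the echo path, signal lengths, and function names are hypothetical, both talkers are modeled as white noise, and the pause decision comes from an idealized flag that knows exactly when near-end talk starts.

```python
import random


def run_nlms(reference, near_end, adapt, num_taps, step_size, eps=1e-8):
    """NLMS filter whose update is skipped wherever adapt[n] is False."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps   # latest reference samples, newest first
    for x, d, allow in zip(reference, near_end, adapt):
        history = [x] + history[:-1]
        e = d - sum(w * h for w, h in zip(weights, history))
        if allow:  # training is paused when the flag is False
            energy = sum(h * h for h in history) + eps
            weights = [w + step_size * e * h / energy
                       for w, h in zip(weights, history)]
    return weights


def misalignment(weights, echo_path):
    """Squared distance between the learned weights and the true echo path."""
    return sum((w - g) ** 2 for w, g in zip(weights, echo_path))


rng = random.Random(1)
echo_path = [0.5, -0.25, 0.125, 0.0625]   # assumed, for illustration only
n_total, n_far_only = 4000, 2000
far_end = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
# Near-end talk (white-noise stand-in) starts halfway through: double-talk.
near_speech = [0.0 if n < n_far_only else rng.gauss(0.0, 1.0)
               for n in range(n_total)]
mic = [
    sum(g * far_end[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
    + near_speech[n]
    for n in range(n_total)
]

always = [True] * n_total                            # adapt through double-talk
far_only = [n < n_far_only for n in range(n_total)]  # idealized pause flag

mis_always = misalignment(run_nlms(far_end, mic, always, 4, 1.0), echo_path)
mis_paused = misalignment(run_nlms(far_end, mic, far_only, 4, 1.0), echo_path)
print(mis_always, mis_paused)
```

With the large step size of 1.0, the filter that keeps adapting during double-talk ends with coefficients pulled away from the echo path by the near-end signal, while the filter that pauses retains the coefficients learned during far-end-only talk. In practice the pause flag must come from a detector, whose errors this sketch does not model.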
Double-talk is typically detected based on a cross-correlation of the near-end signal and the echo estimate (or the reference signal from the far-end user). Techniques for detecting double-talk are described in various references including (1) U.S. Pat. Nos. 5,418,848, 5,732,134, 6,108,412, 6,192,126 B1, and 6,269,161 B1, (2) European Patent Nos. EP-B1-0,053,202, EP-A2-0,439,139, and EP-A-0,454,242, and (3) papers by H. Ye and B. X. Wu, “A New Double-Talk Detection Algorithm Based on the Orthogonality Theorem,” IEEE Trans. Communications, Vol. 39, pp. 1542-1545, 1991, J. Benesty, et al., “A Family of Double-Talk Detectors Based on Cross-Correlation,” Proceedings of the IWAENC'99, Pocono Manor, Pa., USA, pp. 108-111, 1999, and K. Ghose and V. U. Reddy, “A Double-Talk Detector for Acoustic Echo Cancellation Applications,” Signal Processing, Vol. 80, pp. 1459-1467, 2000. In general, it is very difficult to reliably detect double-talk in many practical operating scenarios.
A technique for estimating a so-called optimal step size for adjusting the filter coefficients, based on an artificial delay of the filter coefficients, is described in papers by C. Breining, et al., “Acoustic Echo Control—an Application of Very High-Order Adaptive Filters,” IEEE Signal Processing Magazine, July 1999, and A. Mader, et al., “Step-Size Control for Acoustic Echo Cancellation Filters—an Overview,” Signal Processing, Vol. 80, pp. 1697-1719, 2000. However, this technique introduces a delay that is undesired for many communication applications. Moreover, the step size obtained by this method may lead to the freezing of the coefficient training when the echo path changes, which is also undesirable.
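The general idea of a cross-correlation test can be sketched in Python as follows. This is a deliberately simplified illustration and does not reproduce the detectors of the cited patents or papers: the zero-lag normalized correlation, the frame length, the 0.9 threshold, and the function names are all assumptions chosen for the example, and the signals are white-noise stand-ins.

```python
import math
import random


def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # Silent frames yield 0.0 here; a real detector would need a separate
    # energy check so that silence is not mistaken for double-talk.
    return num / den if den > 0.0 else 0.0


def is_double_talk(echo_estimate, near_end, threshold=0.9):
    """Flag a frame as double-talk when the correlation falls below threshold."""
    return normalized_cross_correlation(echo_estimate, near_end) < threshold


rng = random.Random(2)
frame = 512
echo_estimate = [rng.gauss(0.0, 1.0) for _ in range(frame)]
# Near-end signal containing only echo: highly correlated with the estimate.
echo_only = [0.8 * x for x in echo_estimate]
# Near-end signal containing echo plus strong, independent near-end talk.
with_near_talk = [0.8 * x + rng.gauss(0.0, 2.0) for x in echo_estimate]
print(is_double_talk(echo_estimate, echo_only),
      is_double_talk(echo_estimate, with_near_talk))
```

The sketch separates the two frames cleanly only because the echo estimate is exact and the near-end talk is strong. When the estimate is poor (for example, after an echo path change) or the near-end talk is weak, the correlation values overlap, which is one reason reliable detection is difficult in practice.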
As can be seen, techniques that can properly adjust the step size for the coefficients of an adaptive filter and which can improve convergence and avoid instability for acoustic echo cancellation are highly desirable.