The present invention relates generally to voice communication systems and, more particularly, to an electronic system for reducing acoustic echo. Speakerphones, which employ one or more microphones together with one or more speakers, enable "hands-free" telephone communication and allow a number of persons to participate in a conversation. Modern speakerphones are capable of operating in a variety of modes, including a single-talking mode, in which voice information is transmitted in a single direction, and a double-talking mode, in which voice information is transmitted by both sides. Double-talking increases the interactivity of the conversation, but also causes a phenomenon known as "acoustic feedback echo", in which acoustic energy transmitted by the speaker of the speakerphone is picked up by the microphone of the same speakerphone. Typically, speakerphones utilize an Acoustic Echo Control (AEC) device to reduce this echo by generating an estimate of the expected feedback ("acoustic echo") between the speaker and the microphone, and subtracting the expected echo from the signal produced by the microphone (the "near-end signal") before the signal is transmitted to a remote communications station. Generally, the AEC device is adaptive in the sense that changes in the acoustic echo path are accounted for in generating the estimated echo.
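By way of illustration only, the subtraction of an adaptively estimated echo from the near-end signal may be sketched as follows. The sketch assumes a normalized least-mean-squares (NLMS) update, which is one common adaptation rule and is not named in this description; the function names, tap count, step size, and simulated echo path are all hypothetical.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    Assumption: NLMS adaptation. Returns (corrected_signal, weights).
    """
    w = [0.0] * taps  # adaptive estimate of the echo path
    x = [0.0] * taps  # most recent far-end samples, newest first
    corrected = []
    for s, d in zip(far_end, mic):
        x = [s] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))  # estimated echo
        e = d - y  # near-end signal with the estimated echo removed
        norm = sum(xi * xi for xi in x) + eps
        # The update lets the estimate follow changes in the echo path.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        corrected.append(e)
    return corrected, w


def simulate(n=4000, seed=0):
    """Hypothetical test signals: white far-end signal, echo only at the mic."""
    rng = random.Random(seed)
    path = [0.6, -0.3, 0.1]  # hypothetical echo path impulse response
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [
        sum(path[k] * far[i - k] for k in range(len(path)) if i - k >= 0)
        for i in range(n)
    ]
    return far, mic, path


if __name__ == "__main__":
    far, mic, path = simulate()
    corrected, w = nlms_echo_canceller(far, mic)
    print("estimated path:", [round(v, 3) for v in w[:3]])
```

In this noise-free, single-talking simulation the weights settle close to the simulated echo path and the residual echo becomes small. With near-end speech present, the same update would be disturbed, which is why practical devices control adaptation according to the talking mode.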
Typical AEC devices consist of an echo canceller filter cascaded with a non-linear processor. The echo canceller filter generates a linearly corrected near-end signal. The non-linear processor, in conjunction with a talking mode detector that detects the various talking modes (single-talking, double-talking, etc.), provides additional echo attenuation for certain talking modes. This additional attenuation improves the echo cancellation performance, also known as the echo return loss enhancement, but limits double-talking operation and therefore reduces the interactivity of the conversation. Thus, a typical echo control device strikes a compromise between interactivity and echo return loss performance.
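The mode-dependent attenuation applied by such a non-linear processor may be sketched, under stated assumptions, as a gain selected by the detected talking mode. The mode names and the attenuation values in decibels below are hypothetical and chosen only to show the trade-off; they are not taken from any particular device.

```python
from enum import Enum


class TalkMode(Enum):
    FAR_END_ONLY = 1   # single-talking, remote party speaking
    NEAR_END_ONLY = 2  # single-talking, local party speaking
    DOUBLE_TALK = 3    # both parties speaking


# Hypothetical attenuation, in dB, applied to the linearly corrected signal.
NLP_ATTENUATION_DB = {
    TalkMode.FAR_END_ONLY: 20.0,  # residual is mostly echo: attenuate strongly
    TalkMode.NEAR_END_ONLY: 0.0,  # residual is near-end speech: pass unchanged
    TalkMode.DOUBLE_TALK: 6.0,    # compromise between echo loss and interactivity
}


def nlp_gain(mode):
    """Linear gain corresponding to the attenuation for the given mode."""
    return 10.0 ** (-NLP_ATTENUATION_DB[mode] / 20.0)


def non_linear_processor(corrected, mode):
    """Apply the mode-dependent gain to a block of corrected samples."""
    gain = nlp_gain(mode)
    return [gain * sample for sample in corrected]
```

Raising the double-talk attenuation in this sketch increases echo suppression but also attenuates the near-end talker, which is the compromise described above.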
Under steady-state conditions, the echo canceller converges to very nearly cancel the echo, tracking only gradual changes to avoid instability. Sudden changes in the echo path, such as a relative repositioning of the speaker and microphone, disturb the system. Typical echo cancellers and non-linear processors are slow to respond to such changes. Thus, when a sudden change occurs, system performance, including echo return loss enhancement and stability, is significantly degraded until the system eventually re-converges.
To respond to sudden changes in the echo path, some AEC devices utilize a convergence detector to monitor the convergence of the echo canceller filter. Such detectors rely on the principle that the adaptive filter will diverge when sudden echo path changes occur. The degree of convergence of the filter can be detected by examining the cross-correlation between the estimated echo and the estimation error. By the principle of orthogonality, this cross-correlation should be nearly zero when the filter has converged. In a practical environment, however, and especially in the presence of double-talk (double-talking operation), false divergence detection is frequent. This is because speech from independent sources (near end and far end) has similar spectral and temporal characteristics, so detectors that employ short-term estimation tend to yield a non-zero cross-correlation. Only when averaged over a substantial time can the cross-correlation be relied upon to approach zero. Thus, a convergence detector based solely on cross-correlation is necessarily a compromise between accuracy and response time.
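The dependence of such a detector on averaging time may be illustrated with a minimal sketch. It assumes a normalized cross-correlation at lag zero and uses independent white random sequences as stand-ins for the estimated echo and the near-end speech; real speech is not white, so the short-term bias described above would generally be larger than in this simulation. The function names and window lengths are hypothetical.

```python
import math
import random


def normalized_cross_correlation(est_echo, error):
    """Normalized cross-correlation at lag zero, in the range [-1, 1]."""
    num = sum(y * e for y, e in zip(est_echo, error))
    den = math.sqrt(sum(y * y for y in est_echo) * sum(e * e for e in error))
    return num / den if den > 0.0 else 0.0


def windowed_magnitudes(est_echo, error, window):
    """|correlation| for each consecutive, non-overlapping window."""
    return [
        abs(normalized_cross_correlation(est_echo[i:i + window],
                                         error[i:i + window]))
        for i in range(0, len(est_echo) - window + 1, window)
    ]


def demo(n=8192, short_window=32, seed=1):
    """Compare short-term and long-term estimates for independent signals."""
    rng = random.Random(seed)
    est_echo = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    near_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # independent source
    short = windowed_magnitudes(est_echo, near_end, short_window)
    mean_short = sum(short) / len(short)
    long_term = abs(normalized_cross_correlation(est_echo, near_end))
    return mean_short, long_term


if __name__ == "__main__":
    mean_short, long_term = demo()
    print("mean short-window |correlation|:", round(mean_short, 3))
    print("long-term |correlation|:", round(long_term, 3))
```

Although the two sequences are independent, the short-window estimates are, on average, noticeably further from zero than the long-term estimate. A detector that thresholds the short-term value would therefore tend to report divergence falsely, while one that waits for the long-term value responds slowly.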