The present invention relates to signal processing, and more particularly to voice activity detection, acoustic echo cancellation, and echo suppression devices and methods.
Hands-free phones (e.g., speakerphones) provide conveniences such as conversations while driving an automobile and teleconferencing with multiple talkers at a single speakerphone. However, acoustic reflections of the loudspeaker output of a hands-free phone into its microphone input simulate another participating talker and thus appear as an echo to the original remote talker. Thus hands-free phones require acoustic echo control to sufficiently reduce the echo in the sending path (uplink, near-end). Acoustic echo control is performed by an acoustic echo canceller (AEC) and an echo suppressor (ES). The AEC estimates the linear echo path between the receiving-side (downlink, far-end) output loudspeaker and the sending-side (uplink) input microphone, and subtracts the estimated echo from the uplink signal. In practice, the AEC does not remove all of the acoustic echo, and the ES attenuates the AEC residual echo to make the far-end echo inaudible at the uplink output. Typical implementations of the AEC and ES functions are in digital systems (e.g., analog signals sampled at 8 kHz and partitioned into 20 ms frames of 160 samples each) where the AEC applies an adaptive FIR digital filter to estimate the echo from the signal driving the loudspeaker and updates the filter coefficients after each frame.
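The adaptive FIR echo estimate described above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update applied per sample, which is a common simplification and not necessarily the per-frame update of any particular implementation. The function names, the tap count, the step size, and the three-tap simulated echo path are all hypothetical values chosen for illustration.

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    far_end: samples driving the loudspeaker (downlink).
    mic:     samples picked up by the microphone (uplink, echo included).
    Returns (residual, weights). Assumption: per-sample NLMS update.
    """
    weights = [0.0] * num_taps   # FIR estimate of the linear echo path
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        err = d - echo_estimate  # uplink sample after echo cancellation
        energy = sum(h * h for h in history) + eps
        weights = [w + step * err * h / energy
                   for w, h in zip(weights, history)]
        residual.append(err)
    return residual, weights


def simulate_echo(far_end, echo_path):
    """Convolve the far-end signal with a (hypothetical) linear echo path."""
    return [sum(c * far_end[n - k] for k, c in enumerate(echo_path) if n - k >= 0)
            for n in range(len(far_end))]


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(1600)]  # ten 160-sample frames
echo_path = [0.6, -0.3, 0.1]                             # illustrative only
mic = simulate_echo(far_end, echo_path)
residual, weights = nlms_echo_cancel(far_end, mic)
```

With a purely linear, noise-free echo path as simulated here, the residual shrinks toward zero as the weights converge. A nonlinear echo path would leave a residual that this linear filter cannot remove.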
Estimation of the echo residual after echo cancellation allows for echo suppression (ES) by gain adjustment. Echo suppression may be applied in varying degrees, yielding full-duplex, partial-duplex, or half-duplex communication: see ITU-T Recommendation P.340 Transmission Characteristics and Speech Quality Parameters of Hands-free Terminals (May 2000) and ETSI TR 101 110-GSM 3.58 Digital Cellular Telecommunications System: Characterization Test Methods and Quality Assessment for Handsfree Mobile Stations v.8.0.0 (April 2000).
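Echo suppression by gain adjustment can be sketched as follows. This is an illustration only: it assumes a Wiener-style gain rule computed once per frame with a fixed gain floor, and that an estimate of the residual echo power is already available. The function names and the `min_gain` parameter are hypothetical and are not taken from the cited recommendations.

```python
def suppression_gain(frame_power, residual_echo_power, min_gain=0.1):
    """Return an attenuation gain in [min_gain, 1.0] for one frame.

    Assumption: Wiener-style rule, gain = 1 - (echo power / frame power),
    clamped so the frame is never attenuated below min_gain.
    """
    if frame_power <= 0.0:
        return min_gain
    gain = 1.0 - residual_echo_power / frame_power
    return max(min_gain, min(1.0, gain))


def suppress_frame(frame, residual_echo_power, min_gain=0.1):
    """Scale every sample of an AEC output frame by the suppression gain."""
    frame_power = sum(s * s for s in frame) / len(frame)
    gain = suppression_gain(frame_power, residual_echo_power, min_gain)
    return [gain * s for s in frame]
```

Under this rule a frame with no estimated residual echo passes unchanged, while a frame whose power is entirely attributed to residual echo is attenuated to the floor.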
However, if AEC performance is significantly degraded and the AEC residual echo level is the same as or higher than the near-end speech level, the ES cannot properly distinguish double-talk from far-end echo and makes the acoustic system half-duplex by attenuating both far-end echo and near-end speech while the far-end is talking. This problem is often observed when severe nonlinear distortion is present in the echo path. For example, in mobile phone speakerphone applications, the loudspeaker is overdriven and the distance between the loudspeaker and the microphone is short. This phone configuration can cause severe nonlinear distortion in the echo path. As a result, a conventional ES allows a mobile phone to provide only half-duplex communication and significantly degrades communication quality.