The present invention relates to signal processing, and more particularly to voice activity detection, automatic gain control, echo cancellation, and echo suppression devices and methods.
Hands-free telephones (e.g., speakerphones) provide conveniences such as conversations while driving an automobile and teleconferencing with multiple speakers at a single phone. However, acoustic reflections of the loudspeaker output of a hands-free phone to its microphone input simulate another participant speaker and thus appear as an echo to the original remote speaker. Acoustic echo cancellation and echo suppression attempt to minimize these effects.
Acoustic echo cancellation methods approximate the properties of the loudspeaker-to-microphone acoustic channel and can thereby generate an approximation of the microphone pickup of sounds emitted by the loudspeaker. This approximation can then be cancelled from the actual microphone pickup. Acoustic echo cancellation typically uses adaptive filtering to track the varying acoustic channel; see Duttweiler, Proportionate Normalized Least-Mean-Squares Adaptation in Echo Cancelers, 8 IEEE Trans. Speech Audio Proc. 508 (2000).
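The adaptive-filtering idea can be illustrated with a minimal sketch. It uses plain normalized LMS (NLMS), not the proportionate variant of the cited reference, and every name in it (`simulate_echo`, `nlms_echo_canceller`, `demo`, the step size `mu`, the regularizer `eps`, the three-tap channel) is an assumption made for illustration only.

```python
import random

def simulate_echo(far_end, channel):
    """Microphone pickup of the loudspeaker signal through a fixed FIR channel."""
    mic = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(channel):
            if n - k >= 0:
                acc += h * far_end[n - k]
        mic.append(acc)
    return mic

def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Plain NLMS: returns the residual (echo-cancelled) signal and the filter."""
    w = [0.0] * num_taps       # running estimate of the acoustic channel
    x_buf = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_estimate                        # cancel the estimated echo
        norm = sum(xi * xi for xi in x_buf) + eps    # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual, w

def demo(seed=0, length=2000):
    """Hypothetical test case: white-noise far-end signal, noise-free echo."""
    rng = random.Random(seed)
    far_end = [rng.gauss(0.0, 1.0) for _ in range(length)]
    channel = [0.5, -0.3, 0.1]   # assumed toy channel, not from the source
    mic = simulate_echo(far_end, channel)
    residual, w = nlms_echo_canceller(far_end, mic, num_taps=len(channel))
    return channel, w, residual
```

In this noise-free toy case the filter estimate approaches the assumed channel and the residual decays toward zero; a real acoustic channel is far longer and time-varying, which is why tracking is needed.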
However, long echo paths (e.g., 400 ms) at high sampling rates (e.g., 16 kHz) lead to filters with a large number of taps (e.g., 6400). This makes the computational complexity of the filter convolution very high, so frequency-domain techniques are often used in these applications; see J. Shynk, Frequency-Domain and Multirate Adaptive Filtering, IEEE Signal Processing Magazine 14 (January 1992). Frequency-domain multiplication is much cheaper than time-domain convolution, and the signals can be efficiently transformed from the time domain to the frequency domain by Fast Fourier Transforms (FFTs).
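The tap count and the time-domain versus frequency-domain equivalence can be checked with a short sketch. The helper names (`echo_filter_taps`, `fft`, `fast_convolve`, `direct_convolve`) are hypothetical, and the recursive radix-2 FFT is a textbook form chosen for readability, not an optimized implementation.

```python
import cmath

def echo_filter_taps(echo_path_ms, sample_rate_hz):
    """Number of filter taps needed to span the echo path."""
    return echo_path_ms * sample_rate_hz // 1000

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unscaled inverse."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fast_convolve(x, h):
    """Linear convolution via zero-padded FFTs and pointwise multiplication."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:n_out]]   # apply the 1/n inverse scaling

def direct_convolve(x, h):
    """Reference time-domain convolution, O(len(x) * len(h))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y
```

For the figures quoted above, `echo_filter_taps(400, 16000)` gives 6400, and `fast_convolve` agrees with `direct_convolve` up to floating-point rounding.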
Since the Fourier transform needed for a long echo path can be too large to be practical, partitioning the echo canceller filter into smaller subfilters allows the use of shorter transforms; see C. Breining et al., Acoustic Echo Control, IEEE Signal Processing Magazine 42 (July 1999).
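Partitioning relies on the fact that a long filter equals the sum of its subfilters, each delayed by its starting offset. The sketch below verifies only that identity in the time domain, under the assumption of equal-length blocks; it is not a partitioned frequency-domain adaptive filter, and the names `partition` and `partitioned_convolve` are hypothetical.

```python
def convolve(x, h):
    """Reference time-domain linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def partition(h, block_len):
    """Split the filter into consecutive subfilters of at most block_len taps."""
    return [h[i:i + block_len] for i in range(0, len(h), block_len)]

def partitioned_convolve(x, h, block_len):
    """Sum the outputs of the subfilters, each delayed by its block offset."""
    y = [0.0] * (len(x) + len(h) - 1)
    for p, sub in enumerate(partition(h, block_len)):
        partial = convolve(x, sub)   # short convolution: a shorter transform would suffice
        offset = p * block_len       # delay of this subfilter within the full filter
        for i, v in enumerate(partial):
            y[offset + i] += v
    return y
```

Each short convolution involves only `block_len` filter taps, so in a frequency-domain realization each could be computed with a transform sized for the block instead of for the whole echo path.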
Estimation of the echo residual after echo cancellation allows for echo suppression by gain adjustment. Echo suppression may be applied in varying degrees, yielding full-duplex, partial-duplex, or half-duplex communication; see ITU-T Recommendation P.340 Transmission Characteristics and Speech Quality Parameters of Hands-free Terminals (May 2000) and ETSI TR 101 110-GSM 3.58 Digital Cellular Telecommunications System: Characterization Test Methods and Quality Assessment for Handsfree Mobile Stations v.8.0.0 (April 2000).
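One way gain adjustment could follow from a residual estimate is sketched below. The Wiener-like rule, the function name `suppression_gain`, and the gain floor are all assumptions for illustration; the text above does not specify a particular rule.

```python
def suppression_gain(output_power, residual_echo_power, floor=0.1):
    """Hypothetical gain rule: attenuate in proportion to the estimated
    share of residual echo in the canceller output, never below `floor`."""
    if output_power <= 0.0:
        return floor
    gain = 1.0 - residual_echo_power / output_power
    return min(1.0, max(floor, gain))
```

With this rule, a frame with no estimated residual passes unattenuated, while a frame dominated by residual echo is held at the floor. A floor near 1 leaves the link close to full duplex and a floor near 0 approaches half-duplex behavior.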
In general, a hands-free phone provides automatic gain control (AGC) to make the loudspeaker output voice level match a user-specified target level. The goal of AGC design is to adjust the voice level as fast as possible while minimizing output signal distortion (e.g., peak clipping). Also, the AGC should be designed to avoid excessively amplifying background noise (silence intervals). A voice activity detector (VAD) helps the AGC avoid amplifying background noise. The common usage of a VAD for the AGC is to adjust the voice level when the VAD decision indicates voiced input (speech intervals) but to leave the gain scaling unchanged when the VAD decision indicates unvoiced input (silence intervals). A problem with this method is that VAD decision errors can cause audible distortion in the output speech. An accurate VAD could minimize the decision errors, but it may require a complicated algorithm and, in turn, higher computational complexity; see, for example, P. Chu, Voice-Activated AGC for Teleconferencing, Proc. IEEE ICASSP 929 (1996).
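The common VAD-gated AGC described above can be sketched as follows. The energy-threshold VAD, the frame-based processing, the smoothing factor `step`, and the names `frame_rms` and `vad_gated_agc` are assumptions for illustration; frames are assumed non-empty, and no clipping protection is included.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def vad_gated_agc(frames, target_rms, vad_threshold, step=0.1, gain=1.0):
    """Adapt the gain toward the target level on speech frames only."""
    out, gains = [], []
    for frame in frames:
        level = frame_rms(frame)
        if level > vad_threshold:            # crude energy VAD: treat frame as speech
            desired = target_rms / level
            gain += step * (desired - gain)  # move part of the way toward the target
        # otherwise the frame is treated as silence and the gain is held
        out.append([gain * s for s in frame])
        gains.append(gain)
    return out, gains
```

A wrong VAD decision in this sketch shows up directly in the gain track: a silence frame misclassified as speech pulls the gain up toward a large value, which illustrates the decision-error problem noted above.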