The present invention relates to double-talk detection in transceivers, and more particularly to double-talk detection in hands-free wireless transceivers. Even more particularly, the present invention relates to robust double-talk detection in hands-free cellular telephone transceivers.
Cellular telecommunication systems in North America are evolving from their current analog frequency modulated (FM) form towards digital systems. Digital systems must encode speech for transmission and then, at the receiver, synthesize speech from the received encoded transmission. For the system to be commercially acceptable, the synthesized speech must not only be intelligible, it should be as close to the original speech as possible.
Codebook Excited Linear Prediction (CELP) is a technique for speech encoding. The basic technique consists of searching a codebook of randomly distributed excitation vectors for the vector that, when filtered through pitch and linear predictive coding (LPC) short-term synthesis filters, produces the output sequence closest to the input sequence. To accomplish this task, every candidate excitation vector in the codebook must be filtered with both the pitch and LPC synthesis filters to produce a candidate output sequence that can then be compared to the input sequence. This makes CELP a very computationally intensive algorithm, with typical codebooks consisting of 1024 entries, each 40 samples long. In addition, a perceptual error weighting filter is usually employed, which adds to the computational load.
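The exhaustive search described above can be illustrated with a minimal Python sketch. It is not the coder of any standard: the function names, the two-tap LPC coefficients, the pitch lag and gain, and the 64-entry codebook are all hypothetical values chosen for illustration, the filters start from a zero state, and the perceptual error weighting filter is omitted. Each candidate vector is passed through a pitch filter and an all-pole LPC synthesis filter, scaled by its least-squares gain, and scored by squared error against the target.

```python
import random

def pitch_filter(x, lag, gain):
    """Long-term (pitch) synthesis: y[n] = x[n] + gain * y[n - lag]."""
    y = []
    for n, xn in enumerate(x):
        prev = y[n - lag] if n - lag >= 0 else 0.0
        y.append(xn + gain * prev)
    return y

def lpc_filter(x, coeffs):
    """Short-term all-pole synthesis: y[n] = x[n] + sum_k coeffs[k] * y[n-1-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += a * y[n - 1 - k]
        y.append(acc)
    return y

def synthesize(excitation, lpc, lag, pitch_gain):
    """Filter one excitation vector through the pitch filter, then the LPC filter."""
    return lpc_filter(pitch_filter(excitation, lag, pitch_gain), lpc)

def search_codebook(target, codebook, lpc, lag, pitch_gain):
    """Return (index, gain, error) of the candidate closest to the target."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_err = -1, 0.0, float("inf")
    for i, vec in enumerate(codebook):
        y = synthesize(vec, lpc, lag, pitch_gain)   # every candidate is filtered
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        err = target_energy - corr * corr / energy  # error at the optimal gain
        if err < best_err:
            best_index, best_gain, best_err = i, corr / energy, err
    return best_index, best_gain, best_err

# Hypothetical demo parameters (assumptions, not taken from any standard).
random.seed(0)
SUBFRAME = 40
codebook = [[random.gauss(0.0, 1.0) for _ in range(SUBFRAME)] for _ in range(64)]
lpc = [0.9, -0.2]            # stable two-pole short-term filter
lag, pitch_gain = 20, 0.5

# Build a target from a known entry so the search result can be checked.
target = [2.0 * v for v in synthesize(codebook[17], lpc, lag, pitch_gain)]
best_index, best_gain, best_err = search_codebook(target, codebook, lpc, lag, pitch_gain)
```

Even in this reduced form, the inner loop filters every codebook entry for every subframe, which is the source of the computational load; with 1024 entries of 40 samples each, the cost grows accordingly.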
A number of techniques have been considered to mitigate the computational load of CELP encoders. Fast digital signal processors have helped to implement very complex algorithms, such as CELP, in real time. Another strategy is a variation of the CELP algorithm called Vector-Sum Excited Linear Predictive (VSELP) Coding. An EIA/TIA IS-54 standard that uses a full rate 8.0 Kbps VSELP speech coder, convolutional coding for error protection, differential quadrature phase shift keying (DQPSK) modulation, and a time division multiple access (TDMA) scheme has been adopted by the Telecommunications Industry Association (TIA). See EIA/TIA IS-54B. The Electronic Industries Association (EIA) published EIA/TIA IS-55 for the dual-mode mobile station, base station cellular telephone system compatibility standard. This standard incorporates a VSELP codebook search method that is disclosed in U.S. Pat. No. 4,817,157 by Gerson.
The CELP-based coders, which use LPC coefficients to model input speech, are adequate for clean signals. However, when background noise is present in the input signal, the CELP-based coders model the signal inadequately. This results in artifacts at the receiver after decoding. These artifacts, referred to as swirl artifacts, considerably degrade the perceived quality of the transmitted speech. U.S. patent application Ser. No. 08/169,789 of Ganesen et al., entitled REMOVAL OF SWIRL ARTIFACTS FROM CELP BASED SPEECH CODERS and filed Dec. 20, 1993, commonly assigned with the present patent document, and incorporated herein by reference, improves upon conventional CELP-based speech coders, and removes the swirl artifacts through the use of a voice activity detector (VAD).
In hands-free telecommunication, a high-sensitivity microphone and a high-volume, i.e., loud, speaker are employed, which allow a mobile user to hear incoming (or far-end) communications, and to transmit outgoing (or near-end) communications without the need for the mobile user to hold a handset to his or her ear and mouth.
Problematically, hands-free transceivers are susceptible to local acoustic echo (or a false double talk condition), which is caused by far-end audible signals, e.g., voice signals from a remote unit, being sounded through the mobile unit's speaker and simultaneously received (or fed back) into the mobile unit's microphone. As a result, the audible signals are retransmitted to the remote user causing an echo of the far-end audible signals. This problem arises because near-end speech, i.e., speech received through the microphone, can be caused by the mobile (or near-end) user or by the mobile unit's speaker, i.e., by the far-end user. Conventional hands-free mobile units are unable to distinguish between signals originating at the near end and signals originating at the far end, and, in order to make such a distinction, very complex processor-intensive processing would need to be performed.
In one type of conventional hands-free mobile unit, the microphone is "turned off," i.e., acoustic signals received into the mobile unit's microphone are not transmitted, whenever incoming voice signals are detected by a receive speech detector. This prevents the mobile unit from re-transmitting (or echoing) the incoming (far-end) voice signals after they are sounded by the mobile unit's speaker. As a result, echo, or feedback of the far-end user's transmission, is prevented. Unfortunately, however, in such an implementation, the mobile (or near-end) user is unable to "interrupt" the remote (or far-end) user, because the mobile user's microphone is "turned off" whenever the far-end voice signals are received, i.e., whenever the remote user is transmitting. Thus, the mobile user must wait to transmit until the remote user stops transmitting.
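The microphone gating just described can be sketched in a few lines of Python. This is an illustrative assumption, not the circuit of any particular mobile unit: the receive speech detector is modeled as a simple mean-energy threshold, and the names frame_energy and gate_microphone, the threshold value, and the four-sample frames are all hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def gate_microphone(far_end_frames, mic_frames, threshold=1e-3):
    """Mute each microphone frame while far-end speech is detected.

    The energy threshold stands in for the receive speech detector
    (an assumption made for this sketch).
    """
    transmitted = []
    for far, mic in zip(far_end_frames, mic_frames):
        if frame_energy(far) > threshold:
            transmitted.append([0.0] * len(mic))   # microphone "turned off"
        else:
            transmitted.append(list(mic))          # microphone passes through
    return transmitted

# Frame 0: far end silent, near-end user talks.
# Frame 1: far end talks, microphone picks up only speaker echo.
# Frame 2: far end talks, near-end user tries to interrupt.
far_end = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
]
mic = [
    [0.2, -0.2, 0.2, -0.2],
    [0.1, -0.1, 0.1, -0.1],
    [0.3, -0.3, 0.4, -0.4],
]
sent = gate_microphone(far_end, mic)
```

In this sketch, frame 1 shows the intended effect (the echo is suppressed), while frame 2 shows the drawback: the near-end user's interruption is muted along with the echo, because the gate looks only at the far-end signal.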
Such an implementation becomes particularly problematic when the mobile user is attempting to communicate with another mobile user who is using a similarly configured hands-free cellular transceiver, because both cellular transceivers will occasionally "turn off" their microphones in response to detected double talk, i.e., in response to each detecting an incoming signal. In this case, communication will "lock up" until one or the other of the cellular transceivers ceases echo suppression, i.e., turns its microphone back on. Problematically, such "lock up" may not be readily detectable by either mobile user, because both users may be talking and assume that their signals are being heard through the opposite mobile user's speaker.
Thus, improvements are needed in double-talk detection and echo suppression for hands-free cellular telephone transceivers, and other communications transceivers that exhibit local acoustic echo and/or false double talk conditions.