One of the most popular pieces of telephone equipment presently on the market is the speakerphone, a device which allows hands-free operation by utilizing a microphone and a loudspeaker to receive and transmit voice information or other sounds. Although speakerphones have been in existence for many years, they typically suffer from certain drawbacks which often make their use difficult or inconvenient. More particularly, because its microphone and loudspeaker are located in close physical proximity (often together in a small unit), the speakerphone is highly susceptible to interference from echoes caused by sounds generated by the loudspeaker and received by the nearby microphone and other noise generated by both acoustic and electrical sources. In addition, speakerphone systems are susceptible to electrical echoes. An electrical echo is generated when a portion of the electrical signal which represents the acoustical information is "reflected" as the signal travels along the electrical circuit and the reflected portion of the signal travels back to its source.
Acoustic echoes occur because sounds generated by the loudspeaker are detected by the microphone and transmitted back over the telephone line. The sounds generated by the speaker reach the microphone either directly or by reflections from the walls of the room in which the phone is located. One method of eliminating acoustic echoes is to utilize a "half-duplex" system in which either the speaker or the microphone is active, but both devices are not active at the same time. In such an arrangement, a circuit monitors the microphone output and turns off the speaker for a predetermined period of time when the microphone has detected a sound. Similarly, if the speaker is active, the microphone is disabled for a predetermined period of time.
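The half-duplex switching just described can be sketched in a few lines. This is only an illustration under stated assumptions, not circuitry from the source: the function name `half_duplex_gate`, the detection `threshold`, and the `hold` period (counted in samples) are all hypothetical.

```python
def half_duplex_gate(mic, spk, threshold=0.1, hold=3):
    """Return per-sample (mic_on, spk_on) flags for a half-duplex switch.

    mic, spk: equal-length sequences of sample amplitudes.
    threshold and hold are illustrative values, not taken from the source.
    """
    states = []
    owner = None   # direction that currently holds the channel
    timer = 0      # samples remaining in the hold period
    for m, s in zip(mic, spk):
        mic_active = abs(m) > threshold
        spk_active = abs(s) > threshold
        if timer == 0:
            owner = None          # hold period expired, channel is free
        else:
            timer -= 1
        if owner == "mic" and mic_active:
            timer = hold          # owner still talking, restart the hold
        elif owner == "spk" and spk_active:
            timer = hold
        elif owner is None:
            if mic_active:
                owner, timer = "mic", hold
            elif spk_active:
                owner, timer = "spk", hold
        # the non-owning device is disabled while the other owns the channel
        states.append((owner != "spk", owner != "mic"))
    return states


# Near-end sound at sample 0; far-end speech starts at sample 1.
states = half_duplex_gate([0.5, 0, 0, 0, 0, 0], [0, 0.5, 0.5, 0.5, 0.5, 0.5])
```

In this example the speaker stays muted for the whole hold period after the microphone detects sound, so the far-end speech arriving at samples 1 through 3 is never reproduced.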
Half-duplex systems suffer from the problem that only one party to a two-person conversation can speak at any given time and, even then, the first syllable or two of speech is often "clipped" or lost due to the "dead" time interval that occurs when the microphone switches from the inactive state to the active state. When a half-duplex speakerphone is connected to a conventional handset phone, the clipping problem is acceptable, but when two such speakerphones are connected together, it is often very difficult to converse without losing a significant amount of information. In order to overcome the problems associated with half-duplex speakerphones, "full-duplex" speakerphones have been used in which both the microphone and the speaker are operational at all times, but such systems are extremely susceptible to acoustic echoes.
FIG. 1 is a block diagram for illustrating how acoustic and electrical echoes are generated in a speakerphone system. In particular, FIG. 1 shows a representation of a speakerphone consisting of a speaker 104 and a microphone 120 connected to speakerphone circuitry 106. The speaker 104 and the microphone 120 are located in a room 102 and the speakerphone circuit 106 is, in turn, connected by means of a two-wire receiving leg 108 and a two-wire transmitting leg 122 to a hybrid circuit 124. The hybrid circuit 124 converts the four-wire circuit, comprising legs 108 and 122, to a two-wire line 125. At the other end of the telephone line, a similar hybrid circuit 114 converts the two-wire line 125 back into a four-wire circuit comprising a receiving leg 112 and a transmitting leg 128.
An acoustic echo is generated when a sound emanating from the speaker 104 is transmitted into the microphone 120, for example, by reflection from the room walls 103, and is heard by a person at the other end of the phone line 125. Since there are typically a variety of paths by which a sound generated by speaker 104 can reach microphone 120, many acoustic echoes can be generated from the same initial sound. Three such echo paths (paths 100, 116 and 118) are illustrated in FIG. 1. The echoes are annoying to the person at the far end.
As previously mentioned, an electrical echo is generated in the speakerphone system when a portion of the electrical signal transmitted from the near end is reflected back to the source by impedance discontinuities in the electrical circuit, for example at the four-wire/two-wire hybrid transformers. In FIG. 1 these electrical echoes are represented by signal reflection 117 from the near-end hybrid 124 and signal reflection 119 from the far-end hybrid 114.
Acoustic and electrical echoes are particularly annoying when two persons are conversing over speakerphones because the echoes from a previously uttered phrase often arrive as the person has begun speaking a new phrase. Consequently, in some speakerphone systems, and, in particular, full-duplex speakerphone systems, adaptive filters are used to reduce the echoes to an acceptable level.
For example, at the near end, the signal reaching the loudspeaker is also applied to an adaptive filter whose output is subtracted from the electrical output of the microphone. The filter adjusts automatically to provide a transfer function nearly identical with that of the path through the loudspeaker and microphone by way of the acoustical coupling between those components. The subtractive process thus eliminates, or greatly reduces, the acoustic feedback signal in the microphone output. The same arrangement is used to cancel electrical echoes by applying the incoming electrical signal generated by the far-end microphone to a second adaptive filter whose output is subtracted from the signal going out to the near-end loudspeaker. A typical adaptive filter used in speakerphone systems is an adaptive finite-impulse-response (AFIR) filter, which comprises a tapped delay line that generates signals that are selectively combined to generate the filter output. Adaptive FIR filters may be conveniently implemented using digital signal processor (DSP) integrated circuits or "chips".
FIG. 2 shows the digital section circuitry in an illustrative speakerphone employing DSP chips corresponding to a portion of speakerphone circuitry 106 in FIG. 1. Two digital AFIR filters are used in the digital section to cancel both acoustic and electrical echoes. In the illustrated circuit, AFIR filter 212 is used for correcting or cancelling acoustic echoes and AFIR filter 216 is used for correcting or cancelling electrical echoes. The microphone input 200 is provided to an analog-to-digital (A/D) converter 202 where it is sampled at the Nyquist rate (twice the highest signal frequency) to generate a plurality of digital samples over time. These samples are provided to a summer 204 where the correction signal output 213 of the AFIR filter 212 is subtracted from the output 203 of A/D 202 to generate the echo-corrected signal. This latter signal is provided over bus 206 to digital-to-analog (D/A) converter 210 and reconverted back to analog form for transmission over the telephone lines attached to line output 208. An error signal, derived from the corrected signal, is also fed back to the AFIR filter 212, via bus 214, in order to adaptively adjust the filter coefficients in a known manner to cause the filter to adapt to changes in the echo generating mechanisms as discussed above.
In a similar manner, signals received at the line receive input 230 are provided to an A/D converter 228 for conversion into digital form. The digital signals are, in turn, provided to a summer 226 which subtracts the correction signals 217 generated by electrical echo corrector filter 216 from the output of A/D 228. The corrected digital samples are converted to analog signals by D/A converter 222 to drive the loudspeaker output. The corrected signal is also fed back to AFIR filter 216 via bus 218 in order to adaptively adjust the filter coefficients as explained above for AFIR filter 212.
More specifically, a typical finite-impulse-response (FIR) filter is a linear filter, preferably in the form of a tapped delay line. Each tap has associated with it a "weight" which modifies the characteristic of the filter. If the delay line has only feed-forward delays, its transfer function can be expressed as a single polynomial in $Z^{-1}$ and the filter's impulse response is limited to a finite number of points; therefore, it has a finite impulse response.
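As a sketch of that polynomial form, assume a filter with N taps whose weights are written C_1 through C_N, and assume that tap n sees the input delayed by n sample periods. The indexing is an assumption made here for illustration; the text above does not fix it.

```latex
H(Z) = \sum_{n=1}^{N} C_n \, Z^{-n},
\qquad
h(k) =
\begin{cases}
C_k, & 1 \le k \le N \\
0,   & \text{otherwise}
\end{cases}
```

Under these assumptions the impulse response h(k) is simply the list of tap weights, which is why it vanishes after N points.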
An adaptive finite impulse response (AFIR) filter is a FIR filter with provision for automatic adjustment of its tap weights. FIG. 3 shows an AFIR filter wherein a least-mean-squared (LMS) algorithm adapts the tap weights of the filter used in this example as an echo canceller. A plurality of received digital signal samples, A(t), which are typically samples derived from an analog signal sampled at, or above, the Nyquist rate, is introduced to the filter at the input 310. This signal is used as a reference signal to develop an estimate of the echo to be canceled and the signal is applied to successive delay units 300, 302, etc., until the "end" of the filter at delay unit 308 is reached.
Each of the delays 300, 302, 304, 306, 308 and any additional (but not illustrated) delays, produces a delayed version of the signal and then passes that signal to the next delay unit. The delayed versions of the signal (illustrated as signals $A_1, A_2, \ldots, A_N$) are also fed to multipliers 312, 314, 316, 318, through 320 where they are multiplied by associated tap weights $C_1, C_2, \ldots, C_N$. The multiplier outputs are fed into summer 322 where they are added to produce, in the case of an echo cancellation application, the filter output which is an estimate S(t) of the echo.
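The multiply-and-sum structure described above can be illustrated with a short sketch. It assumes that the n-th delayed signal is the reference delayed by n sample periods; the helper names (`make_delay_line`, `echo_estimate`, `shift_in`) and the three example weights are hypothetical.

```python
from collections import deque


def make_delay_line(n_taps):
    """Delay line holding the n_taps most recent reference samples."""
    return deque([0.0] * n_taps, maxlen=n_taps)


def echo_estimate(delay_line, weights):
    """S(t): sum of each delayed sample times its tap weight."""
    return sum(c * a for c, a in zip(weights, delay_line))


def shift_in(delay_line, sample):
    """Clock a new sample in; the oldest sample falls off the far end."""
    delay_line.appendleft(sample)


weights = [0.5, 0.25, 0.125]           # illustrative tap weights only
line = make_delay_line(len(weights))
impulse_response = []
for a in [1.0, 0.0, 0.0, 0.0, 0.0]:    # unit impulse as the reference input
    impulse_response.append(echo_estimate(line, weights))
    shift_in(line, a)
# impulse_response is [0.0, 0.5, 0.25, 0.125, 0.0]
```

Feeding a unit impulse through the sketch returns the tap weights one sample at a time and then zero, which is the finite impulse response in miniature.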
The echo estimate, S(t), is subtracted in the summing block or adder 330 from the signal R(t), which is the desired (echo-free) signal corrupted by the echo. The output of adder 330, E(t), is used as an estimate of the desired signal 326 and comprises the actual desired signal plus whatever residual error exists between the estimate of the echo S(t), and the actual echo signal.
The desired signal estimate E(t) is also used to modify the tap weights, that is, adapt the FIR filter, generally by means of an algorithm as shown at 328. The particular algorithm illustrated in box 328 is a least mean squares (LMS) algorithm which computes the tap weight for a given sample, $C_n(t+1)$, using the tap weight used with the previous sample, $C_n(t)$, plus a correction factor which consists of the product of a convergence constant ($\beta$), the delayed signal from the previous sample ($A_n(t)$) and the estimated signal from the previous sample time ($E(t)$); that is, $C_n(t+1) = C_n(t) + \beta A_n(t) E(t)$. This adaptation reduces the signal error attributable to echoes and adjusts to changing conditions which modify the echo characteristics.
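The tap-weight update described above amounts to one multiply-accumulate per tap. A minimal sketch follows; the function name `lms_update` and the numeric values are illustrative assumptions, not values from the source.

```python
def lms_update(weights, delayed, error, beta):
    """Return the next tap weights.

    Each new weight is the old weight plus beta times the delayed
    reference sample at that tap times the current error E(t).
    """
    return [c + beta * a * error for c, a in zip(weights, delayed)]


# Two taps starting at zero, delayed samples 1.0 and 0.5, error 2.0, beta 0.25.
new_weights = lms_update([0.0, 0.0], [1.0, 0.5], error=2.0, beta=0.25)
# new_weights is [0.5, 0.25]
```

A larger convergence constant moves the weights further on each sample, which speeds adaptation at the cost of stability.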
FIG. 4 shows an illustrative flow chart for operating an AFIR filter such as that shown in FIG. 3. The process begins at step 400 and proceeds to step 402 where the next sample, corresponding to a sampled version of a reference signal 310 of FIG. 3, is retrieved. Then an echo estimate is generated at step 404 using an AFIR filter. An input sample, corresponding to the desired output sample plus an echo contribution, is then retrieved at step 406. The output estimate is calculated at step 408 and is used at step 410 to update the tap weights ($C_n$) associated with the AFIR filter. At step 412, the echo-canceled signal, E(t), is output. At this point, the routine is complete, as illustrated by step 414.
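The per-sample routine of FIG. 4 can be sketched as a loop. The sketch below is a simulation under stated assumptions rather than the routine itself: the echo path `true_path`, the random reference signal, the tap count, and the convergence constant are all hypothetical, and no near-end speech is present, so the received signal is pure echo.

```python
import random
from collections import deque


def run_canceller(reference, received, n_taps, beta):
    """Run an LMS echo canceller; return (residual samples, final weights)."""
    weights = [0.0] * n_taps
    line = deque([0.0] * n_taps, maxlen=n_taps)   # delayed reference samples
    residual = []
    for a, r in zip(reference, received):          # steps 402 and 406
        s = sum(c * x for c, x in zip(weights, line))   # step 404: echo estimate
        e = r - s                                        # step 408: output estimate
        weights = [c + beta * x * e
                   for c, x in zip(weights, line)]       # step 410: LMS update
        line.appendleft(a)                               # clock reference into the delays
        residual.append(e)                               # step 412: output E(t)
    return residual, weights


random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]     # hypothetical echo path, one value per delay
reference = [random.uniform(-1.0, 1.0) for _ in range(3000)]

# Simulate the echo: each received sample is the delayed reference
# passed through the hypothetical echo path.
history = deque([0.0] * len(true_path), maxlen=len(true_path))
received = []
for a in reference:
    received.append(sum(h * x for h, x in zip(true_path, history)))
    history.appendleft(a)

residual, learned = run_canceller(reference, received, n_taps=4, beta=0.1)
```

With no near-end speech and a filter as long as the simulated echo path, the learned weights approach `true_path` and the residual approaches zero. A real room response is far longer and changes over time, so this shows only the mechanism.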
The invention disclosed herein implements a full duplex speakerphone using a form of adaptive filtering similar to that of application Ser. No. 08/190,775, entitled ECHO CANCELLATION APPARATUS, now U.S. Pat. No. 5,473,686, issued Dec. 5, 1995 and assigned to the Assignee of the present invention. That application disclosed a technique for adapting the length of a tapped delay line to particular conditions which affect the amount of delay necessary to provide optimum echo cancellation in a full duplex speakerphone. The present invention employs tapped delay lines having a fixed number of taps which use techniques for controlling signal levels to achieve effective echo cancellation.
AFIR filters can be used as illustrated in FIGS. 2 and 3 to reduce both acoustic and electrical echoes in speakerphones. However, in order to provide these benefits in a low-cost full duplex speakerphone, careful attention to the architecture of the circuitry, several inherent characteristics of full duplex speakerphones, and drawbacks of the prior art is necessary.
In the prior art, each AFIR filter is typically implemented using an individual DSP or several DSP chips connected in cascade. In addition, a control DSP or microprocessor is required to perform the adaptive coefficient computations and control operations of the entire circuit. As many as three to five DSP chips may be required, making this technology available in products to consumers for home or small business use only at relatively high cost.
Full duplex speakerphones enable users at both ends of the communications channel to talk at the same time. This situation, known as double-talk, can create ambiguous conditions in the AFIR filter due to the uncorrelated signal components which may exist at any particular time in the two signals. An AFIR filter system may converge on the wrong signal and fail to suppress the target echo, or it may not succeed in converging at all, leading to undesirable effects such as howling or squealing. Prior art solutions employ double-talk detectors to detect the condition and suspend the computation of adaptive coefficients while double-talk occurs. One effect of suspending AFIR filter coefficient computations is that echo cancellation is sufficient for only one condition and is compromised for all others, resulting in perceptible echo variations that are distracting to the user. This is especially significant for an acoustic echo cancellation filter which must operate over a wide range of reflective conditions in the acoustic environment.
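The source does not say how prior art double-talk detectors work. One widely known scheme, the Geigel detector, is sketched below purely as an illustration of the idea: it declares double-talk when the microphone sample is large compared with the recent far-end peak. The function names and the 0.5 threshold are assumptions.

```python
def geigel_doubletalk(far_recent, mic_sample, threshold=0.5):
    """Flag double-talk when the microphone sample exceeds a fraction
    of the largest recent far-end sample magnitude."""
    far_peak = max((abs(x) for x in far_recent), default=0.0)
    return abs(mic_sample) > threshold * far_peak


def gated_lms_update(weights, delayed, error, beta, doubletalk):
    """LMS update that is suspended (weights held) during double-talk."""
    if doubletalk:
        return list(weights)
    return [c + beta * a * error for c, a in zip(weights, delayed)]


far_recent = [0.2, -0.4, 0.1]
quiet = geigel_doubletalk(far_recent, 0.1)   # False: consistent with echo alone
loud = geigel_doubletalk(far_recent, 0.3)    # True: near-end talker likely active
held = gated_lms_update([0.5, 0.25], [1.0, 0.5], 2.0, 0.25, doubletalk=loud)
```

While the flag is set the weights are frozen at their last values, so any change in the echo path during double-talk goes uncorrected.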
The limited dynamic range or signal-to-noise ratio (S/N) of a telephone product and its typical environment, approximately 30 dB, poses few signal-handling problems to the designer. However, excessive signal levels can cause poor audible performance from a DSP-based system. The 30 dB dynamic range requires that the echoes be suppressed by the AFIR filters to the level of the background noise set by system parameters, level of distortion products, etc. If the maximum signal level is not constrained, the echoes suppressed by 30 dB will be above the noise level and thus audible. This problem can become obtrusive if left uncorrected and must be solved by inexpensive means in a low-cost consumer product.
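The level arithmetic behind this argument can be made concrete. The sketch assumes, as a worst case, that the echo reaches the microphone at the same level as the signal that caused it, and it places the noise floor 30 dB below a nominal maximum level of 0 dB. The function name and the 6 dB overshoot are illustrative.

```python
def residual_echo_margin_db(signal_db, suppression_db=30.0, noise_floor_db=-30.0):
    """Level of the suppressed echo relative to the noise floor, in dB.

    A positive result means the residual echo sits above the noise.
    """
    return (signal_db - suppression_db) - noise_floor_db


at_nominal = residual_echo_margin_db(0.0)   # 0.0: echo lands on the noise floor
overdriven = residual_echo_margin_db(6.0)   # 6.0: echo is 6 dB above the noise
```

Every decibel by which the signal exceeds the nominal maximum reappears as a decibel of residual echo above the noise, which is why the maximum signal level has to be constrained.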
A further problem that can arise when signal levels become excessive is caused by severe distortion that results when the A/D converters attempt to encode excessive incoming signals driven into the clipping region of the amplifiers by loud voices or other loud sounds during telephone conversations. Such peak amplitude clipping generates distortion products in the form of undesirable harmonic spectral components. More importantly, a clipped signal is seen by the AFIR filter as a smaller peak amplitude signal than is really present. The resulting echo suppression, being disproportionately small, allows some echo components to pass through the system without suppression. The result is ringing or howling sounds which are extremely disruptive to communications.
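Both effects of clipping named above, the added harmonics and the reduced apparent peak, can be seen in a small numerical sketch. The 2:1 overdrive, the 256-point cycle, and the helper names are assumptions chosen for illustration.

```python
import math


def clip(x, limit):
    """Hard-limit a sample to the range [-limit, +limit]."""
    return max(-limit, min(limit, x))


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic over exactly one cycle of samples."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


n = 256
sine = [2.0 * math.sin(2 * math.pi * i / n) for i in range(n)]  # peak of 2.0
clipped = [clip(s, 1.0) for s in sine]                          # limiter at 1.0

peak_seen = max(clipped)                          # 1.0, half the true peak
third_clean = harmonic_amplitude(sine, 3)         # essentially zero
third_clipped = harmonic_amplitude(clipped, 3)    # clearly non-zero
fundamental_clipped = harmonic_amplitude(clipped, 1)
```

In this example the filter would see a peak of 1.0 where the true peak was 2.0, along with a third harmonic that was absent from the original signal.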