In a typical telephone network, two types of echo may be present: acoustical echo and electrical echo. Acoustical echo may occur in a telecommunication network when a hands-free telephone terminal is used. The speech signal generated by the loudspeaker in the terminal propagates as an acoustic wave through the acoustic environment (air), and part of it is reflected back into the microphone of the terminal. This reflected signal is transmitted back to the talker, thereby creating an echo. In some instances, acoustical echo may also occur in a telephone with excessive acoustic coupling between the earphone and the microphone. Therefore, the acoustical echo may be made up of two different components. The first is the undesired remote speech reflected from the ceiling, windows and walls, and the second is the direct coupling between the loudspeaker and the microphone. The echo from the first component may be delayed by as much as 200 milliseconds.
An electrical echo results from the presence of a hybrid converter, which is required to connect the four-wire link of a public switched telephone network (PSTN), consisting of two unidirectional paths, to a local two-wire loop. The basic function of the hybrid converter is to separate the transmitted signal originating in the local loop from the received signal coming from the PSTN section, and vice versa. This process requires the energy of the received signal to pass fully into the local loop. However, due to an impedance mismatch in the hybrid converter, part of the received energy is reflected back to the transmitting port. As a result, the talker hears his own delayed speech, which is undesirable.
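How much of the received energy an imperfect hybrid reflects is commonly expressed as a balance return loss, computed from the mismatch between the line (loop) impedance and the hybrid's balance impedance. The sketch below assumes that simplified two-impedance model; the function name `balance_return_loss_db` and the example impedance values are hypothetical and are not taken from the text.

```python
import math


def balance_return_loss_db(z_line: complex, z_balance: complex) -> float:
    """Balance return loss of a hybrid, in dB (simplified illustrative model).

    Assumes the echo leaking to the transmit port is proportional to the
    reflection coefficient (Z_L - Z_B) / (Z_L + Z_B). A perfect match
    (Z_L == Z_B) gives no echo, i.e. an infinite return loss.
    """
    gamma = (z_line - z_balance) / (z_line + z_balance)
    if gamma == 0:
        return math.inf
    return -20.0 * math.log10(abs(gamma))


# Hypothetical example: 600-ohm balance network on a 900-ohm loop.
print(round(balance_return_loss_db(900, 600), 2))
```

The larger the mismatch, the smaller the return loss and the louder the electrical echo heard by the talker; complex impedances can be passed directly because `abs()` works on Python complex numbers.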
One approach to echo reduction in communication networks is to use echo suppressors. A typical echo suppressor acts like a switch that monitors the voice signals traveling in both directions. It detects which person is talking and blocks the signal traveling in the opposite direction. The drawback of such an echo suppressor is that, because of the time it needs to respond to changes in speech activity, it tends to “chop” speech signals when the users talk back and forth quickly. Moreover, during double talk, i.e., when the users talk simultaneously, the suppressor fails to control the echo.
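The switching behaviour described above can be sketched as a frame-by-frame energy comparison. This is a minimal illustration under assumptions that are not from the text: talker activity is judged from the mean-square energy of each frame, and a hypothetical `margin` factor is applied because the microphone signal also contains echo of the far-end speech. It also reproduces the double-talk weakness, since local speech that is not loud enough relative to the far-end signal is blocked together with the echo.

```python
from typing import List, Tuple

Frame = List[float]


def frame_energy(frame: Frame) -> float:
    """Mean-square energy of a non-empty frame."""
    return sum(s * s for s in frame) / len(frame)


def echo_suppressor(far_in: List[Frame], near_in: List[Frame],
                    margin: float = 2.0) -> Tuple[List[Frame], List[Frame]]:
    """Voice-switched echo suppressor (illustrative sketch).

    far_in:  frames from the far end, sent to the near-end loudspeaker
    near_in: frames from the near-end microphone (echo + noise + local speech)
    margin:  hypothetical factor by which the near-end energy must exceed the
             far-end energy before the near talker is treated as active
    Returns (to_loudspeaker, to_far_end); a blocked path carries zeros.
    """
    to_loudspeaker: List[Frame] = []
    to_far_end: List[Frame] = []
    for far, near in zip(far_in, near_in):
        if frame_energy(near) > margin * frame_energy(far):
            # Near talker judged active: block the far-to-near path.
            to_loudspeaker.append([0.0] * len(far))
            to_far_end.append(list(near))
        else:
            # Far talker (or nobody) judged active: block the microphone
            # path, which removes the echo and any local speech with it.
            to_loudspeaker.append(list(far))
            to_far_end.append([0.0] * len(near))
    return to_loudspeaker, to_far_end


far = [[1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]    # far talker, silence, far talker
near = [[0.5, -0.5], [1.0, 1.0], [1.2, 1.2]]   # echo only, local speech, double talk
spk, fe = echo_suppressor(far, near)
print(fe)
```

In the third frame both users talk, but the local speech is not loud enough to pass the margin, so it is muted along with the echo; and because the decision is made once per frame, the start of any speech that follows a switch is clipped.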
One proposed solution to the problems of echo suppressors is to provide circuitry or an algorithm that, instead of blocking speech signals in one direction of the communication link, cancels the echo by using an adaptive filter. An adaptive filter is a computational device that attempts to model the relationship between two signals in real time in an iterative manner. Adaptive filters are widely used in communication systems for echo cancellation and line equalization. The adaptive filter is based on convolution, and the most frequently used structure of an adaptive filter is the finite-impulse-response (FIR) filter.
An adaptive filter can be implemented as an open-loop filter or a closed-loop filter. In a closed-loop filter, an algorithm operates iteratively and updates the adjustable parameters as new data arrive, using feedback on the current performance of the filter. During each iteration, the system learns more about the characteristics of the input signal. The processor adjusts the current set of parameters based on the latest system performance, i.e., the error signal e(n). The optimum set of values of the adjustable parameters is thus approached sequentially.
FIG. 1 shows a block diagram of a conventional (prior art) adaptive echo cancelling device designated 210. One input of the adaptive echo cancelling device 210 is a far-end input signal x(n) 20. In this figure, the far-end input signal x(n) 20 may be from a far-end terminal, such as a telephone, cell phone, Voice over IP phone or the like. The far-end input signal x(n) 20 is the discrete-time signal used to drive a loudspeaker in a hands-free near-end terminal (not shown).
Another input of the adaptive echo cancelling device 210 is a near-end input signal d(n) 26. The near-end input signal d(n) 26 is the signal picked up by the microphone (not shown) of the hands-free near-end terminal. The near-end input signal d(n) 26 contains a portion of the far-end input signal x(n) 20 in the form of an echo, background noise, and possibly, local speech.
The output of the adaptive echo cancelling device 210 is the output/error signal e(n) 28 which is output to the far-end. The adaptive echo cancelling device 210 may include a loss controller, a non-linear processor, a supplementary howling control device or the like (not shown) to further process the output/error signal e(n) 28 output to the far-end.
The adaptive echo cancelling device 210 includes an adaptive FIR filter 211. The adaptive FIR filter 211 includes a main FIR component 230, an adder 232 and an update step-size control 234. The adaptive FIR filter 211 also includes inputs for receiving the far-end input signal x(n) 20 and the near-end input signal d(n) 26. The adaptive FIR filter 211 outputs the output/error signal e(n) 28 which may be output directly to the far-end or further processed by one of the components mentioned above.
The main FIR component 230 uses the far-end input signal x(n) 20 as a reference signal. The main FIR component 230 outputs an estimated echo signal y(n) 236. As mentioned above, the main FIR component 230 is based on convolution.
The main FIR component 230 also includes multiple delay units denoted by Z−1 in the figure. The far-end input signal x(n) 20 is coupled to an input of a first delay unit. An output of the first delay unit is coupled to an input of a second delay unit. An output of the second delay unit is coupled to an input of a subsequent delay unit. An output of the subsequent delay unit is coupled to an input of another subsequent delay unit (not shown). A last delay unit receives the output of the previous delay unit in the series as its input. The number of delay units depends on the number of taps in the adaptive FIR filter 211. The number of delay units is the number of taps minus one.
The main FIR component 230 also includes multiple multipliers denoted by a circle containing the symbol for a tap coefficient (e.g., Ax). The far-end input signal x(n) 20 is also coupled to an input of a first multiplier. The output of the first delay unit is also coupled to an input of a second multiplier. The output of the second delay unit is also coupled to an input of a subsequent multiplier. The output of each subsequent delay unit is also coupled to an input of each subsequent multiplier (not shown), respectively. The output of the last delay unit is coupled to an input of the last multiplier. The number of multipliers also depends on the number of taps. The number of taps equals the number of multipliers. Each multiplier has a second input. The respective component of an updated tap coefficient vector A(k), as further described below, is coupled to the second input on each respective multiplier.
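The tapped-delay-line structure described above can be sketched in a few lines. In this minimal sketch (the class name `TappedDelayLineFIR` is hypothetical), a bounded `collections.deque` plays the role of the chain of Z−1 delay units, and each output sample is the sum of the N multiplier outputs, mirroring the adder 240. The coefficients are held fixed here; adaptation is a separate step.

```python
from collections import deque
from typing import List


class TappedDelayLineFIR:
    """Direct-form FIR filter following the main FIR component's structure.

    With N taps there are N multipliers and N - 1 delay units (Z^-1).
    Position 0 of the delay line holds the current input x(n); position k
    holds x(n - k), i.e. the output of the k-th delay unit.
    """

    def __init__(self, coefficients: List[float]) -> None:
        self.A = list(coefficients)            # tap coefficients A(0)..A(N-1)
        n_taps = len(self.A)
        self.num_delay_units = n_taps - 1      # number of taps minus one
        # A bounded deque discards the oldest sample automatically.
        self.line = deque([0.0] * n_taps, maxlen=n_taps)

    def step(self, x_n: float) -> float:
        """Push x(n) into the delay line and return y(n) = sum A(k) x(n-k)."""
        self.line.appendleft(x_n)              # every sample moves one delay unit
        return sum(a * x for a, x in zip(self.A, self.line))


fir = TappedDelayLineFIR([0.5, 0.25, 0.125])
print([fir.step(v) for v in [1.0, 0.0, 0.0, 0.0]])  # impulse response = taps
```

Feeding a unit impulse returns the coefficients one per sample, which is a quick way to confirm that each multiplier sees the correctly delayed input.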
As mentioned above, the main FIR component 230 includes multiple taps. The computing step enclosed in a dash-line block 238 (hereinafter “the filter tap 238”) is an example of a tap. One input of the filter tap 238 is the far-end input signal x(n) 20. In this example, the delayed output of the far-end input signal x(n) 20 from the previous delay unit in the series of delay units is the input to the last delay unit. Another input of the filter tap 238 is the updated tap coefficient AN output by the update step-size control 234. Specifically, the updated tap coefficient AN is input into an input of the last multiplier. The output of the last delay unit is fed into the other input of the last multiplier. The last multiplier multiplies the output of the last delay unit by the updated tap coefficient AN. The output of the last multiplier is the tap output. Thus, the filter tap 238 includes both a step of convolution and a step of coefficient adaptation.
The outputs of the taps are coupled to multiple inputs of an adder 240. The output of the adder 240 is the estimated echo signal y(n) 236. The output of the adder 240 (i.e., estimated echo signal y(n) 236) is coupled to a negative input of the adder 232. The near-end input signal d(n) 26 is coupled to a positive input of the adder 232.
The adder 232 compares the near-end input signal d(n) 26 to the estimated echo signal y(n) 236 and outputs the output/error signal e(n) 28. Thus, the output/error signal e(n) 28 is the difference between the near-end input signal d(n) 26 and the estimated echo signal y(n) 236 of the main FIR component 230. The output/error signal e(n) 28 is output by the adaptive FIR filter 211 to the far-end.
The output/error signal e(n) 28 is also fed back to the main FIR component 230 via the update step-size control 234. The update step-size control 234 includes a multiplier 242 and an adaptive coefficient algorithm 244. The output/error signal e(n) 28 is input into an input of the multiplier 242. The output of the multiplier 242 is input into an input of the adaptive coefficient algorithm 244. The output of the adaptive coefficient algorithm 244 is the updated tap coefficient vector A(k). Thus, the update step-size control 234 outputs the updated tap coefficient vector defined as A(k)=[A0, A1, A2 . . . AN].
The multiplier 242 applies a step size factor represented by μ. The step size μ is usually a small positive constant, although it should be understood by those skilled in the art that a variable step size μ could be used. In some situations the adaptive coefficient updates of the taps need to be stopped, for example, when a local speech signal is present. In this case, the step size μ may be set to 0 mathematically, which has the effect of temporarily disabling the adaptive function.
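The text does not specify how local speech is detected. Purely as an illustration, the sketch below sets μ to 0 using a Geigel-style double-talk detector, a well-known simple scheme that compares the microphone sample |d(n)| with the recent peak of |x(n)|. The class name `GatedStepSize`, the 0.5 threshold (which assumes the echo is at least about 6 dB below the far-end signal) and the window length are assumptions, not values from the source.

```python
from collections import deque


class GatedStepSize:
    """Return mu normally, or 0 when local speech is suspected.

    Illustrative Geigel-style detector (not from the source): local speech is
    declared when |d(n)| exceeds `threshold` times the peak |x| over the last
    `window` far-end samples. threshold = 0.5 assumes at least ~6 dB of echo
    loss between the loudspeaker and the microphone.
    """

    def __init__(self, mu: float, window: int, threshold: float = 0.5) -> None:
        self.mu = mu
        self.threshold = threshold
        self.recent_far = deque(maxlen=window)   # recent |x(n)| values

    def step_size(self, x_n: float, d_n: float) -> float:
        self.recent_far.append(abs(x_n))
        far_peak = max(self.recent_far)
        if abs(d_n) > self.threshold * far_peak:
            return 0.0       # local speech suspected: freeze adaptation
        return self.mu       # echo only: adapt with the normal step size


gate = GatedStepSize(mu=0.01, window=4)
print(gate.step_size(1.0, 0.3))   # echo-like sample: adaptation allowed
print(gate.step_size(0.0, 0.9))   # loud near-end sample: mu forced to 0
```

Returning a step size rather than a flag keeps the coefficient update unchanged: multiplying by a zero step size is exactly the "μ set to 0" behaviour described above.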
Thus, in the conventional adaptive FIR filter 211, x(n), d(n) and e(n) denote the far-end input signal x(n) 20, the near-end input signal d(n) 26 and the output/error signal e(n) 28, respectively. The adaptive FIR filter 211 is excited by the far-end input signal x(n) 20 and driven by an adaptive algorithm (e.g., a least mean square (LMS) or normalized least mean square (NLMS) algorithm) to produce the estimated echo signal y(n) 236, i.e., a replica of the echo signal. The error signal e(n) 28 is then obtained by subtracting the estimated echo signal y(n) 236 from the near-end input signal d(n) 26 and can be expressed as follows:

$$e(n) = d(n) - \left[\sum_{k=0}^{N-1} A(k)\,x(n-k)\right]$$
and the tap coefficient update equation of the adaptive algorithm (when the LMS algorithm is used) can be expressed as follows:

$$A_{\text{new}}(k) = A_{\text{old}}(k) + \mu\,e(n)\,x(n-k), \qquad k = 0, \ldots, N-1$$

where A(k) denotes the coefficient vector for the taps and μ is the step size. It is understood by those skilled in the art that the convergence factor is denoted by μe(n) (i.e., the step size μ multiplied by the output/error signal e(n)).
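The error and update equations above can be combined into a sample-by-sample LMS echo canceller. This is a minimal floating-point sketch, not the device's implementation: the function name `lms_echo_canceller`, the synthetic three-tap echo path `h`, the white-noise far-end signal and μ = 0.05 are all hypothetical choices for the demonstration, and there is no local speech or noise, so the coefficients can converge to the echo path exactly.

```python
import random
from typing import List, Tuple


def lms_echo_canceller(x: List[float], d: List[float], n_taps: int,
                       mu: float) -> Tuple[List[float], List[float]]:
    """Adaptive FIR echo canceller using the LMS update.

    For each sample n:
        y(n) = sum_{k=0}^{N-1} A(k) x(n-k)        (convolution)
        e(n) = d(n) - y(n)                        (output/error signal)
        A(k) <- A(k) + mu * e(n) * x(n-k)         (coefficient adaptation)
    Returns the error signal and the final tap coefficients.
    """
    A = [0.0] * n_taps
    e: List[float] = []
    for n in range(len(x)):
        # x(n-k) for k = 0..N-1, taking samples before the start as zero.
        xs = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y_n = sum(a * xk for a, xk in zip(A, xs))
        e_n = d[n] - y_n
        for k in range(n_taps):
            A[k] += mu * e_n * xs[k]
        e.append(e_n)
    return e, A


rng = random.Random(0)
h = [0.6, -0.3, 0.1]                                  # hypothetical echo path
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]     # far-end signal
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]                          # echo-only near-end signal
e, A = lms_echo_canceller(x, d, n_taps=3, mu=0.05)
print([round(a, 3) for a in A])                       # should be close to h
```

Each sample costs N multiply-accumulates for the convolution and N for the update, which is the cost split that drives the computing-power discussion of long-tap filters.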
When the estimated echo signal y(n) 236 is not close to the near-end input signal d(n) 26, i.e., when the output/error signal e(n) 28 is large, the adaptation algorithm is executed to correct or update the tap coefficients so that the estimated echo signal y(n) 236 gradually approaches the near-end input signal d(n) 26 (i.e., the desired signal). The near-end input signal d(n) 26 is not known in advance and changes all the time. Therefore, the adaptive FIR filter 211 has to be a real-time, closed-loop feedback system that adapts continuously to track the near-end input signal d(n) 26.
In a high-quality adaptive filter, the coefficient set is adapted continuously, which makes the adaptive filter expensive in terms of computing power.
As mentioned above, one example of the adaptive algorithm is the LMS algorithm. The LMS algorithm is the most popular adaptation algorithm; however, other adaptive algorithms may be used. The LMS algorithm makes use of the steepest descent approach, but instead of computing the true gradient vector, it estimates the gradient vector from a limited number of data samples.
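The link between steepest descent and the LMS update can be written out as a short derivation in the notation of the error and tap-update equations given earlier. It assumes the usual mean-square-error cost J, and takes the steepest descent step as μ/2 so that the result matches the LMS update with step size μ.

```latex
J = \mathbb{E}\!\left[e^{2}(n)\right], \qquad
e(n) = d(n) - \sum_{k=0}^{N-1} A(k)\,x(n-k)

\frac{\partial J}{\partial A(k)} = -2\,\mathbb{E}\!\left[e(n)\,x(n-k)\right]

% Steepest descent: move each coefficient against the exact gradient
A_{\text{new}}(k) = A_{\text{old}}(k) - \frac{\mu}{2}\,\frac{\partial J}{\partial A(k)}
                  = A_{\text{old}}(k) + \mu\,\mathbb{E}\!\left[e(n)\,x(n-k)\right]

% LMS: replace the expectation by its single-sample (instantaneous) estimate
A_{\text{new}}(k) = A_{\text{old}}(k) + \mu\,e(n)\,x(n-k), \qquad k = 0,\ldots,N-1
```

The only approximation is the last step: the expectation, which would require many samples to evaluate, is replaced by the current product e(n)x(n−k), which is what keeps the per-sample cost at one multiply-accumulate per tap.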
Further, the adaptation algorithm may include convergence control in addition to coefficient adaptation. Convergence control is not performed in every tap in order to reduce the cost of the computing power. On the other hand, coefficient adaptation is usually performed on all taps during each sample for a high performance adaptive filter. Thus, most of the computing power of the adaptive filter is consumed when performing the coefficient adaptation.
Early echo cancellation implementations were based on analog circuit techniques. However, analog techniques were unable to adequately follow changes in the room environment. Therefore, echo cancellation is now typically performed using digital techniques. A digital echo canceller is an adaptive FIR filter with a large number of taps.
The number of taps in a long-tap adaptive FIR filter may be 3200 or more, for example, for a 200 ms echo canceller used in a 16 kHz sampling-rate ISDN telephone system (0.2 s × 16,000 samples/s = 3200 taps). This means that at least 6400 Multiply Accumulate (MAC) operations are required for every sample: 3200 MAC operations for the convolution and 3200 MAC operations for the coefficient adaptation. At 16,000 samples per second, this is equivalent to 102.4 Million Instructions Per Second (MIPs). Including other associated computing and control operations, the total could exceed 110 MIPs. One way to decrease the number of MIPs is to skip part of the adaptation computing, which, however, yields relatively low adaptation quality.
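The load figure above follows from simple arithmetic, sketched below. The helper name `echo_canceller_load` and its `instr_per_mac` parameter are hypothetical; the sketch assumes one tap per sample of echo tail, one MAC per tap for the convolution plus one per tap for the update, and one instruction per MAC unless stated otherwise.

```python
def echo_canceller_load(tail_ms: int, fs_hz: int, instr_per_mac: int = 1) -> dict:
    """Rough processing load of a full-update LMS echo canceller.

    Assumes N = tail length x sampling rate taps, N MACs per sample for the
    convolution plus N MACs per sample for the coefficient adaptation, and
    `instr_per_mac` instructions per MAC (hypothetical parameter).
    """
    taps = tail_ms * fs_hz // 1000          # taps needed to cover the echo tail
    macs_per_sample = 2 * taps              # convolution + adaptation
    mips = macs_per_sample * instr_per_mac * fs_hz / 1e6
    return {"taps": taps, "macs_per_sample": macs_per_sample, "mips": mips}


print(echo_canceller_load(200, 16000))
# {'taps': 3200, 'macs_per_sample': 6400, 'mips': 102.4}
```

Half of the 102.4 MIPs comes from the adaptation term alone, which is why skipping part of the adaptation is the obvious, if quality-reducing, way to save computing power.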
In typical digital acoustic echo cancellation, a long-tap adaptive FIR filter is used to simulate the echo environment in order to subtract the echo from the near-end input signal, as described above. Usually, the taps of the tail component have a rather low envelope amplitude compared with the taps of the header component, since the echo energy attenuates with distance. In a finite-precision (i.e., fixed-point) implementation, the coefficient adaptation of the taps of the tail component becomes inefficient because the precision of these coefficients is too low. Although double-precision algorithms can be adopted, they significantly increase the computational complexity.
For example, in a low-cost implementation of acoustic echo cancellation, the voice signal and the FIR taps are represented by 16-bit fixed-point data. If the taps of the tail component contain only 3-4 bits of effective data, the updating or adaptation of these taps is very inefficient. If double precision is used to store the taps, the MIPs of the tap adaptation will be doubled. Further, the higher bits in the memory of the taps of the tail component are wasted.
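The precision problem can be made concrete with a small fixed-point sketch. It assumes Q15 arithmetic (16-bit words with 15 fractional bits) and round-to-nearest multiplies; the helper names and all numeric values are hypothetical illustrations, not taken from the text. With a tail tap of about 0.0004, which occupies only 4 significant bits in Q15, and a small error signal, each exact LMS increment is smaller than half an LSB, so the stored coefficient never changes.

```python
Q = 15                         # Q15 format: real value = integer / 2**15
ONE = 1 << Q                   # 32768
INT16_MIN, INT16_MAX = -ONE, ONE - 1


def to_q15(v: float) -> int:
    """Round a real value in [-1, 1) to Q15, saturating to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, round(v * ONE)))


def q15_mul(a: int, b: int) -> int:
    """Q15 x Q15 -> Q15 product, rounded to nearest (ties toward +inf)."""
    return (a * b + (1 << (Q - 1))) >> Q


def lms_update_q15(coef: int, mu: int, e: int, x: int) -> int:
    """One LMS tap update A(k) += mu * e(n) * x(n-k), all operands in Q15."""
    step = q15_mul(q15_mul(mu, e), x)
    return max(INT16_MIN, min(INT16_MAX, coef + step))


# Hypothetical tail tap: 0.0004 -> 13 in Q15 (binary 1101, only 4 bits).
coef = to_q15(0.0004)
mu, e, x = to_q15(0.01), to_q15(0.01), to_q15(0.05)
for _ in range(1000):
    coef = lms_update_q15(coef, mu, e, x)
# The exact increment is 0.01 * 0.01 * 0.05 = 5e-6 per sample (about 0.16 LSB),
# so every step rounds to zero and the tap never moves.
print(coef)  # prints 13
```

In floating point the same 1000 updates would move the tap by 0.005, more than ten times its value; storing the tap in 32 bits would capture these increments but, as noted above, doubles the cost of the adaptation.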
Therefore, there exists a strong need in the art for an improved adaptive echo cancelling device, particularly well suited for use during a communication session involving at least one hands-free telephone terminal.