This invention relates generally to signal processors and echo cancellers. More particularly, the invention relates to a network echo canceller for integrated telecommunications processing.
Single chip digital signal processing devices (DSP) are relatively well known. DSPs generally are distinguished from general purpose microprocessors in that DSPs typically support accelerated arithmetic operations by including a dedicated multiplier and accumulator (MAC) for performing multiplication of digital numbers. The instruction set for a typical DSP device usually includes a MAC instruction for performing multiplication of new operands and addition with a prior accumulated value stored within an accumulator register. A MAC instruction is typically the only instruction provided in prior art digital signal processors where two DSP operations, multiply followed by add, are performed by the execution of one instruction. However, when performing signal processing functions on data it is often desirable to perform other DSP operations in varying combinations.
An area where DSPs may be utilized is in telecommunication systems. One use of DSPs in telecommunication systems is digital filtering. In this case a DSP is typically programmed with instructions to implement some filter function in the digital or time domain. The mathematical algorithm for a typical finite impulse response (FIR) filter may look like the equation Yn = h0X0 + h1X1 + h2X2 + . . . + hNXN, where h0 through hN are fixed filter coefficients and X0 through XN are the data samples. The equation Yn may be evaluated by using a software program. However, in some applications, it is necessary that the equation be evaluated as fast as possible. One way to do this is to perform the computations using hardware components such as a DSP device programmed to compute the equation Yn. In order to further speed the process, it is desirable to vectorize the equation and distribute the computation amongst multiple DSP arithmetic units such that the final result is obtained more quickly. The multiple DSP arithmetic units operate in parallel to speed the computation process. In this case, the multiplication of terms is spread equally across the multipliers of the DSPs for simultaneous computation of terms. The adding of terms is similarly spread equally across the adders of the DSPs for simultaneous computation. In vectorized processing, the order of processing terms is unimportant since the addition of terms is associative and commutative. If the processing order of the terms is altered, it has no effect on the final result expected in a vectorized processing of a function.
One area where finite impulse response filters are applied is in echo cancellation for telephony processing. Echo cancellation is used to cancel echoes over full duplex telephone communication channels. The echo-cancellation process isolates and filters the unwanted signals caused by echoes from the main transmitted signal in a two-way transmission.
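The FIR equation and its distribution across parallel arithmetic units can be sketched as follows. This is a minimal illustration only: the function names, the `units` parameter, and the sample values are hypothetical, and integer data is used so that reordering the additions gives a bit-identical result (with floating-point data the reordered sum could differ by rounding).

```python
def fir_output(coeffs, samples):
    """Evaluate Y = h0*X0 + h1*X1 + ... + hN*XN with one sequential MAC loop."""
    acc = 0
    for h, x in zip(coeffs, samples):
        acc += h * x  # one multiply-accumulate per term
    return acc


def fir_output_split(coeffs, samples, units=4):
    """Spread the terms over `units` hypothetical arithmetic units, then combine."""
    partial = [0] * units
    for i, (h, x) in enumerate(zip(coeffs, samples)):
        partial[i % units] += h * x  # each unit accumulates its own share of terms
    return sum(partial)  # final combination of the partial sums


# Hypothetical coefficients and data samples, chosen only for illustration.
coeffs = [3, -1, 4, 1, -5, 9, 2, -6]
samples = [2, 7, 1, -8, 2, 8, -1, 8]
print(fir_output(coeffs, samples), fir_output_split(coeffs, samples))
```

Both calls produce the same value because the order in which the products are summed does not affect the result.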
Echoes are part of everyday life. Whenever we speak, we hear our own voice transmitted through both the air and our bodies. These echoes have a short latency, arriving at our ears within a tenth of a millisecond. Our minds automatically filter short-latency echoes so we do not notice them. We are so used to hearing these echoes as sidebands that when they are removed artificially, we notice their absence. Therefore, a certain amount of short-latency echo is desirable. However, the long-latency echoes experienced in modern telephony networks are not desirable.
Echoes are common in telephony equipment. They are caused by electrical reflections from nearly any impedance mismatch as well as by acoustical coupling between loudspeakers and microphones. These echoes do not cause auditory problems until their delay (or 'latency') increases to roughly 30 ms or more.
Typically, echoes are not a serious issue in local telephone connections. However, in long-distance telephone connections, echoes become increasingly serious as their latency increases. As a result, a significant amount of signal processing is needed in a telephony-processing subsystem to eliminate the effect of echoes.
With the exception of speaker telephones (which are prone to echoes), most acoustical echoes can be controlled by careful design of the telephone handset. In contrast, electrical echoes are far harder to prevent and are caused by virtually any impedance mismatch in the telephone communication circuit.
Referring now to FIG. 8, a typical prior art telephone communication system is illustrated. A telephone, fax, or data modem couples to a local subscriber loop 802 at one end and another local subscriber loop 802′ at an opposite end. One source of impedance mismatch is the cable impedance in the local subscriber loop 802. Local subscriber loops 802 vary in length from a few hundred feet to about 25,000 feet, so there is always some mismatch with the constant impedance terminations at a central office.
The local subscriber loops 802 and 802′ couple to 2-wire/4-wire hybrid circuits 804 and 804′, respectively. An even greater source of impedance mismatch is the 2-wire/4-wire hybrid circuits 804 and 804′. Hybrid circuits 804 and 804′ are composed of resistor networks, capacitors, and ferrite-core transformers. Hybrid circuits 804 and 804′ convert the 4-wire telephone trunk lines 806 (a pair in each direction) running between telephone exchanges of the PSTN 812 to each of the 2-wire local subscriber loops 802 and 802′. The hybrid circuit 804 is intended to direct all the energy from a talker on the 4-wire trunk 806 at a far end to a listener on a 2-wire local subscriber loop 802 at a near end. Impedance mismatches in the hybrid circuit 804 result in some of the transmitted energy from the far end being reflected back to the far end from the near end as a delayed version of the far-end talker's speech. As little as a 30 millisecond (msec) round-trip delay in the echo back to the far end is perceptible. Round-trip delays of 50 msec or more are objectionable and should be reduced or eliminated.
Echoes 810′ are formed when a speech signal from a far end talker leaves a far end hybrid 804′ on a pair of the four wires 806′, arrives at the near end after traversing the PSTN 812, and may be heard by the listener at the near side. A small portion of this signal is reflected by the hybrid 804 at the near end, returns on a different pair of the four wires 806 to the far end, and arrives at the hybrid 804′ delayed by a period of time referred to as the "echo tail length". The talker at the far end hears this reflected and delayed small portion of his speech signal as an echo. Echoes can occur at each talking end as each person switches from being a talker to a listener. In traditional telephone networks, an echo canceller is placed at each end of the PSTN in order to reduce and attempt to eliminate this echo.
In general, several things contribute to an echo: (i) energy reflection due to impedance mismatches; (ii) a sufficiently large round-trip delay between a talker's transmitted signal and its reflection; and (iii) poor echo attenuation occurring at the hybrid (i.e. low Echo Return Loss). There are two major causes for increased round-trip delay: (I) propagation delays and (II) digital signal processing algorithmic delays. Propagation delays are caused by the circuit length from talker to listener and transit time over satellite links. The digital signal processing (DSP) algorithmic delays are caused by one or more of the following: conversion delays between analog to digital and digital to analog; signal processing ordinarily performed to enhance signal quality; signal transcoding such as that performed in digital wireless telephony equipment for Code-division multiple access (CDMA), Global system for mobile communications (GSM) and Personal Communications Services (PCS); and packet delays or latency.
With interest in providing telephony over packet networks such as the Internet, another factor that increases the round-trip delay is of great concern: the delay or latency caused by the signal processing incurred in packet processing and protocol stack execution. This delay/latency is not necessarily related to distance but is due to processing delays. If enough delay/latency is introduced, echoes can be heard even on local telephone calls. The longer delay/latency further magnifies other echo-related communication problems such as double-talk, where both the far end and the near end talk at the same time.
The delay/latency in a packet-based network can be attributed to hybrid delay, coder or algorithmic delay, packetization/transmission delay, transit or network delay, surface land-line propagation delay and satellite-link propagation delay. The hybrid delay is the round trip delay between an echo canceller and network hybrids and is typically between 32 and 64 msec. The coder or algorithmic delay is the delay from a signal processing algorithm that uses a certain-size 'window' to force a delay while waiting for all necessary samples and is typically up to 40 ms long. For example, the G.723.1 coder has an algorithmic delay of approximately 37.5 ms. The packetization/transmission delay is associated with the creation of packets and the transmission of each packet through the protocol stacks. The transit or network delay is caused by access line delay (approximately 10-40 msec) and router/switch delay (approximately 5 msec per router/switch). The surface land-line propagation delay is a delay associated with cabling distances and can be up to approximately 20 msec from coast to coast of the United States. The satellite-link propagation delay is associated with the delay time in high earth-orbit satellites such as geostationary satellites, which can add approximately 250 msec, and the delay time associated with low earth-orbit satellites, which can add a few milliseconds of delay each. The delay between when a packet is sent and when it is received has a fixed component which is technology limited (processing and transmission link delay) and a variable component due to queuing and processing of packets, route hops, speed of the backbone, congestion, and so forth. The ITU-T G.114 recommendation calls for no more than a 400 ms one-way total delay for voice, and no more than 250 ms one-way for real-time fax transmissions.
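The delay components listed above can be added into a rough one-way budget and compared against the 400 ms voice limit. The sketch below is illustrative only: the function name, the packetization figure, and the hop count are hypothetical values not given in the text, and the round-trip hybrid delay is left out because it is not a one-way component.

```python
ONE_WAY_VOICE_LIMIT_MS = 400.0  # one-way voice limit quoted in the text
ROUTER_DELAY_MS = 5.0           # approximate delay per router/switch quoted in the text


def one_way_delay_ms(coder_ms, packetization_ms, access_ms,
                     router_hops, landline_ms, satellite_ms=0.0):
    """Sum the one-way delay components of a hypothetical packet voice call."""
    return (coder_ms + packetization_ms + access_ms
            + router_hops * ROUTER_DELAY_MS + landline_ms + satellite_ms)


# 37.5 ms G.723.1 coder, 40 ms access line, 20 ms coast-to-coast land line.
# The 10 ms packetization delay and the 6 router hops are assumed values.
terrestrial = one_way_delay_ms(37.5, 10.0, 40.0, 6, 20.0)
with_geo = one_way_delay_ms(37.5, 10.0, 40.0, 6, 20.0, satellite_ms=250.0)

print(terrestrial, with_geo, with_geo <= ONE_WAY_VOICE_LIMIT_MS)
```

Under these assumed figures a single geostationary hop consumes most of the budget, leaving little margin for the variable queuing component.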
Referring now to FIG. 9, a typical prior art digital echo canceller 900 is illustrated. The prior art digital echo canceller 900 couples between the hybrid circuit 804 and the public switched telephone network (PSTN) 902 on the telephone trunk lines. The governing specification for digital echo cancellers is the ITU-T recommendation G.168, Digital network echo cancellers. The following terms from ITU-T document G.168 are used herein and are illustrated in FIG. 9. The end or side of the connection towards the local handset is referred to as the near end, near side or send side 910. The end or side of the connection towards the distant handset is referred to as the far end, far side or receive side 920. The part of the circuit from the near end 910 to the far end 920 is the send path 930. The part of the circuit from the far end to the near end is the receive path 935. The part of the circuit (i.e. copper wire, hybrid) in the local loop 802, between the end system or telephone system 108 and the central-office termination of the hybrid 804 is the end path. Speech signals entering the echo canceller 900 from the near end 910 are the send input Sin. Speech signals entering the echo canceller from the far end 920 are the received input Rin. Speech signals output from the echo canceller 900 to the far end 920 are the send output Sout. Speech signals exiting the echo canceller to the near end 910 are the received output Rout.
If only the far end 920 is talking to generate speech signals, Rin arrives and passes through the echo canceller 900 and forms Rout. Rout enters the local loop 802 via the hybrid 804. Due to impedance mismatches, part of the Rout energy is reflected by the hybrid 804 and becomes the Sin component. Instead of being near side speech, Sin in this case is an undesirable echo of the speech from the far end 920. Sin, being an echo, should be cancelled before being re-transmitted back to the far end 920. The delay in the hybrid between the Rout signal and the respective Sin echo signal is referred to as the echo tail length. All echo cancellation occurs in the send path 930 between Sin and Sout. Signals Sin, Rin, Sout, and Rout are all assumed to be 16-bit linear values, not companded 8-bit PCM values or values encoded per an ITU-T G.7xx specification.
The typical prior art digital echo canceller 900 includes the basic components of an echo estimator 902, a digital subtractor 904, and a non-linear processor 906. Typically, the echo-cancellation process in the typical prior art digital echo canceller 900 begins by eliminating impedance mismatches. In order to do so, the typical digital echo canceller 900 taps the receive-side input signal (Rin). Rin is processed in the echo estimator 902 to generate an estimate of the echo which is then subtracted from Sin. Rin is also passed through to the near end 910 without change as the Rout signal. The echo estimator 902 is a linear finite impulse response (FIR) convolution filter implemented in a DSP. The estimator 902 accepts successive samples of voice on Rin (typically a 16-bit sample every 125 microseconds). The voice samples are multiplied with a set of filter coefficients approximating the impulse response of circuitry in the end path to generate an echo estimation. Over time, the set of filter coefficients is changed (i.e. adapted) until the coefficients accurately represent the desired impulse response to form an accurate echo estimation. The echo estimation is coupled into the subtractor 904. If the echo estimation is accurate, it is substantially equivalent to the actual echo on Sin.
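The echo estimator described above can be sketched as a FIR filter over a sliding history of Rin samples. This is a minimal illustration, not the estimator 902 itself: the class name, the helper `taps_for_tail`, and the coefficient values are hypothetical. The tap count follows from the 125 microsecond sample period, so covering a 64 msec echo tail takes 512 coefficients.

```python
from collections import deque

SAMPLE_PERIOD_US = 125  # one sample every 125 microseconds (8000 samples per second)


def taps_for_tail(tail_ms):
    """Number of filter coefficients needed to span an echo tail of tail_ms."""
    return tail_ms * 1000 // SAMPLE_PERIOD_US


class EchoEstimator:
    """FIR convolution of recent Rin samples with end-path coefficients."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        # History of the most recent Rin samples, newest first.
        self.history = deque([0] * len(self.coeffs), maxlen=len(self.coeffs))

    def estimate(self, rin_sample):
        self.history.appendleft(rin_sample)  # oldest sample drops off the far end
        return sum(h * x for h, x in zip(self.coeffs, self.history))


# Hypothetical end-path response: half the signal returns one sample late,
# a quarter returns two samples late.
estimator = EchoEstimator([0, 0.5, 0.25])
impulse_response = [estimator.estimate(x) for x in [1000, 0, 0, 0]]
print(taps_for_tail(64), impulse_response)
```

Feeding a single impulse through the estimator reproduces the coefficient set scaled by the impulse amplitude, which is why the coefficients are said to approximate the impulse response of the end path.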
The subtractor 904 digitally subtracts the echo estimation from the Sin signal. The subtractor 904 generates a difference which is an error between the actual echo value and the echo estimation value. Note that only the actual echo value is present in the Sin signal when the near-end 910 is not generating speech signals (i.e. no one is talking) on Sin. A feedback mechanism between the digital subtractor 904 and the echo estimator 902 uses the error to update the filter coefficients in the echo estimator 902 to cause convergence between values of the echo estimation and the actual echo. Since voice levels can vary, the echo estimation must vary as well. Thus the filter of the echo estimator 902 uses the error feedback in a continuous adaptation process.
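The error-feedback loop between the subtractor and the estimator can be sketched with a normalized least-mean-squares (NLMS) update. The text does not name the update rule used in prior art cancellers, so NLMS is an assumed, commonly used choice here, and the function name, step size `mu`, tap count, and simulated echo path are all hypothetical.

```python
import random


def nlms_cancel(rin, sin, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive echo estimate from Sin; return (coefficients, errors)."""
    w = [0.0] * taps     # filter coefficients, adapted over time
    hist = [0.0] * taps  # most recent Rin samples, newest first
    errors = []
    for r, s in zip(rin, sin):
        hist = [r] + hist[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, hist))
        e = s - estimate                       # subtractor output (error)
        norm = sum(x * x for x in hist) + eps  # normalizes for varying voice level
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, hist)]  # error feedback
        errors.append(e)
    return w, errors


# Simulated far-end signal and a hypothetical echo path; no near-end speech.
rng = random.Random(0)
echo_path = [0.0, 0.4, -0.2]
rin = [rng.uniform(-1, 1) for _ in range(2000)]
sin = [sum(h * rin[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(rin))]

w, errors = nlms_cancel(rin, sin)
print([round(c, 3) for c in w[:3]], abs(errors[-1]) < 1e-3)
```

Dividing the update by the input energy is what lets the adaptation keep pace as voice levels vary; with a noise-like input and no near-end speech the coefficients settle on the simulated echo path.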
If a person at the near end 910 starts talking at the same time as a person at the far end 920, each generating speech signals, the Sin signal includes the actual echo signal and the speech signal of the talker at the near end 910. This condition is known as "double-talk", which can disrupt the adaptation process if measures are not taken. A detector is used to detect the "double-talk" condition; when both sides are talking at once, the adaptation process is inhibited and the echo estimator retains its filter coefficients. While adaptation is inhibited, echoes can still be cancelled using the retained filter coefficients. Once the near end person stops talking and generating speech signals on Sin, adaptation in the echo estimator 902 can continue. If the far end 920 person stops talking, stopping the generation of speech signals on Rin, the filter coefficients are retained until the far end 920 person starts talking without the near end 910, at which point adaptation can continue.
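The rule for when adaptation may proceed can be sketched with a simple level comparison in the style of the Geigel double-talk detector. The text does not say which detector prior art cancellers use, so this is an assumed technique, and the function name, the 0.5 threshold (roughly a 6 dB hybrid loss), and the activity floor are hypothetical.

```python
def adaptation_enabled(sin_sample, recent_rin, threshold=0.5, activity_floor=100):
    """Return True only when the far end is talking and the near end is not."""
    far_peak = max(abs(r) for r in recent_rin)  # loudest recent far-end sample
    far_end_active = far_peak > activity_floor
    # An echo is attenuated by the hybrid, so a Sin sample louder than a
    # fraction of the recent Rin peak is taken to contain near-end speech.
    double_talk = abs(sin_sample) > threshold * far_peak
    return far_end_active and not double_talk


recent_rin = [1000, -800, 200]
print(adaptation_enabled(300, recent_rin))  # far end only: adapt
print(adaptation_enabled(900, recent_rin))  # double-talk: hold coefficients
print(adaptation_enabled(0, [10, -5]))      # far end silent: hold coefficients
```

When the function returns False the coefficient update is skipped, while the retained coefficients continue to be used for cancellation.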
If the signal at Rin was a very sharp, impulsive, explosive sound (mathematically consisting of a very wide frequency spectrum), the impulse response could be immediately known. However because the input is usually speech signals, it takes a period of time for the filter coefficients to adapt and converge to a close approximation of the required transfer function for generating an echo estimation. As a result, it is possible to predict the adaptation delay as well as an Echo Return Loss Enhancement (ERLE). The ERLE of the echo canceller 900 is the echo attenuation provided by it.
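ERLE is conventionally expressed in decibels as the ratio of echo power entering the canceller to the residual power leaving the subtractor, measured while only the far end is talking. The sketch below assumes that conventional definition; the function name and sample values are hypothetical.

```python
import math


def erle_db(sin, residual):
    """Echo attenuation in dB: power of Sin over power of the residual error."""
    p_in = sum(x * x for x in sin)
    p_out = sum(x * x for x in residual)  # must be non-zero
    return 10 * math.log10(p_in / p_out)


# An echo of amplitude 1000 reduced to a residual of amplitude 10.
print(erle_db([1000, -1000], [10, -10]))
```

A hundredfold reduction in amplitude corresponds to a ten-thousandfold reduction in power, or 40 dB of enhancement.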
The output of the subtractor 904 is coupled into the Sout port via the non-linear processor 906 and fed back to the FIR filter of the echo estimator 902. Control logic (not shown) in the echo canceller 900 receives the output from the subtractor 904 to implement a negative feedback mechanism. Large error signals on the output from the subtractor cause the negative feedback mechanism to make large changes in the filter coefficients to minimize the error signal on the output from the subtractor 904 between the actual echo and the echo estimation. The adaptation process of the filter coefficients to minimize the error signal should only take a few milliseconds. However, even a fully adapted set of filter coefficients represents a linear model of the system and does not correlate with non-linear effects. Non-linear echoes associated with non-linear effects can be significant and will not be cancelled by linear adaptations in filter coefficients. Non-linear echoes can be caused by non-linear effects such as clipped speech signals, speech compression, imperfect PCM conversions (quantization effects), as well as poorly designed speakerphones that allow acoustical echoes to occur on the near-side handset. The non-linear processor (NLP) 906 in the send path 930 is used to remove non-linear echoes in the output signal from the subtractor 904.
The non-linear processor 906 has a variable NLP suppression threshold which adapts to the signal levels on Rin and Sin because speech levels are dynamic. The non-linear processor 906 removes any signal in the output from the subtractor 904 that is below its varying NLP suppression threshold. The NLP suppression threshold is adapted to changing speech levels in order to prevent clipping of speech signals generated in Sin at the near end 910 (its presence being signaled by a 'double-talk' detector). The adaptation rates of echo cancellers influence the dynamics of variations in the NLP suppression threshold. The adaptation rate controls whether the first syllable of speech at the near end 910 is clipped at the far end 920. Typically, the subtractor 904 can remove no more than 35 dB of echo. Therefore, the NLP is needed to reduce any residual echo, including non-linear echoes, to inaudible levels at the far end 920.
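The residual left after 35 dB of cancellation and the thresholding performed by the NLP can be sketched as follows. This is an illustrative center-clipper under stated assumptions: the function names and the fixed threshold value are hypothetical, and a real NLP would vary the threshold with the levels on Rin and Sin as described above.

```python
def residual_after_cancellation(echo_amplitude, attenuation_db=35.0):
    """Amplitude remaining after the subtractor removes attenuation_db of echo."""
    return echo_amplitude * 10 ** (-attenuation_db / 20)


def nlp(sample, threshold, near_end_active):
    """Suppress samples below the threshold unless near-end speech is present."""
    if near_end_active:
        return sample  # pass near-end speech through unclipped
    return 0 if abs(sample) < threshold else sample


residual = residual_after_cancellation(1000)  # about 1.8% of the echo amplitude
print(round(residual, 2))
print(nlp(residual, threshold=20, near_end_active=False))  # residual removed
print(nlp(residual, threshold=20, near_end_active=True))   # left intact
```

Because 35 dB of attenuation still leaves a small fraction of the echo amplitude, the thresholding stage is what brings the residual down to silence when only the far end is talking.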
The typical prior art digital echo canceller has a number of disadvantages. One disadvantage is that it does not provide full telephony processing. Another disadvantage is that the prior art digital echo canceller has not yet been adapted for communicating data over a packet network. Another disadvantage is that it has yet to provide an integrated solution for multiple channels. Yet another disadvantage is that the mechanism of detecting double talk and controlling the adaptation process in response to a double talk condition is inefficient. Another disadvantage is that prior mechanisms for switching non-linear processing ON or OFF have been rather crude and unsophisticated. Yet another disadvantage is that prior adaptation methods and their respective adaptation rates are unrefined in prior echo cancellers.