In a digital voice communication system, speech is transferred between the terminals of a speaking party and a listening party over a communication link. In this description, the terminals are mobile terminals communicating with each other in a mobile communication system. However, the terminals can be any type of terminal for digitally transferred speech communication that has a microphone and a loudspeaker. Furthermore, the voice communication system can be any type of system in which two or more parties communicate their speech by transferring digitally encoded voice signals, e.g. a wireless or wired telephony system, a conference telephony system, an entry phone system in a building, walkie-talkies, etc.
With reference to FIG. 1a, two persons 100 and 108 who are speaking with each other use terminals 102 and 106, respectively, which transmit speech signals digitally across a communication link 104. Each terminal has a microphone and a loudspeaker. When person 100 speaks into the microphone of terminal 102, his or her speech is transferred across the communication link 104 and emitted from the loudspeaker of terminal 106. The speech emitted from the loudspeaker is picked up by the microphone of terminal 106, together with the speech of person 108, resulting in a combined signal that is sent back to the opposite terminal 102. This combined signal is transferred across the communication link 104 to terminal 102, where person 100 hears it from the loudspeaker of terminal 102.
In this process, the speech picked up by the microphone of terminal 102 is converted into digital signals at terminal 102, and is then converted back to analogue signals at terminal 106, before being emitted as speech by the loudspeaker of terminal 106. Correspondingly, the speech picked up by the microphone of terminal 106 is converted into digital signals at terminal 106, which are converted back to analogue signals at terminal 102, before being emitted as speech by the loudspeaker of terminal 102.
During the transfer from the microphone of terminal 102 to the loudspeaker of terminal 106, the speech is delayed by digital processing of the signals in both terminals 102 and 106 and in any intermediate routers, gateways, etc. on the communication link 104, as well as by propagation path delay. Such processing includes e.g. analogue-to-digital and digital-to-analogue conversion (A/D and D/A, respectively), speech coding, speech buffering, and any other digital processes associated with speech communication. The voice signals reproduced by the loudspeaker of terminal 102 are delayed in the same way by the conversion, processing and propagation described above. FIG. 1b shows how the total delay $D_{\mathrm{Total}}$, from the moment speech enters the microphone of terminal 102 until it is emitted by the loudspeaker of terminal 102, basically consists of the following seven parts, also indicated in FIG. 1a: D1, arising from the A/D conversion and signal processing at terminal 102; D2, arising from the signal processing and propagation path delay on the communication link 104; D3, arising from the D/A conversion and signal processing at terminal 106; D4, arising from the speech propagation from the loudspeaker of terminal 106 to the microphone of terminal 106; D5, arising from the A/D conversion and signal processing at terminal 106; D6, arising from the signal processing and propagation path delay on the communication link 104; and D7, arising from the D/A conversion and signal processing at terminal 102. As a result, person 100, using terminal 102, hears his or her own speech as a delayed echo, which is naturally perceived as disturbing. Typically, the delay $D_{\mathrm{Total}}$ of the perceived echo is in the range of 30-500 ms.
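Since the total echo delay is simply the sum of the seven components D1 to D7, a delay budget can be checked with a few lines of Python. In the sketch below every component value is a hypothetical placeholder chosen only for illustration, not a value taken from FIG. 1b or from measurements; the names `DELAYS_MS` and `total_echo_delay_ms` are likewise illustrative.

```python
# Illustrative delay budget for the echo path of FIG. 1a/1b.
# All component values are hypothetical placeholders, not measurements.
DELAYS_MS = {
    "D1": 20.0,  # A/D conversion and processing at terminal 102 (assumed)
    "D2": 40.0,  # processing and propagation on link 104, forward (assumed)
    "D3": 20.0,  # D/A conversion and processing at terminal 106 (assumed)
    "D4": 1.0,   # acoustic path, loudspeaker to microphone of 106 (assumed)
    "D5": 20.0,  # A/D conversion and processing at terminal 106 (assumed)
    "D6": 40.0,  # processing and propagation on link 104, return (assumed)
    "D7": 20.0,  # D/A conversion and processing at terminal 102 (assumed)
}


def total_echo_delay_ms(delays):
    """Return the total echo delay as the sum of the individual components."""
    return sum(delays.values())


d_total = total_echo_delay_ms(DELAYS_MS)
print(f"total echo delay = {d_total:.1f} ms")  # prints "total echo delay = 161.0 ms"
```

With these assumed values the total lands inside the typical 30-500 ms range quoted above; note that the acoustic path D4 is small compared with the processing and link delays.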
For reducing echoes in a digital voice communication system two methods are generally applied today, referred to as echo suppression and linear filtering based echo cancellation.
Echo Suppressors
An echo suppressor is typically used in mobile terminals such as terminals 102 and 106, temporarily blocking frequencies of the output signals to the communication link 104 when echoes are detected. A common echo suppressor is the Non-Linear Processor (NLP). To determine when echoes are present, and at which frequencies, the NLP in terminal 106 receives information about the frequency content of the incoming signals on the communication link 104 and of the microphone signals of terminal 106. Each terminal (e.g. 102 and 106) determines whether it is currently the speech sending or the speech receiving party. If terminal 106 determines that it is the speech receiving party, it temporarily blocks the frequencies that are present both in its microphone signals and in the incoming signals on the communication link 104 from being transmitted on the communication link 104 to the opposite terminal 102. Similarly, if terminal 102 determines that it is the receiving party, it temporarily blocks the frequencies that are present both in the microphone signals of terminal 102 and in the incoming signals on the communication link 104 from being transmitted on the communication link 104 to the opposite terminal 106.
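The band-blocking rule described above can be sketched as a per-band gain decision. This is a minimal illustration only: it assumes that a frequency analysis has already produced one energy value per band for both the microphone signal and the incoming link signal, that the sending/receiving decision is made elsewhere and passed in as a flag, and that a band counts as "present" when its energy exceeds a fixed threshold. The function name `nlp_suppress` and the threshold value are hypothetical.

```python
from typing import List


def nlp_suppress(mic_bands: List[float],
                 far_end_bands: List[float],
                 is_receiving_party: bool,
                 threshold: float = 1e-3) -> List[float]:
    """Per-band gains for the outgoing signal: 1.0 passes a band, 0.0 blocks it.

    mic_bands / far_end_bands: energy per frequency band of the microphone
    signal and of the incoming link signal (same band layout for both).
    threshold: hypothetical energy level above which a band counts as present.
    """
    if len(mic_bands) != len(far_end_bands):
        raise ValueError("both signals must use the same number of bands")
    if not is_receiving_party:
        # The sending party transmits all of its bands unchanged.
        return [1.0] * len(mic_bands)
    gains = []
    for mic_energy, far_energy in zip(mic_bands, far_end_bands):
        # A band present in both signals is treated as echo and blocked.
        in_both = mic_energy > threshold and far_energy > threshold
        gains.append(0.0 if in_both else 1.0)
    return gains


print(nlp_suppress([0.5, 0.0, 0.2, 0.0001], [0.4, 0.3, 0.0, 0.2], True))
# prints [0.0, 1.0, 1.0, 1.0]
```

Only the first band, which carries energy in both signals, is blocked. The same mechanism also illustrates the drawback discussed below: any near-end speech or background sound in that band is blocked together with the echo.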
Alternatively, the NLP in the terminals can be designed to detect when echo dominates the speech of the terminal's user, and to block the corresponding frequencies.
An advantage of the echo suppressor is that almost no echo remains for the blocked frequencies. However, there are some drawbacks: Frequencies in the background sound are temporarily blocked, resulting in loss of naturalness. In double-talk, when two persons are speaking simultaneously, frequencies in the speech of one party are temporarily blocked, which also affects the emitted speech of the other party.
Linear Filtering Based Echo Cancellers
Typically, a linear filtering based echo canceller is also used in mobile terminals such as terminals 102 and 106. The linear filtering based echo canceller in terminal 106 estimates the part of the microphone signal of terminal 106 that arises from the speech of person 100 emitted by the loudspeaker of terminal 106, and subtracts this estimate from the signals to be transmitted back to terminal 102. A digital filter is used to estimate the echo. An advantage of linear filtering based echo cancellers is that they preserve the naturalness of speech, because they do not block frequencies from the background sound or from the other speaking party. However, linear filtering based echo cancellers also have some drawbacks: they may leave a noticeable residual echo because the digital filter does not model the echo completely, and they require a large amount of processing capacity to achieve sufficient echo reduction.
The following example gives an estimate of the required processing capacity:
The calculation rate $C$ is the product of the sampling frequency $f$ and the filter length $l$, i.e. $C = f \times l$, where the filter length is the product of the sampling frequency and the echo delay $t$, i.e. $l = f \times t$. In other words, $C = f^2 \times t$. For an echo delay $t$ of 100 ms and a sampling frequency $f$ of 8 kHz, the calculation rate is $C = (8000\ \mathrm{Hz})^2 \times 0.1\ \mathrm{s} = 6.4$ million operations per second (MOPS). For 16 kHz, 25.6 MOPS are necessary; for 32 kHz, 102.4 MOPS; and for 48 kHz, 230.4 MOPS.
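The arithmetic above can be reproduced with a short Python function. The sketch assumes, as the formula implies, one operation per filter tap per sample; the function name `calculation_rate` is illustrative.

```python
def calculation_rate(f_hz: float, t_s: float) -> float:
    """Calculation rate C = f * l, with filter length l = f * t (so C = f^2 * t)."""
    filter_length = f_hz * t_s       # taps needed to cover the echo delay t
    return f_hz * filter_length      # assumed: one operation per tap per sample


for f in (8_000, 16_000, 32_000, 48_000):
    print(f"{f / 1000:g} kHz: {calculation_rate(f, 0.1) / 1e6:.1f} MOPS")
```

Because $C$ grows with the square of $f$, doubling the sampling frequency quadruples the load, which is why wideband speech makes a full-band linear echo canceller expensive.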
Combinations
Since echo suppressors and linear filtering based echo cancellers both have drawbacks, combinations of the two techniques are mostly used, designed to combine their advantages and avoid their drawbacks. Hereinafter, the term “echo canceller” refers to any linear filtering based echo canceller, with or without an echo suppressor.
With reference to FIG. 2, an example of a typical design of an echo canceller will now be briefly described. The echo canceller is placed in a terminal 200 used by a person 210 during a voice call with an opposite terminal (not shown). The incoming signals on the communication link 202 are converted to analogue voice signals by the D/A-converter 204 and emitted by the loudspeaker 206 of terminal 200. The microphone 212 of terminal 200 thus picks up both the speech of person 210 and the speech from the opposite terminal, emitted from the loudspeaker 206 and affected by the environment 208 of terminal 200. An A/D converter 214 converts the signals from the microphone 212 into digital signals, which it outputs to a subtracter 218.
The echo canceller also includes an adaptive digital filter 216, which receives the incoming signals on the communication link 202 from the opposite terminal as well as the output signals from the subtracter 218. The output signals from the filter 216 are fed as input signals to the subtracter 218, where they are subtracted from the A/D-converted microphone signals. Thus, the output signals from the subtracter 218 represent the difference between the A/D-converted microphone signals and the output signals from the filter 216. The filter coefficients of filter 216 are determined dynamically (adapted), i.e. updated continuously, based on this difference signal, which is derived from the A/D-converted microphone signals of terminal 200. The continuous update is needed because the environment 208 of terminal 200 changes. Since the filter 216 processes the signals received on the communication link 202, its output signals represent an estimate of the part of the output signals from the A/D-converter 214 that originates from the loudspeaker 206 of terminal 200. Subtracting these estimated signals, in the subtracter 218, from the output signals of the A/D-converter 214 achieves adequate echo cancellation. Finally, the output signals from the subtracter 218 are processed by a Non-Linear Processor (NLP) 220 to suppress any remaining echoes, before being transferred to the opposite terminal (not shown).
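The loop formed by the filter 216 and the subtracter 218 can be illustrated with a minimal sketch. The description does not state which adaptation rule is used; the sketch assumes the normalized least-mean-squares (NLMS) algorithm, a common choice for acoustic echo cancellers, and simulates a hypothetical echo path with no near-end speech and no noise. The NLP 220 stage is omitted, and the function name, tap count and step size are illustrative assumptions.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=32, mu=0.5, eps=1e-6):
    """Adaptive FIR echo canceller with an NLMS update (an assumed rule).

    far_end: samples received on the link, i.e. the loudspeaker signal x[n]
    mic:     A/D-converted microphone samples d[n]
    Returns the subtracter output e[n] = d[n] - y[n] for every sample.
    """
    w = [0.0] * num_taps       # coefficients of the adaptive filter
    x_buf = [0.0] * num_taps   # latest far-end samples, newest first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        x_buf = [x_n] + x_buf[:-1]                        # shift in new sample
        y_n = sum(wi * xi for wi, xi in zip(w, x_buf))    # echo estimate
        e_n = d_n - y_n                                   # subtracter output
        step = mu * e_n / (eps + sum(xi * xi for xi in x_buf))
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]  # adapt on the error
        residual.append(e_n)
    return residual


if __name__ == "__main__":
    rng = random.Random(0)
    echo_path = [0.0, 0.0, 0.6, 0.0, -0.3, 0.1]  # hypothetical room response
    far = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    # The microphone picks up only the echo (no near-end speech, no noise).
    mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n >= k)
           for n in range(len(far))]
    err = nlms_echo_canceller(far, mic)
    echo_power = sum(v * v for v in mic[-500:]) / 500
    residual_power = sum(v * v for v in err[-500:]) / 500
    print(f"echo power {echo_power:.3f}, residual power {residual_power:.2e}")
```

In this idealised setting the residual power falls far below the echo power once the filter has converged. With near-end speech, noise, or a changing environment 208, some echo would remain, which is what the NLP 220 after the subtracter is there to suppress.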
With reference to FIG. 3, another known design of an echo canceller will be described. This echo canceller is placed in a terminal 300 operated by a person 310. Basically, the signals are divided into a plurality of frequency bands, which are processed individually and then combined into a composite signal.
Both the signals on the communication link 302 and the signals from the microphone 312 are divided into a plurality of frequency bands 1, 2, ..., N by filters 304a, 304b, ..., 304n and 314a, 314b, ..., 314n, respectively. Before the signals on the communication link 302 are emitted by the loudspeaker 306, they are converted to analogue speech signals by a D/A-converter (not shown), and after the emitted sound has been picked up by the microphone 312, it is converted by an A/D-converter (not shown). The speech emitted by the loudspeaker 306 is heard by person 310 and is also picked up by the microphone 312, affected by the environment 308 of terminal 300.
For each band filtered by the respective filter pairs 304a/314a, 304b/314b, ..., 304n/314n, an echo control unit 316 performs echo cancellation, as described above, on the respective band. Each frequency band is then filtered by a respective filter 318a, 318b, ..., 318n, and echo suppressed by the NLP 320 in the manner described above. Finally, the filtered and echo suppressed frequency bands are combined into a composite signal in the NLP 320, before being transferred to the opposite terminal (not shown).
An advantage of the described echo canceller is that the required processing capacity is reduced, because the sampling rate can be lowered when the signals to be echo cancelled are divided into separate frequency bands. The processing capacity required for each band decreases with the square of the sample-rate reduction, so even when the contributions of all bands are added together, the total calculation rate is lower than for a single full-band canceller. The following example shows how the calculation rate C is decreased for the echo canceller described above:
As described above, the calculation rate is $C = f^2 \times t$. For a sampling frequency $f = 20$ kHz and an echo delay $t = 100$ ms, the calculation rate is $C = (20000\ \mathrm{Hz})^2 \times 0.1\ \mathrm{s} = 40$ MOPS. If the signals are instead divided into 4 separate frequency bands, each with a sampling frequency of 5 kHz, the calculation rate for each band is $C = (5000\ \mathrm{Hz})^2 \times 0.1\ \mathrm{s} = 2.5$ MOPS. For the 4 frequency bands, the total calculation rate is then $C_{\mathrm{tot}} = 4 \times 2.5\ \mathrm{MOPS} = 10$ MOPS. Thus, dividing the signals to be echo cancelled into 4 bands decreases the total calculation rate from 40 MOPS to 10 MOPS, i.e. by a factor of 4.
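The same calculation generalises to N bands: each band runs at f/N, so its rate is (f/N)^2 × t, and the total over N bands is f^2 × t / N. The sketch below assumes ideal decimation to exactly f/N per band and ignores the cost of the band-splitting filters 304/314 and of recombining the bands; the function name `subband_calculation_rate` is illustrative.

```python
def subband_calculation_rate(f_hz: float, t_s: float, num_bands: int) -> float:
    """Total rate C_tot = N * (f/N)^2 * t for N ideally decimated bands."""
    f_band = f_hz / num_bands        # reduced sampling rate in each band
    per_band = f_band ** 2 * t_s     # C = f^2 * t applied to one band
    return num_bands * per_band      # sum over all N bands = f^2 * t / N


full = subband_calculation_rate(20_000, 0.1, 1)
split = subband_calculation_rate(20_000, 0.1, 4)
print(f"full band: {full / 1e6:.1f} MOPS, 4 bands: {split / 1e6:.1f} MOPS, "
      f"reduction: {full / split:.1f}x")
```

The saving therefore grows linearly with the number of bands, while the overhead left out of this sketch (filter banks and the combining stage) is exactly the part the following paragraphs identify as complex.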
Another advantage of the echo canceller is that different linear filtering based echo cancellers can be used for the respective bands. If, e.g., most of the echo is present in the lower frequency range, and less in the higher frequency range, then a complex echo canceller, resulting in a small remaining echo, can be used for the lower frequency range, and a less complex one can be used for the higher frequency range. However, a drawback of the described echo canceller is that the process of combining the bands into an acceptable composite signal is relatively complex.
Hence, there are certain problems associated with the existing solutions outlined above. Even when echo suppressors and linear filtering based echo cancellers are combined, it is difficult, given the limited processing capacity of the terminal, to design an apparatus in which the residual echo from the linear filtering based echo canceller is small enough to be suppressed by the echo suppressor without losing the naturalness of the resulting speech signals.
Another problem is that, in a band-divided echo canceller, the design of the function that combines the bands into a composite signal is very complex. This is partly because the filter characteristics of the bands are not ideal in practice, and because several different echo control units are used for the respective bands.