The invention relates to a method for echo cancellation in a digital data transmission system, in which system the end of the transmission link to which sound returns as an echo is the far end and the end of the transmission link from which an echo is reflected back is the near end, and in which a speech coding method is used on the echo path at least for a far-end signal transmitted from the far end to the near end, the method comprising the following steps: estimating the echo originating from the near end with an adaptive linear filter on the basis of the far-end signal and subtracting the echo estimate from the near-end signal transmitted from the near end to the far end.
In bi-directional data transmission networks, such as the telephone network, an echo is caused by the reflection of the speaker's own voice back from certain elements of the data transmission network. The echo is disturbing if there is a delay in the transmission link. A delay is usually caused by propagation delay or by digital processing of the signal.
The echo occurring in data transmission networks can be divided into two types: electric and acoustic echo. An electric echo is generated in elements that join the transmission and reception directions of a link, such as the hybrid circuits of a telephone network (2-conductor/4-conductor converters). An acoustic echo is generated in a terminal when the signal of the incoming transmission direction is acoustically coupled from the earpiece or loudspeaker to the microphone of the outgoing transmission direction.
In this context, the end of the transmission link to which the sound of the speaker's own voice returns as an echo is referred to as the far end, and the end of the transmission link from which the echo is reflected back is referred to as the near end.
Echo cancellers or echo suppressors are usually used to try to eliminate the echo problem. An echo canceller generates an echo estimate and cancels the echo by subtracting the echo estimate from the signal returning from the echo path, i.e. from the signal returning from the near end. Generally, echo estimation tries to model the impulse response of the echo path by means of an adaptive filter. In addition, non-linear processors are often used in echo cancellers to cancel the residual echo that remains after the adaptive filtering.
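The adaptive-filter principle described above can be sketched as follows. This is a minimal illustration only, assuming a normalized LMS (NLMS) update, which the text does not specify. The function name, the tap count, the step size and the example impulse response are hypothetical.

```python
import random

def nlms_echo_canceller(far_end, near_end, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive echo estimate from near_end; return (residual, weights)."""
    w = [0.0] * taps   # adaptive estimate of the echo path impulse response
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for far, near in zip(far_end, near_end):
        x = [far] + x[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, x))   # echo estimate
        err = near - estimate                             # signal sent on to the far end
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, x)]
        residual.append(err)
    return residual, w

random.seed(0)
echo_path = [0.0, 0.5, -0.3, 0.1]   # hypothetical linear echo path
far = [random.uniform(-1, 1) for _ in range(2000)]
near = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far))]
residual, w = nlms_echo_canceller(far, near)
print([round(wi, 3) for wi in w[:4]])
```

In this sketch the echo path is purely linear and noise-free, so the filter weights approach the hypothetical impulse response and the residual becomes very small.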
An echo suppressor is usually based on comparing the power levels of the signal going out to the echo path and the signal returning from it. If the power of the signal returning from the echo path is lower than the power of the signal sent out to the echo path by more than a certain ratio, the transmission link returning from the echo path is disconnected so as not to let the echo through. Otherwise, the situation is interpreted as near-end speech or double speech, in which case the link naturally cannot be disconnected.
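The power comparison described above can be illustrated with a short sketch. The frame-based processing, the function names and the default 6 dB threshold are assumptions made for illustration.

```python
import math

def power_db(frame):
    """Mean power of a frame in dB; the small offset avoids log(0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)

def suppress(outgoing_frame, returning_frame, threshold_db=6.0):
    """Silence the returning frame if it looks like pure echo, else pass it."""
    loss_db = power_db(outgoing_frame) - power_db(returning_frame)
    if loss_db > threshold_db:
        return [0.0] * len(returning_frame)   # link disconnected
    return list(returning_frame)              # near-end speech or double speech

echo_only = suppress([1.0] * 80, [0.1] * 80)      # 20 dB below outgoing level
double_speech = suppress([1.0] * 80, [0.8] * 80)  # about 2 dB below outgoing level
```

With these example frames, the first call returns silence and the second passes the returning frame unchanged, including any echo it contains.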
Today, echo cancellers are mainly used for echo cancellation, because echo suppressors cause the following problems. Since the comparison ratio of the power levels of the far-end and near-end signals must be selected according to the worst echo situation (generally 6 dB), low-level near-end speech will not get through during double speech; and even if the average speech levels of the near and far end are equal, the near-end speech is occasionally cut off during double speech depending on the momentary ratio of the signal levels. Another problem is the echo during double speech. During double speech, the near-end speech gets through the echo suppressor, as does the far-end echo summed to the near-end speech. The double speech echo can be reduced by attenuating the near-end and possibly also the far-end signal in the echo suppressor during double speech. However, the attenuation cannot be very strong, because it causes a disturbing pumping in the speech volume.
The adaptive filters in echo cancellers are linear filters which assume that the signal returning from the echo path is both linear and time invariant (LTI, Linear Time Invariant). If this is not the case, the echo signal can be attenuated with an adaptive filter only to the extent of the linear component in the echo signal. In other words, the attenuation achieved by an adaptive filter is directly proportional to the signal-to-noise ratio of the signal returning from the echo path, i.e. inversely proportional to the non-linearity on the echo path. When the signal-to-noise ratio becomes worse, the residual echo level goes up. A non-linear processor (NLP) is often used to try to cancel this residual echo.
Data transmission networks have several sources of non-linearity. The most typical source of non-linearity in digital data transmission networks is the quantization noise generated in A/D conversion. In uniform quantization, quantization noise is, in principle, constant, whereas the signal-to-noise (distortion) ratio increases as the signal level increases. Thus, the attenuation achieved on an echo signal by a linear filter is directly proportional to the momentary signal level.
In companding PCM codecs (ITU-T G.711), an analog signal is compressed in an encoder according to a non-linear amplification curve (A-law or μ-law), after which the signal is uniform-quantized. Alternatively, an analog signal can first be uniform-quantized and then non-linear-quantized according to the A-law or μ-law. Correspondingly, a compensating expansion of the compression is performed in a decoder. Typical of a companding PCM codec is that the signal-to-noise ratio remains almost constant over a rather wide dynamic range. In G.711 codecs, the signal-to-noise ratio is approximately 35 dB while the signal level (Gaussian noise) varies between −5 dBm0 and −35 dBm0. However, at low signal levels below −35 dBm0, the signal-to-noise ratio behaves as in uniform quantization: when the signal level decreases, the signal-to-noise ratio decreases. It can thus be noted that at most an approximately 35-dB additional attenuation can be achieved on an echo signal by means of a linear filter. In practice, this attenuation is often smaller, because the level of the echo signal is rather low and thus the attenuation is dependent on the momentary signal level.
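The difference between companded and uniform quantization can be illustrated numerically. The sketch below assumes the continuous μ-law curve with μ = 255 and a uniform step of 2/256. It is not a bit-exact G.711 implementation, its levels are relative to full scale rather than dBm0, and the test signal (uniformly distributed random samples) is an arbitrary choice.

```python
import math
import random

MU = 255.0         # mu-law compression parameter
STEP = 2.0 / 256   # uniform quantizer step over [-1, 1], roughly 8 bits

def compress(x):
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(v):
    return round(v / STEP) * STEP

def companded(x):
    """Compress, quantize uniformly, then expand again."""
    return expand(quantize(compress(x)))

def snr_db(signal, decoded):
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - d) ** 2 for s, d in zip(signal, decoded))
    return 10.0 * math.log10(signal_power / noise_power)

random.seed(1)
results = {}
for amplitude in (0.5, 0.05):   # two levels 20 dB apart
    signal = [random.uniform(-amplitude, amplitude) for _ in range(5000)]
    results[amplitude] = (
        snr_db(signal, [companded(s) for s in signal]),   # companded quantizer
        snr_db(signal, [quantize(s) for s in signal]),    # plain uniform quantizer
    )
    print(amplitude, [round(v, 1) for v in results[amplitude]])
```

In this sketch the companded signal-to-noise ratio changes only slightly between the two levels, whereas the uniform quantizer loses roughly as many decibels as the signal level drops.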
The noise summing to the echo signal can also be considered a source of non-linearity. So-called line noise is generated in analog transmission systems. When using PCM links in digital data transmission systems, noise does not accumulate as it does in analog systems, and thus the main noise source is often acoustic background noise picked up by the microphone of the terminal. The attenuation of the echo signal achieved by linear filters decreases if the line noise of the echo path or the background noise of the near end is louder than the quantization noise of the PCM codec.
A third source of non-linearity is a non-linear distortion generated in the loudspeaker of the near end, which can be considerable in a loudspeaker phone or hands-free phone. In such a case, the signal-to-distortion ratio of the returning acoustic echo has decreased as compared with the signal going out to the echo path, and the attenuation achieved by linear filters decreases correspondingly. International Patent Application PCT/US96/02073 discloses a method for compensating the non-linear distortion generated in the loudspeaker phone by modelling the distortion mechanism generated in the loudspeaker.
However, one of the most significant sources of non-linearity in digital data transmission networks is speech coding. Speech coding is today generally used in the air interface of digital mobile networks (e.g. GSM, US-TDMA, US-CDMA, PDC, TETRA). Similarly, several WLL (Wireless Local Loop) systems use speech coding in the air interface. In addition, the use of speech coding will become more common in circuit-switched PSTN networks (e.g. ITU-T G.728, G.729, G.723.1). Speech coding will also become more common in packet-switched networks (e.g. Internet calls, video conferences). It can also be noted that digital satellite mobile systems use or will use speech coding.
Typically, speech coding causes a deterioration of at least 10 dB in the signal-to-distortion ratio as compared with the signal-to-distortion ratio of one G.711 PCM codec. If speech coding is used on the echo path, double speech coding is usually performed on the echo signal, because a transmission link typically uses speech coding in both transmission directions. This means that the signal-to-distortion ratio worsens further. It has been noted in ITU-T G.113 that the non-linear distortion caused by consecutive G.728 (LD-CELP) speech codecs increases according to the equation 20 log(n), where n is the number of codec pairs. It can be said that the maximum additional attenuation of an echo signal achieved in practice with a linear filter is less than 20 dB, depending on the speech coding method and the level of the echo signal.
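As a worked illustration of the 20 log(n) rule quoted above, the following sketch evaluates it for a few values of n. It assumes that the logarithm is base 10 and that the result is in decibels; the function name is hypothetical.

```python
import math

def distortion_increase_db(codec_pairs):
    """Increase in non-linear distortion for n consecutive codec pairs, 20*log10(n)."""
    if codec_pairs < 1:
        raise ValueError("at least one codec pair is required")
    return 20.0 * math.log10(codec_pairs)

for n in (1, 2, 3, 4):
    print(n, round(distortion_increase_db(n), 1))
```

Under this reading, going from one codec pair to two raises the distortion by about 6 dB.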
As noted earlier, current echo cancellation solutions based on adaptive filtering try to lessen the impact of non-linearities on the echo path by using a non-linear processor (NLP) to cancel the residual echo left by adaptive filtering. The higher the level of residual echo, the higher the threshold of the NLP must be set to cancel the residual echo. A disadvantage of setting the threshold level higher is the higher probability of cutting off near-end speech during double speech. The non-linear acoustic echo generated by a loudspeaker phone can be cancelled with a more aggressive NLP, because a full-duplex property is typically not required in loudspeaker phones.
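The trade-off in setting the NLP threshold can be illustrated with a simple sketch. It assumes that the NLP is a center clipper, which is one common form but is not specified in the text; the sample values and thresholds are arbitrary.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold."""
    return [0.0 if abs(s) < threshold else s for s in samples]

residual_echo = [0.02, -0.03, 0.01]       # hypothetical residual echo samples
quiet_near_speech = [0.05, -0.06, 0.04]   # hypothetical low-level near-end speech
sequence = residual_echo + quiet_near_speech   # one segment after the other

low_threshold = center_clip(sequence, 0.035)   # removes only the residual echo
high_threshold = center_clip(sequence, 0.08)   # also cuts off the near-end speech
```

With the lower threshold only the residual echo is removed, while the higher threshold also removes the quiet near-end speech.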
If the cause of the non-linearity is a loud line noise on the echo path or background noise at the near end, a linear filter can at most attenuate the echo to the extent of the signal-to-noise ratio of the moment. Even though the attenuation achieved by adaptive filtering decreases and thus NLP can switch off and let the residual echo through, it is subjectively not necessarily interfering, because the uncorrelated noise of the echo signal covers the residual echo relatively effectively.
It is, however, difficult to eliminate with an NLP an increase in the level of residual echo caused by a speech codec or codecs on the echo path. If the threshold level of the NLP is raised, the full-duplex properties of the link suffer, because near-end speech may be cut off during double speech. It is also not possible to utilize the kind of subjective masking effect provided by line noise or near-end background noise, because the level of the non-linear distortion of the speech codec typically correlates with the signal amplitude. Thus the distortion signal that has got through the NLP sounds subjectively like a distorted echo signal.
Another problem is that speech coding typically causes an additional one-directional delay of over 20 ms. In addition, digital mobile networks use channel coding and interleaving to correct the errors on the radio path. These together cause considerable additional delay in the transmission link. Typically, the one-directional transmission delay in digital mobile networks is over 80 ms. Currently, PSTN network echo cancellers should generally be prepared for echo path delays of 60 ms. If speech coding is used on the echo path, the adaptive filter should then be at least 100 ms long, or if speech coding and a digital radio interface are used on the echo path, the length of the adaptive filter should be at least 220 ms. These requirements would considerably increase the need for computational capacity and memory in echo cancellers. Additionally, the convergence speed of the filter typically suffers and the residual noise caused by the filter itself increases. One possibility to avoid a longer adaptive filter is to use a shift register. The shift register memory can store samples going out to the echo path to the extent of a bi-directional speech coding delay and/or a radio path delay. In such a case, the length of the filter can be made shorter and chosen to correspond to the expected echo path delay variance. This approach does not, however, remove the need for additional memory for the shift register, which may in some cases be excessive as compared to the benefit achieved, such as a marginal additional echo attenuation, as opposed to a solution based solely on an echo suppressor.
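The filter lengths quoted above can be reproduced with simple arithmetic. The sketch below is one reading of the figures: the PSTN echo path delay plus twice the additional one-directional delay. The 8 kHz sampling rate and the function names are assumptions.

```python
SAMPLE_RATE_HZ = 8000   # assumed telephony sampling rate

def filter_length_ms(echo_path_ms, one_way_extra_ms):
    """Echo path delay plus the extra delay in both transmission directions."""
    return echo_path_ms + 2 * one_way_extra_ms

def samples(length_ms):
    return length_ms * SAMPLE_RATE_HZ // 1000

speech_coding_only = filter_length_ms(60, 20)   # speech coding on the echo path
with_radio_path = filter_length_ms(60, 80)      # speech coding and radio interface

# Shift register alternative: the bulk delay is buffered, the filter stays short.
shift_register_samples = samples(2 * 80)
short_filter_taps = samples(60)

print(speech_coding_only, samples(speech_coding_only))
print(with_radio_path, samples(with_radio_path))
print(shift_register_samples, short_filter_taps)
```

At the assumed sampling rate, a 220 ms filter needs 1760 taps, whereas the shift register variant buffers 1280 samples and keeps the adaptive filter at 480 taps.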
Due to the above-mentioned problems, an echo canceller based on an adaptive filter cannot be successfully applied if speech coding is used on the echo path. It has been suggested that an echo suppressor or distributed echo cancellation (an adaptive filter at the near end and an NLP after speech coding) could be applied to acoustic echo cancellation, if the echo path has a 4-conductor link. Since the level of an acoustic echo is usually lower than that of an electric echo, an echo suppressor can be designed in such a manner that the full-duplex properties of the link do not suffer much. One such method is disclosed in the patent application PCT/IF197/00390 of the applicant. The problem in such an approach remains that a high-level acoustic echo or an electric echo generated in a 2/4-conductor converter cannot be cancelled without affecting the double speech properties. An additional problem is that during double speech the level of the residual echo remains higher than when using an adaptive filter.
The technically most sensible solution for adaptive echo cancellation, when speech coding is used in the transmission, is to place the echo canceller at the near end. This way, the echo path does not have non-linearities caused by speech coding and an adaptive echo canceller of prior art can be used. However, this is not always possible in practice due to lack of knowledge or for cost reasons, for instance. In some cases, there is a need to centralise the near-end echo cancellers in both transmission directions in one network element to cancel the echo of both the near and the far end. In such a case, it is possible that speech coding is used in the far-end direction and this causes non-linearity on the echo path. In addition, many PC-based Internet telephones do not use an echo canceller even though the transmission delay is typically very long. A third example is acoustic echo cancellers or echo suppressors of digital mobile stations. There is a requirement in the guidelines for a certain attenuation of acoustic echo in mobile stations, but, in practice, there are mobile stations in the market, in which the acoustic echo is subjectively interfering. This may, for instance, be due to shortcomings in the standard approval for mobile stations. Finally, it can be mentioned that some WLL terminals do not have echo cancellation or the level of their echo cancellation is not in compliance with the requirements set for PSTN echo cancellers (ITU-T G.168) and it is, therefore, also necessary to have echo cancellation on the PSTN side of the WLL air interface to cancel the echo from the direction of the terminal. Thus, an apparent need exists for adaptive echo cancellation, when speech coding is used on the echo path.
Thus, it is an object of the invention to develop a method to solve the above-mentioned problems. The objects of the invention are achieved by a method characterized by decoding the speech-coded signal of the far end in an echo canceller and estimating the echo originating from the near end on the basis of said decoded far-end signal.
The invention is based on reducing the impact of a non-linear distortion caused by a speech codec on the echo path in an echo canceller based on an adaptive linear filter by modelling the non-linear distortion with the local decoder.
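The effect of the local decoder can be illustrated with a toy simulation. The sketch below is not the claimed implementation: a coarse uniform quantizer stands in for the speech codec, the adaptive filter uses an NLMS update, only the far-end direction is coded, and the echo path and all names are hypothetical.

```python
import random

STEP = 1.0 / 8   # step of the toy "speech codec" (a coarse quantizer, assumption)

def toy_encode(sample):
    return round(sample / STEP)

def toy_decode(code):
    return code * STEP

def nlms_residual(reference, near_end, taps=8, mu=0.5, eps=1e-8):
    """Adapt on `reference` and return near_end minus the echo estimate."""
    w, x, residual = [0.0] * taps, [0.0] * taps, []
    for ref, near in zip(reference, near_end):
        x = [ref] + x[:-1]
        err = near - sum(wi * xi for wi, xi in zip(w, x))
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, x)]
        residual.append(err)
    return residual

def tail_power(signal, n=500):
    return sum(s * s for s in signal[-n:]) / n

random.seed(2)
echo_path = [0.0, 0.5, -0.3, 0.1]                      # hypothetical echo path
far_original = [random.uniform(-1, 1) for _ in range(4000)]
far_coded = [toy_encode(s) for s in far_original]      # coded signal on the link
far_decoded = [toy_decode(c) for c in far_coded]       # local decoder output

# The near end decodes the same codes, so the echo is formed from far_decoded.
near = [sum(h * far_decoded[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far_decoded))]

power_without = tail_power(nlms_residual(far_original, near))   # uncoded reference
power_with = tail_power(nlms_residual(far_decoded, near))       # decoded reference
print(power_without, power_with)
```

In this toy setting the residual echo power is far lower when the filter is driven by the locally decoded signal, because the codec distortion is then part of the reference rather than part of the echo path.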
The method of the invention provides the advantage that the reduction in the distortion level achieved by it reduces the level of the residual echo in direct proportion. The method of the invention provides the further advantage that the signal samples received in the echo canceller are transmitted from the encoder in compressed mode whereby the amount of memory required to store them in the echo canceller is considerably reduced.
According to an alternative embodiment of the invention, the echo estimate produced by the adaptive linear filter is encoded and decoded before it is subtracted from the near-end signal, to compensate for the non-linear distortion caused by the speech coding performed on the near-end signal on the echo path. With this embodiment, a situation in which the near-end signal is also speech-coded can be handled, and the non-linear distortion caused thereby can be taken into account.
Further, according to another alternative embodiment of the invention, said decoded signal is also fed to a second adaptive linear filter parallel to said adaptive linear filter, the output signal of the second adaptive linear filter is encoded and decoded, the thus obtained second echo estimate is subtracted from the signal to be transmitted from the near end to the far end, and either a near-end signal, from which the first echo estimate is subtracted, or a near-end signal, from which the second echo estimate is subtracted, is selected for transmitting onward to the far end. With this alternative embodiment, the achieved echo cancellation can be optimized in different situations by using two or more different parallel filtering branches.
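The parallel branches can be sketched as follows. This is an illustration under several assumptions: a coarse quantizer stands in for the speech codec, both branches share the same linear echo estimate for brevity, and the branch with the lower residual energy is selected. The selection criterion is not stated in the text above.

```python
STEP = 1.0 / 8   # step of the toy codec (assumption)

def toy_codec(samples):
    """Encode and decode in one step; stands in for a real speech codec."""
    return [round(s / STEP) * STEP for s in samples]

def energy(samples):
    return sum(s * s for s in samples)

def select_branch(near_frame, echo_estimate):
    """Return (branch name, residual) for the branch with the lower energy."""
    first = [n - e for n, e in zip(near_frame, echo_estimate)]
    second = [n - e for n, e in zip(near_frame, toy_codec(echo_estimate))]
    if energy(first) <= energy(second):
        return "first", first
    return "second", second

echo = [0.30, -0.52, 0.11, 0.44]   # hypothetical echo before near-end coding
near_frame = toy_codec(echo)       # near-end signal after speech coding
name, output = select_branch(near_frame, echo)
print(name, output)
```

Here the near-end signal is speech-coded, so the branch whose estimate is passed through the codec matches it exactly and is selected.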
Another object of the invention is an apparatus for echo cancellation in a digital data transmission system in which system the end of a transmission link to which sound returns as an echo is the far end and the end of the transmission link from which an echo is reflected back is the near end, and in which a speech coding method is used on the echo path at least for a far-end signal transmitted from the far end to the near end, whereby the apparatus comprises an adaptive linear filter with which an echo estimate is produced on the basis of the far-end signal and subtracted from the signal coming from the near end to cancel the echo originating from the near end, whereby the apparatus is characterized in that the apparatus also comprises a speech decoder with which the speech-encoded far-end signal is decoded and then forwarded to the adaptive linear filter for the purpose of producing an echo estimate.
By means of such an apparatus, the advantages of the method of the invention can be achieved in a simple manner.