A method of reducing echo and/or noise signals in telecommunications systems for transmitting useful acoustic signals, particularly human speech, comprising determining, by silence detection, whether the mixture of useful signals and interference signals contains a speech signal or whether a silence interval is present, and varying, by means of a two-input multiplier, the amplitude of the useful signals, which are generally disturbed by echo and/or noise signals, in response to a time-dependent control signal a0(t) or to a control signal a0(k) clocked at a sampling rate fT = 1/T, where k denotes the sample index and T denotes the sampling period, i.e. the interval from one sample to the next.
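To make the signal path concrete, the following Python sketch shows one way a silence detector could drive a control signal a0(k) that feeds the two-input multiplier, y(k) = a0(k)·x(k). A simple frame-energy detector stands in for whatever silence detection an actual implementation uses, and the function names, frame length, energy threshold, attenuation floor and smoothing constant are all hypothetical illustration values, not taken from the source.

```python
def control_signal(x, frame_len=80, threshold=1e-3, a_min=0.1, smoothing=0.9):
    """Return a0(k) for every sample of x (a list of floats).

    Hypothetical parameters: frame_len samples per detection frame,
    threshold on mean frame energy, a_min as the gain during silence,
    smoothing as a first-order smoothing constant for a0(k).
    """
    a0 = []
    a = 1.0
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        speech = energy > threshold           # crude energy-based silence detector
        target = 1.0 if speech else a_min     # pass speech, attenuate silence
        for _ in frame:
            # Smooth a0(k) toward its target to avoid abrupt gain steps.
            a = smoothing * a + (1.0 - smoothing) * target
            a0.append(a)
    return a0


def multiply(x, a0):
    """Two-input multiplier: y(k) = a0(k) * x(k)."""
    return [ak * xk for ak, xk in zip(a0, x)]
```

At fT = 8 kHz, a frame of 80 samples corresponds to 10 ms. Smoothing a0(k) is one way to keep the detector's speech/silence decisions from producing audible clicks.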
Such a method is known, for example, from DE 42 29 912 A1.
In natural communication between people, the loudness of the spoken word is, as a rule, automatically adapted to the acoustic environment.
However, in remote spoken communication the partners are not in the same acoustic environment, so neither is aware of the acoustic situation at the other's location. The problem is particularly acute when one partner is compelled by his acoustic surroundings to speak very loudly while the other is in a quiet environment and produces speech signals of lower amplitude.
A further problem is that some noise of “electronic origin” is produced on a TK (telecommunications) channel and is co-transmitted as background to the useful signal. It is also advantageous to attenuate or completely suppress interfering signals such as undesired background noise (noise from the street, the factory, the office or the canteen, aircraft noise, etc.). To enhance comfort while telephoning, the general aim is to keep every type of noise as low as possible.
Finally, TK communications are also affected by so-called echoes, which occur as line echoes in two-wire TK networks and as acoustic echoes, for example in simple, less sophisticated TK terminals.
In general, therefore, when a mixture of speech signals and interfering signals is transmitted, it is important to reduce the amplitude of interfering signals such as noise and echoes as far as possible.
A known method of noise reduction is so-called “spectral subtraction”, as described for example in the publication “A new approach to noise reduction based on auditory masking effects” by S. Gustafsson and P. Jax, ITG Technical Conference, Dresden, 1998. This is a spectral noise-reduction method that takes into account an acoustic masking threshold (for example according to the MPEG standard). The disadvantages of such methods are that determining this masking threshold is an elaborate process and that carrying out all the operations of the method entails considerable computational effort.
In spectral subtraction, the noise is first measured continuously during speech pauses and stored in a memory in the form of a power density spectrum, which is obtained via a Fourier transform. When speech occurs, the stored noise spectrum is subtracted, as the “best current estimate”, from the spectrum of the actual distorted speech, and the result is transformed back into the time domain, yielding a noise-reduced version of the distorted signal.
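The procedure just described can be sketched in Python for a single frame. This is an illustration under stated assumptions, not the method of the cited paper: a naive O(n²) DFT stands in for an FFT, windowing and overlap-add are omitted, negative power differences are clamped to zero (one common choice), the noisy phase is reused for the output, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(X):
    """Inverse DFT back to the time domain (real part, since input is real)."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]


def noise_psd(noise_frames):
    """Average power density spectrum |N(f)|^2 over frames from speech pauses."""
    n = len(noise_frames[0])
    psd = [0.0] * n
    for frame in noise_frames:
        for f, v in enumerate(dft(frame)):
            psd[f] += abs(v) ** 2 / len(noise_frames)
    return psd


def spectral_subtract(frame, psd):
    """Subtract the stored noise PSD from the frame's power spectrum."""
    out = []
    for Xf, Nf in zip(dft(frame), psd):
        power = max(abs(Xf) ** 2 - Nf, 0.0)   # clamp negative estimates to zero
        mag = abs(Xf)
        # Keep the noisy phase, replace the magnitude by sqrt(power).
        out.append(Xf * (power ** 0.5 / mag) if mag > 0 else 0j)
    return idft(out)
```

The clamping step is where the inexact noise estimate shows up: bins whose power fluctuates randomly around the stored estimate are alternately kept and zeroed from frame to frame, which is one common explanation for the “musical tones” mentioned in the following paragraph.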
A further disadvantage of spectral subtraction is that noise estimation and the subsequent subtraction are inherently inexact, so artifacts occur in the output signal that are audible as “musical tones”. In addition, this known method is scarcely suitable for suppressing echo signals in TK communication links.
In the extended spectral signal processing also described in the reference cited above, the power density spectra of the noise and of the speech itself are first estimated with the help of spectral subtraction. From these partial spectra, a spectral acoustic masking threshold RT(f) of the human ear is then calculated, for example using the rules of the MPEG standard. From this masking threshold and the estimated noise and speech spectra, a simple rule is then applied to compute a filter transfer function H(f), designed so that the essential spectral components of the speech pass as unchanged as possible while the spectral components of the noise are attenuated as much as possible.
The original distorted speech signal then need only be passed through this filter to obtain a noise-reduced signal. The advantage of the method is that “nothing is added to or subtracted from” the distorted signal, so estimation errors have little or no perceptible effect. The disadvantages, again, are the considerable computational effort for spectral noise suppression and the need for an upstream adaptive filter for echo suppression.
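A minimal sketch of the filter stage, assuming the noise power spectrum N(f) and the masking threshold RT(f) are already available per frequency bin. The gain rule below is a hypothetical choice for illustration and not necessarily the rule used in the cited paper: each bin is attenuated just enough that the residual noise power H(f)²·N(f) does not exceed RT(f), the gain never exceeds 1, and an assumed floor h_min limits the maximum attenuation. Computing RT(f) itself, e.g. with an MPEG psychoacoustic model, is not shown, and the function names are hypothetical.

```python
import math


def masking_gain(noise_psd, masking_threshold, h_min=0.05):
    """Hypothetical rule: largest H(f) <= 1 with H(f)^2 * N(f) <= RT(f)."""
    gains = []
    for n_f, rt_f in zip(noise_psd, masking_threshold):
        if n_f <= rt_f:
            h = 1.0                      # noise already masked: pass unchanged
        else:
            h = math.sqrt(rt_f / n_f)    # attenuate until the noise is just masked
        gains.append(max(h, h_min))      # limit the maximum attenuation
    return gains


def apply_filter(spectrum, gains):
    """Y(f) = H(f) * X(f): the noisy spectrum is only scaled, never subtracted."""
    return [h * x for h, x in zip(gains, spectrum)]
```

Because the noisy spectrum is only scaled by H(f), an estimation error changes how strongly a bin is attenuated but does not inject or remove spectral energy by subtraction, which matches the property described in the paragraph above.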
In the known compander method, as described for example in the patent DE 42 29 912 A1 cited earlier, the degree of noise and echo attenuation is determined by a fixed, predetermined transfer function which, among other things, reduces the level even of very small input signals.
First, the compander transmits speech signals at a given (previously set) “normal speech signal level” (sometimes called the normal loudness) from its input to its output virtually unchanged.
If the input signal is too loud, for example because a speaker comes too close to his microphone, a dynamic compressor limits the output level to almost the same value as in the normal case by linearly reducing the actual gain of the compander as the input signal becomes louder. Thanks to this property, the speech at the compander output remains at approximately constant loudness, however strongly the input loudness fluctuates.
Conversely, if a signal with a lower-than-normal level is fed to the compander input, the signal is additionally attenuated by cutting back the gain, so that background noise is, as far as possible, transmitted only in attenuated form.
The compander thus consists of a compressor for speech signal levels at or above the normal level and an expander for signal levels below the normal level. The lower the input level, the more strongly the expander reduces the gain.
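The static characteristic described above can be sketched as a gain law in decibels, in which the gain falls linearly (in dB) with the input level on both sides of the normal level. All numbers are assumptions for illustration only: the normal level of -26 dB relative to an arbitrary reference, a 10:1 compression ratio above it and a 1:2 expansion below it. The function names are hypothetical, and a real compander would also need a level estimator with attack and release times, which is omitted here.

```python
def compander_gain_db(level_db, normal_db=-26.0, compression_ratio=10.0,
                      expansion_ratio=2.0):
    """Static compander gain in dB for an input level in dB (hypothetical values).

    At or above normal_db: compressor, output rises only 1/compression_ratio dB
    per dB of input. Below normal_db: expander, output falls expansion_ratio dB
    per dB of input, so quiet signals (background noise) are attenuated more.
    """
    excess = level_db - normal_db
    if excess >= 0:
        out_db = normal_db + excess / compression_ratio
    else:
        out_db = normal_db + excess * expansion_ratio
    return out_db - level_db


def compander_gain_factor(level_db, **kwargs):
    """Convert the dB gain into the linear factor applied to the signal."""
    return 10 ** (compander_gain_db(level_db, **kwargs) / 20)
```

With these assumed values, an input at the normal level passes with 0 dB gain, an input 20 dB too loud is reduced by 18 dB, and an input 20 dB below normal is attenuated by a further 20 dB.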
A disadvantage of the compander solution is the considerable computational effort required to carry out the known method. Moreover, compressing the speech signal level on the one hand and expanding it on the other modulates the loudness of the speech, changing the speech signal in a way that is often perceived subjectively as unsatisfactory, i.e. it creates an unsatisfactory auditory impression.