This application claims priority under 35 U.S.C. §§ 119 and/or 365 to 199 35 808.7 filed in Germany on Jul. 29, 1999; the entire content of which is hereby incorporated by reference.
The invention relates to an echo cancellation device for canceling echoes caused by a coupling of a reception signal received by a receiving unit of a transceiver unit of a telecommunication system to a transmitting unit of the transceiver unit. In particular, the invention eliminates echoes which are introduced in the transmitting path of the transceiver unit as a result of an acoustic coupling between a loudspeaker of the receiving unit and a microphone of the transmitting unit.
More specifically, the echo cancellation device is intended to eliminate so-called residual echoes in the transmitting path which remain in the output of a conventional echo canceller after a main echo cancellation has been carried out.
FIG. 1 shows in connection with FIG. 2A a block diagram of a conventional echo canceller EC of a transceiver unit TRU of a telecommunication system TELE. Via an antenna ANT and an antenna switch SW a signal RFE′″ is input and processed by a receiving unit RX. A receiver circuitry RCRT and a decoder DECOD contain all the high frequency and low frequency circuits for providing a reception signal RFE to a loudspeaker SP via a D/A-converter and to the echo canceller EC. In the low frequency path of the receiving unit RX the speech decoder DECOD recomposes speech from the information contained in the signal RFE′″ (see FIG. 1). This recomposing of speech will be explained in more detail with reference to FIG. 4 which shows a schematic block diagram of the speech decoder DECOD. Hereinafter, the signal RFE received from a far end transceiver unit will also be called the “far end signal” whilst the signal TFE provided by the near end transceiver unit to the far end transceiver unit will be denoted as the “transmitted near end signal”.
As in particular shown schematically in FIG. 2A, the far end signal RFE is emitted from the loudspeaker SP of the transceiver unit TRU and is acoustically coupled to the transmitting unit TR, in particular to the microphone MC thereof. Other coupling effects are also conceivable, e.g. a parasitic electrical coupling between the receiving and the transmitting units RX, TR. Thus, the loudspeaker SP emitting the far end signal and the microphone MC together form a closed loop system causing the far end signal RFE to be transmitted back to the far end transceiver unit.
In most telecommunication systems TELE, in particular in the Global System for Mobile Communications (GSM), the transmitted signal TNE′, TFE will be delayed, such that the user of a far end transceiver unit will perceive this as an echo. In this connection it should be noted that the teachings disclosed herein are not particularly limited to a mobile radio communication system but also apply to other communication systems where two transceiver units transmit and receive speech. Therefore, the radio transmission via an antenna ANT is only one example of such telecommunication systems.
Due to the acoustic and/or electrical coupling effect, a portion of the far end signal will always be present in the transmitting path, regardless of whether or not the user of the near end transceiver unit actually speaks into the microphone MC. Whether speech is present or not will be examined in more detail below.
To eliminate the far end signal being transmitted to the far end transceiver unit, an echo cancellation device EC comprising a transfer function estimator EST, H and a subtractor ADD is used, cf. FIG. 2B. Basically, the transfer function estimator EST, H is adapted to estimate the coupling transfer function H from the receiving unit RX to the transmitting unit TR and to process the reception signal RFE with said estimated coupling transfer function H. In particular, if the acoustic coupling is considered, the transfer function estimator EST, H estimates the acoustic transfer function from the loudspeaker SP to the microphone MC. The filter output signal RFE′ is subtracted by the subtractor ADD from the transmission signal TNE which includes an echo signal due to the acoustic and/or electric coupling of the received signal RFE to the transmitting unit. Ideally, the use of the transfer function estimator and the subtractor should be enough to completely eliminate the occurrence of the reception signal RFE in the output signal TNE′ from the echo canceller EC.
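By way of illustration only, the estimator and subtractor arrangement described above may be sketched with a normalized least-mean-squares (NLMS) adaptive filter. The choice of NLMS, the function names, the tap count, the step size and the simulated coupling path are assumptions made for this sketch; the text does not state which estimation method the estimator EST, H uses.

```python
import random

def nlms_echo_canceller(rfe, tne, taps=8, mu=0.5, eps=1e-8):
    """Return (TNE', estimate of H) for far end signal rfe and microphone signal tne."""
    h_est = [0.0] * taps      # running estimate of the coupling transfer function H
    history = [0.0] * taps    # most recent RFE samples, newest first
    tne_prime = []
    for x, d in zip(rfe, tne):
        history = [x] + history[:-1]
        rfe_prime = sum(h * s for h, s in zip(h_est, history))  # filter output RFE'
        e = d - rfe_prime                                        # subtractor ADD
        norm = sum(s * s for s in history) + eps
        # NLMS update: move the estimate along the input, scaled by the error
        h_est = [h + mu * e * s / norm for h, s in zip(h_est, history)]
        tne_prime.append(e)
    return tne_prime, h_est

def demo(n=3000, seed=0):
    """Simulate a hypothetical 3-tap coupling path with no near end speech."""
    rng = random.Random(seed)
    true_h = [0.6, -0.3, 0.1]  # hypothetical loudspeaker-to-microphone path
    rfe = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo = [sum(h * rfe[i - k] for k, h in enumerate(true_h) if i - k >= 0)
            for i in range(n)]
    tne_prime, h_est = nlms_echo_canceller(rfe, echo)
    return true_h, h_est, tne_prime
```

In this idealized simulation the coupling path is linear, time-invariant and noise-free, so the estimate converges and TNE′ tends to zero. With a real acoustic path, background noise and near end speech the estimate remains imperfect, which is the origin of the residual echo.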
However, in practice the main or basic echo cancellation by using the transfer function estimator and the subtractor cannot remove the returning signal completely. The reason for this is that the transfer function estimator H, EST cannot perfectly estimate the transfer function, in particular the transfer function of the acoustic coupling between the loudspeaker SP and the microphone MC. Consequently, some parts of the received far end signal RFE will still be present in the signal TNE′ transmitted to the far end transceiver unit. In the far end transceiver unit such remaining parts will still be perceived as an echo. Since a main echo cancellation has already removed some of the main echoes, the remaining parts of the far end signal are called “residual echoes”. Therefore, additional signal processing has to be applied to the residual signal TNE′ and in the context of conventional echo cancellation this additional processing is called “residual echo cancellation”. Thus, in some conventional echo cancellation devices an additional residual echo suppression device is used for suppressing residual echoes in the subtractor output signal TNE′. This will be considered below with reference to some examples of the published prior art.
In modern mobile communication systems, e.g. GSM, the voice signal TNE′ of FIG. 1 is not transmitted as a representation of the voice signal amplitudes. Instead the voice signal is coded and in GSM the speech coding is based on a model for speech generation. Commonly used methods to model speech are described in L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, Englewood Cliffs, N.J., 1978. In particular, a model which models the excitation signal and the vocal tract of the speaker is often used in signal processing. This model is defined by two types of excitation signals and a filter. The two excitation signals correspond to:
1) a pulse train used for voiced speech, e.g. the sound “a”;
2) a white noise used for unvoiced speech, e.g. the sound “s”.
The used filter models the vocal tract and it is convenient to use an AutoRegressive (AR) filter. By using the speech model it is possible to create an artificial voice. Actually, the voice will sound unnatural due to the excitation signals. However, if the excitation is chosen with care, more natural sounding speech can be produced.
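The speech model described above may be sketched as follows. The function names, the filter coefficients and the pitch period are hypothetical values chosen for this illustration and are not taken from the GSM specifications.

```python
import random

def ar_synthesize(excitation, a):
    """All-pole (AR) filter: y[n] = x[n] - sum_k a[k] * y[n-1-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y

def pulse_train(n, period):
    """Excitation for voiced speech: one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def white_noise(n, seed=0):
    """Excitation for unvoiced speech: zero-mean Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical stable vocal tract filter 1 / (1 - 1.2 z^-1 + 0.8 z^-2)
VOCAL_TRACT = [-1.2, 0.8]
voiced = ar_synthesize(pulse_train(160, 40), VOCAL_TRACT)
unvoiced = ar_synthesize(white_noise(160), VOCAL_TRACT)
```

The same filter is driven by either excitation; only the excitation decides whether the synthesized frame sounds voiced or noise-like.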
Typically, speech modelling is used in speech coders, e.g. in the Full Rate (FR) coder in GSM. The FR coder is known as a Regular Pulse Excitation-Long Term Prediction (RPE-LTP) coder and is described in for example the GSM specification GSM 06.10. A simplified description, see FIG. 3, of the FR coder is as follows:
A frame of input samples TNE′ (in GSM one frame consists of 160 samples) is presented to the coder input, e.g. in the form of the signal TNE′ output by the echo canceller EC. The input is used to determine an AR model, represented in FIG. 3 by COD-AR. This is accomplished by exploiting the Toeplitz structure of the TNE′ correlation matrix, i.e. using a Schur recursion as described in J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Macmillan Publishing Company, New York, 2nd edition, 1992. This recursion results in a set of coefficients termed reflection coefficients, which may be used in a lattice filter realization. Based on the obtained coefficients the input frame is filtered through the inverse of the AR model (which can be implemented as a lattice structure), which ideally would produce the excitation signal, output as the residual signal denoted RES in FIG. 3 (note that here the residual signal is not equivalent to the residual echo). That is, the spectral characteristics of the input signal have been flattened.
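The analysis step may be illustrated as follows. The sketch uses the Levinson-Durbin recursion rather than the Schur recursion named above; both solve the same Toeplitz system and yield reflection coefficients, and Levinson-Durbin is used here only because it is compact. The function names are hypothetical, and sign conventions for reflection coefficients differ between texts.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n-k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, ks, err): prediction-error filter a with a[0] = 1,
    reflection coefficients ks, and final prediction error power err."""
    a = [1.0]
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return a, ks, err

def inverse_filter(x, a):
    """Residual e[n] = sum_j a[j] * x[n-j]; flattens the spectrum of x."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]
```

For example, the sequence 1, 0.5, 0.25, 0.125 is the impulse response of a first-order AR filter; passing it through the inverse filter with a = [1, -0.5] returns the single pulse that excited it.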
It is clear that the AR filter computed along with the residual signal can be used to restore the original input signal by filtering. However, the transmission of parameters and the residual signal would not correspond to a good compression ratio. To increase the compression ratio the GSM FR coder exploits the residual signal to compute a long term prediction in a device LTP of FIG. 3, which essentially corresponds to a measure of the periodic nature of the residual, e.g. a frequency related to the vibration of the vocal cords. Based on the long term prediction LTP the residual signal is down sampled (re-sampled) by a factor of three in a device DD (decimation device) in FIG. 3.
The re-sampled residual signal EXS, the AR filter coefficients LARP and the gain coefficients are quantized and organized in a block referred to as a speech frame (260 bits). This is performed in FIG. 3 by a frame packing device FPD. A few other coefficients are also included in the speech frame but these will be left out for reasons of simplicity as described in GSM 06.10.
At the receiver end, see FIG. 4, the speech frame is unpacked in the frame unpacking device FUD and the residual signal is up-sampled in the excitation reconstruction device ERD and is used as excitation signal EX to the vocal tract filter VTF (which is an AR filter). The above description is only a simplification of the GSM-FR speech coder. No effort has been invested in what formats the filter coefficients have. However, broadly speaking, the filter parameters are transmitted as Log Area Ratio (LAR) parameters, denoted LARP in FIG. 3, rather than a reflection coefficient or the coefficients occurring in the denominator polynomial of the AR filter.
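As an illustration of the Log Area Ratio representation, the following sketch converts between a reflection coefficient and its LAR value. It assumes the usual logarithmic definition with base 10; quantization, and any approximation of the logarithm that a particular coder may use, are not shown. The function names are hypothetical.

```python
import math

def reflection_to_lar(r):
    """Log Area Ratio of a reflection coefficient r with |r| < 1."""
    return math.log10((1.0 + r) / (1.0 - r))

def lar_to_reflection(lar):
    """Inverse mapping back to the reflection coefficient."""
    p = 10.0 ** lar
    return (p - 1.0) / (p + 1.0)
```

The mapping stretches the region near |r| = 1, where the filter response is most sensitive to coefficient errors, which is one commonly cited reason for quantizing LAR values rather than the reflection coefficients themselves.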
As shown in FIG. 3, the speech coder COD comprises the coding block SPECOD and a voice activity detector COD-VAD. As explained above, the vocal tract is modeled by an Auto Regressive (AR) model in a COD-AR unit. Thus, the parameters LARP of the AR model (i.e. the vocal tract filter) and information EXS regarding the excitation signal are transmitted to the far end transceiver unit.
As shown in FIG. 1, after receiving frames of the reception signal REF″ including the AR parameters LARP and the excitation signal information EXS by an antenna ANT and an antenna switch SW and a receiving circuit RCRT, the received parameters and the received information are used to perform a speech synthesis in a speech decoder DECOD, depicted in FIG. 4. As explained, the transmission of the parameters and the information of the speech model is carried out on a frame basis, requiring (dependent on the used speech model and the transmission speed) a certain bandwidth (number of bits per second) which must be provided by the transmitting unit TR. This required bandwidth can be quite large and can thus cause the resources of the transmitting unit TR to be occupied to a large extent during the transmission of speech.
However, in a typical telephone call there are also speech pauses where the near end speaker does not speak into the microphone MC, i.e. no speech is present in the near end transmitting signal TNE. In this case, the speech coder COD only has to code background noise. The coding of quite irrelevant background noise with the same bandwidth as used for the coding of the speech would be quite a waste of resources in the transmitting unit TR. Therefore, in the speech pauses, modern speech coders COD often enter a mode called Discontinuous Transmission Mode (DTX) controlled by the Voice Activity Detector (VAD) COD-VAD, linked to the speech coder COD. In the DTX mode of operation, the speech coder codes the background noise using the AR model device COD-AR within the coder block. However, in DTX mode the coded parameters are packed in the frame packing device FPD in a special frame which is called the Silence Descriptor (SID) frame. The unit TCRT responsible for the GSM protocol can determine when and where (in the TDMA structure) the SID frame is sent via the antenna ANT. By using the DTX mode a lower bit rate can be used.
More specifically, the VAD used in GSM is defined in GSM 06.32 and determines, based on the input frame in the signal TNE′, whether a frame contains speech or not. The VAD used in GSM monitors the transmission speech coder parameters SPPAR related to TNE (more precisely the transmission signal TNE′ output by the echo canceller EC) to detect speech pauses. The VAD sets a so-called VAD flag VFLG in FIG. 3 to one or zero so as to indicate speech and no speech, respectively. This voice activity detection is based on an adaptable energy threshold, i.e. the voice activity detection depends on the energy of the observed signal TNE′. For example, when the signal input to the voice activity detector VAD falls below a predetermined threshold, the input signal is marked as no speech. To avoid truncating low power speech, an extra delay may be used (which is called the hangover time) before the VAD flag is set. The use of the SID frame is incorporated and defined in the standard protocols of GSM.
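The threshold decision with hangover may be sketched as follows. This is a simplified illustration and not the algorithm of GSM 06.32: the threshold is fixed rather than adaptive, the periodicity check is omitted, and the function name and parameter values are assumptions.

```python
def vad_flags(frames, threshold, hangover_frames=2):
    """Return one flag per frame: 1 for speech, 0 for no speech."""
    flags = []
    hang = 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            hang = hangover_frames   # re-arm the hangover on every speech frame
            flags.append(1)
        elif hang > 0:
            hang -= 1                # still within the hangover time
            flags.append(1)
        else:
            flags.append(0)
    return flags

loud = [1.0] * 4      # frame energy 1.0
quiet = [0.01] * 4    # frame energy 1e-4
print(vad_flags([loud, quiet, quiet, quiet, quiet], threshold=0.01))
# prints [1, 1, 1, 0, 0]
```

The two quiet frames directly after the loud frame are still flagged as speech, which is the effect of the hangover time in protecting low power speech tails.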
In addition to the voice activity detection, the voice activity detector COD-VAD estimates the periodicity of the input signal (TNE or TNE′), which will be an additional decision factor for the setting of the VAD flag VFLG. Provided the input frame of the signal TNE′ does not contain speech as indicated by the respective setting of the flag VFLG, the speech coder will form the special Silence Descriptor SID frame in the frame packing device FPD. The SID frame consists only of the filter coefficients LARP as determined by the device COD-AR.
Upon receiving and detecting a SID frame on the receiving side in the decoder DECOD in FIG. 4, a Pseudo Noise generator device PNG is used as input to the vocal tract filter VTF (position B in FIG. 4). The output at the receiver side is termed comfort noise and is supposed to mimic the background noise at the transmitter side.
Consequently, in case of a set VAD flag VFLG, a SID frame is made in which the AR parameters from the device COD-AR, i.e. the vocal tract parameters, are the only valid data. Evidently, the speech coder always operates on each input frame from the signal TNE′ and always produces an output frame TFE′ (speech or SID frame). However, in case the output of the speech coder is a SID frame the GSM protocol allows a reduced transmission rate of consecutive SID frames in the signal TFE. That is, the transmission unit TCRT of the transceiver unit TRU does not have to transmit the parameters and the information at the same bit rate as used during speech coding. Consequently the transmitting unit TCRT can save power and increase the battery life of the transceiver unit TRU.
As explained, the SID frame is transmitted to the far end transceiver unit TRU and the speech decoder DECOD unpacks in the frame unpacking device FUD of FIG. 4 the SID frame as so-called comfort noise. Therefore, on the receiver side TRU only the AR model VTF is driven by a white noise generated by a Pseudo Noise (PN) Generator PNG located in the receiving unit RX of FIG. 1, for example in the speech decoder DECOD, cf. FIG. 4. Alternatively, if the communication terminates in a telephone of a Public Switched Telephone Network (PSTN), then the speech coder COD, decoder DECOD and pseudo noise generator PNG can be located in the network.
As shown in FIG. 2B and FIG. 2C, rather than just producing SID frames in DTX mode as explained above, it is also possible to manipulate the speech coder COD such that it will transmit codes for the background noise only when no speech is present. Basically, this can be done in two ways:
I) Taking an output frame from the speech coder COD and converting it to a SID frame (FIG. 2C); and
II) Alternatively, synthetic background noise is generated at the input of the speech coder COD, such that the speech coder will code this artificial noise. If a DTX functionality exists the coder COD will most likely enter the DTX mode and will start to produce SID frames (FIG. 2B).
Regarding a residual echo cancellation, the two alternatives I, II may be used to suppress residual echoes and hereinafter alternative one and two are termed residual echo suppression method of type I and type II, respectively.
Type I: Conversion to a SID Frame (FIG. 2C)
Even when no speech is actually generated at the near end side, there is still the possibility that an echo and in particular a residual echo is present in the input signal to the speech coder COD. The fact that the residual echo is still present in the input signal to the speech coder can be exploited for the generation of background noise transmission codes. That is, the usage of an echo suppression method of type I will set the transmitting unit TR in a DTX mode of operation at times without near end speech and the residual echo as well as the background noise signal are used in the speech coder to form a speech frame.
In DTX mode the VAD indicates via the VAD flag VFLG that only a far end reception signal is present in the transmission signal TNE and consequently the speech frame is converted to a SID frame in a Make-SID frame device MSID of the MSIDM device shown in FIG. 1 (in dashed lines) and in FIG. 2C. The generation of transmit codes for the background noise on the basis of the residual echo (i.e. the remains of the received and acoustically coupled far end signal) is indeed possible since the spectral influence of the residual echo can be regarded as negligible.
When on the far end receiving side the far end transceiver unit receives codes for the background noise which are formed on the basis of the residual echoes in the near end transmitting unit TR, then the excitation signal EX used to form the near end signal at the terminal of the far end transceiver unit in the DTX mode operation will still be a white noise generated by the pseudo random noise generator PNG (see FIG. 4). Therefore, the far end transceiver unit will actually not generate a residual echo but noise and thus the far end user will perceive the received signal in DTX mode operation as noise rather than a residual echo.
As shown in FIG. 4, the speech synthesis carried out in the speech decoder DECOD is based on two types of excitation signals, however, in the DTX mode operation only one excitation signal is used, i.e. the switch in FIG. 4 is controlled in position B by the switch signal FT output by the frame unpacking device FUD. This excitation signal is not in any way associated with the speech coding or background noise coding process carried out in the speech coder COD on the near end transceiver unit TRU.
Type II: Generation of Synthetic Background Noise (FIG. 2B)
Alternatively as in FIG. 2B, instead of using the residual echo in the speech coder COD for forming an estimate of the background process, it is also possible to generate a noise sequence which resembles the background noise when no near end speech activity is present.
As shown in FIG. 1 (in dashed lines) and in FIG. 2B, the transmitting unit TR comprises an additional noise generation means NGM including a noise generator NG generating a white noise and driving an AR model unit AR, a background estimation device BEST receiving the A/D converted version of the transmission signal TNE (including said echo signal) and controlling the parameters of said AR model in the AR model unit AR via a setting signal AR-PAR, a voice activity detector VAD receiving the subtractor output signal TNE′ (including the residual echo) and outputting a control output no-talk NT to a switch SW2, and another switch SW1 controlled by the additional VAD output signal far-end-single-talk FEST for switching to said speech coder COD in a first switching state B an output from the echo canceller EC and in a second switching state A an output from said AR model unit AR. The device BEST is only operable in case of no near end and no far end speech in the signal TNE. Therefore, in case of NT true (no talk) the signal TNE is connected to the device BEST through the closed switch SW2 and in case of NT false (talk) the switch SW2 is open and the device BEST does not operate. The voice activity detector VAD can be incorporated in the coder COD, as shown in FIG. 3, or it can be provided outside of the coder COD.
Considering the devices in FIGS. 2A, 2B and 2C in combination (e.g. in FIG. 1 the dashed boxes NGM and/or MSIDM are present) four different cases can be distinguished depending on whether or not there is a speech activity in the microphone MC and whether or not there is a coupling of the signal received from the far end into the signal TNE causing a residual echo in the output of the echo canceller EC. The four cases are as follows:
1. There is near end speech as well as background noise present in the pulse code modulation (PCM) samples in the respective speech-frame. This corresponds to a situation of a normal speech with no additional echoes.
2. There is only background noise and no speech present in the PCM samples, i.e. the coder COD will enter the DTX mode of operation.
3. There is a near end speech pause and an echo and consequently a residual echo as well as background noise is present in the PCM samples.
4. There is near end speech, a residual echo of a signal received from the far end, and background noise present in the PCM samples.
In case 1 the switch SW1 shown in FIG. 2B and FIG. 2C is set in position B because the VAD signal FEST is false. In this case, a normal operation of the transmitting unit TR is commanded and the near end speech and the near end background noise are fed through the echo canceller EC and straight through to the speech coder COD. Since the VAD output signal NT is false (talk) the additional switch SW2 in FIG. 2B is in an open position.
In case 2, the switches SW1 in FIG. 2B and FIG. 2C can assume position A or B and the VAD signal FEST is false. Preferably, the switches are in position B. The VAD output signal NT is true and thus in FIG. 2B the additional switch SW2 is in a closed position. In this condition, the device BEST operates and estimates the spectral characteristics of the TNE background signal.
In case 3, background noise from the microphone MC as well as a residual echo is present in the subtractor output signal TNE′. In case 3, the switch SW1 of FIGS. 2B and 2C is set to be in position A because the signal FEST is true. That is, in FIG. 2B the residual echo is not fed to the coder COD. Instead, the coder COD in FIGS. 2B and 2C will be provided with a signal which mimics the background noise via the device NGM and/or the device MSIDM. It should however be noted that only in case 2 the AR model of FIG. 2B is updated, possibly by using the output TNE′ from the echo canceller EC. In FIG. 2C the coder COD does receive the residual echo along with the background noise signal. However, since the switch SW1 is in position A, the speech frame will be manipulated by MSID so as to form a SID frame. To this end, it is understood that a DTX functionality is supported by the protocol. However, it should be pointed out that the unit MSID can manipulate a speech frame in such a way that the information related to the excitation signal EXS in FIG. 3 may be replaced with noise excitation. In this way a system without DTX functionality may use FIG. 2C. The VAD output signal NT is false and thus the additional switch SW2 in FIG. 2B is in an open position.
In case 4 the switch SW1 of FIGS. 2B and 2C is controlled to be in position B since the VAD signal FEST is false. The near end speech will mask the residual echo remaining in the output signal TNE′ of the echo canceller EC. That is, when speech is present as well as the residual echo, the residual echo will be masked and there is no need for removal thereof. The VAD output signal NT is false and thus the additional switch SW2 in FIG. 2B is in an open position.
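The switch settings in the four cases may be summarized by the following sketch. The function name and the two boolean inputs are hypothetical simplifications; in the arrangement described, the signals FEST and NT are produced by the voice activity detector VAD. For case 2, where SW1 may assume either position, the preferred position B is returned.

```python
def switch_states(near_end_speech, far_end_echo):
    """Map the speech and echo conditions to FEST, NT, SW1 and SW2."""
    nt = (not near_end_speech) and (not far_end_echo)   # NT: no talk at either end
    fest = far_end_echo and (not near_end_speech)       # FEST: far-end single talk
    return {
        "FEST": fest,
        "NT": nt,
        "SW1": "A" if fest else "B",         # A: noise path, B: echo canceller output
        "SW2": "closed" if nt else "open",   # closed: device BEST estimates background
    }

cases = {
    1: switch_states(near_end_speech=True, far_end_echo=False),
    2: switch_states(near_end_speech=False, far_end_echo=False),
    3: switch_states(near_end_speech=False, far_end_echo=True),
    4: switch_states(near_end_speech=True, far_end_echo=True),
}
```

Only case 3 routes the noise path to the coder, and only case 2 allows the background estimate to be updated.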
To summarize, if in any of the above cases 1.-4. the switch SW1 is in the position A, the coder COD will generate coding information (code words) which depending on the situation are based solely on the background noise or based on the background noise also including the echo or the residual echo.
Therefore, in the case of FIG. 2B (Type II), the speech coder COD receives a synthetic background noise signal generated by a synthetic noise generator NGM in the transmitting unit TR. When the speech coder COD detects such a synthetic background noise, the speech coder COD will automatically enter the DTX mode.
Some speech coder systems do not have a DTX functionality and therefore all frames will be speech coded. However, since no near end speech is detected the speech coder will code the background noise in terms of a speech frame and on the far end side the signal received contains no residual echo. Thus, in order to prevent a residual echo one possibility is to use a synthetic background signal at the input of the speech coder, provided no near end speech is present.
The following published prior art documents can be referred to in terms of what has been described above.
In the U.S. Pat. No. 5,563,944 an echo cancellation device is described where an additional residual echo suppression device is provided downstream from a main echo cancellation device. This document therefore describes the preamble features of the attached claims 1, 14, 19. The residual echo suppression device estimates a residual echo level in a residual signal and produces a threshold signal with a threshold level equal to the residual echo level. A residual echo suppressor is provided for adaptively controlling a suppression amount for the residual echo based on the threshold signal supplied from the residual echo level estimator. Thus, a residual echo suppression is carried out downstream from the main echo cancellation based on a threshold level determination of the echo signal.
The European patent application EP 0 884 886 A2 describes an echo canceller employing a multiple step gain. Here, a noise cancellation means acts as a kind of residual error suppression device as in the preamble of claims 1, 14, 19. The noise cancellation means estimates signal components due to the local background noise and removes these noise components from the outgoing signal. This noise cancellation means employs any of various well-known noise cancellation methods, such as a spectral subtraction, band splitting attenuation or adaptive filtering.
In the patent abstracts of Japan JP 63-42527 a cascaded echo cancellation arrangement is disclosed. Between two echo cancellation stages an equalizer is provided which performs an equalization of the wave-form distortion due to line characteristics. A subtractor subtracts the approximate echo component from the equalized reception signal which is output by the equalizer in order to cancel an echo component. Thus, the wave form distortion is equalized and the echo component is cancelled when the transmission signal of the other party is output to a reception terminal.
The U.S. Pat. No. 5,721,730 describes a residual echo cancellation by attenuating the subband error signals on an independent basis in response to a comparison of the relative levels of the corresponding subband send-input signals, subband receive-input signals and subband error signals. Thus, in this echo canceller an injected noise component is more accurately related to the prevailing noise spectrum within the transmitted signal.
The U.S. Pat. No. 5,283,784 relates to a residual echo cancellation by comparing relative levels of the sent input signal, the received input signal and an error signal remaining after removal of an expected echo signal from the sent input signal. Thus, a residual echo from an echo canceller circuit is reduced by a variable attenuator. It is also described that a non-linear processor or center clipper removes any residual echo that remains in the output signal after subtraction of the anticipated echo and is arranged to remove residual echoes in the output resulting from the far end speaker's signal and to pass the signal of the near end speaker without distortion. This non-linear processor avoids a sudden and noticeable variation in the output of the echo canceller by removing residual echoes proportionally rather than by operation above a threshold signal level. The non-linear processor detects the average background noise level and proportionally injects a noise signal in the output to maintain the average level notwithstanding the variation in operation of the non-linear processor which occurs with the presence or absence of a signal from the near end speaker and the far end speaker, respectively.
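For orientation, a generic threshold-based center clipper may be sketched as follows. This illustrates the basic non-linear operation only and not the proportional processor of U.S. Pat. No. 5,283,784; the function name and the threshold value are assumptions.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold."""
    return [0.0 if abs(s) < threshold else s for s in samples]

print(center_clip([0.05, -0.2, 0.5, -0.01], 0.1))
# prints [0.0, -0.2, 0.5, 0.0]
```

Low-level residual echo is removed, but so are low-level background noise and quiet speech components, so the output level changes abruptly around the threshold.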
The U.S. Pat. Nos. 5,222,251 and 5,646,991 disclose echo cancellation devices which also exploit the speech coder characteristics for a residual echo cancellation. In this respect these documents have some relationship with the above described FIG. 2B.
In the context of FIG. 2B, U.S. Pat. No. 5,222,251 discloses that the acoustic echo should be replaced with at least one codeword generated by a communication device wherein said codeword represents an energy and a spectral content of the ambient noise, i.e. the background noise. However, this patent does not disclose which code-word is meant, i.e. whether it is the code-word of the PCM coding apparatus or a code-word of the GSM speech coder, i.e. the coder COD shown in FIG. 3. U.S. Pat. No. 5,222,251 also discloses a method for residual echo cancellation where it is decided if speech is transmitted in the transmitting unit TR and a threshold is computed. If the acoustic echo is smaller than the generated threshold, the code-word is replaced. The threshold may also be compensated for losses due to AEC.
Furthermore, in the context of FIG. 2B, U.S. Pat. No. 5,646,991 discloses different noise generation means in order to impress a synthesized noise replacement signal upon the output signal of an echo canceller when background noise is present in the transmitting signal. In this patent a spectral response means is provided responsive to a far end speech absence signal and a near end speech absence signal and receives a noise signal from an output speech channel for determining in accordance with a predefined spectral response formant a spectral response characteristic. A noise generator means is responsive to said near end speech absence signal and to said far end speech present signal for generating a synthesized noise replacement signal in accordance with the spectral response characteristic. The noise generator means switchably impresses this synthesized noise replacement signal upon the output speech channel. According to another alternative in this patent, a spectral response means is responsive to said far end speech absence signal and to said near end speech absence signal for receiving the noise signal and determines in accordance with a predetermined spectral response formant a spectral response characteristic. A noise generator means is responsive to said near end speech absence signal and to said far end speech present signal for generating a synthesized noise replacement signal in accordance with the spectral response characteristic and the noise magnitude.
As explained above, in conventional residual echo cancellation devices additional noise generation procedures are used to produce modified code-words at the input of the speech coder COD in order to get rid of the residual echo when a background noise is present or is not present and when speech is present or is not present. On the other hand, the typical use of residual echo cancellation devices relying on center clippers, which are non-linear elements, results in the disadvantage that undesired distortions are introduced in the signal transmitted to the far end. Most importantly, as shown in FIGS. 2B and 2C, in conventional echo cancellers the signal to be transmitted bypasses the echo canceller and synthetic noise is generated to be transmitted to the coder COD. However, this noise generation does not relate directly to the actual microphone signal content and it is not related at all to the received signal or to a signal such as the output TNE′ of the echo canceller. When the VAD malfunctions, i.e. either it does not detect the renewed generation of speech in the signal TNE or it does not detect the absence of speech quickly enough, then the user at the far end will either hear noise and not the actual speech, or the user will first hear actual background noise coded as a speech frame (including possible residual echoes) and subsequently the artificial noise, thus exposing the user to two different kinds of noise phenomena.
Therefore, the object of the present invention is to provide an echo cancellation device which efficiently cancels residual echoes without bypassing the echo canceller during presence and/or absence of speech.
According to a first aspect of the invention, this object is solved by an echo cancellation device (claim 1) for cancelling echoes caused by a coupling of a reception signal received by a receiving unit of a transceiver unit of a telecommunication system to a transmitting unit thereof, comprising a transfer function estimator adapted to estimate the coupling transfer function from the receiving unit to the transmitting unit and for processing the reception signal with said estimated coupling transfer function, a subtractor adapted to subtract from the transmission signal which includes an echo signal due to the coupling of the received signal to the transmitting unit the processed reception signal, and a residual echo suppression device for suppressing residual echoes in the subtractor output signal, wherein said residual echo suppression device comprises a residual echo filter having an adjustable filter function adapted to remove from the subtractor output signal of the subtractor the spectral characteristics relating to the reception signal.
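The interplay of the transfer function estimator and the subtractor may be illustrated by the following minimal sketch. It assumes a normalized LMS (NLMS) adaptive FIR filter as the estimator, which is a common choice but is not prescribed by the text above; the function names, the step size, the number of taps and the simulated coupling coefficients are hypothetical. The residual echo filter itself is not modelled here.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return (subtractor output, estimated coupling impulse response)."""
    w = [0.0] * taps   # estimated coupling transfer function (FIR taps)
    x = [0.0] * taps   # most recent reception samples, newest first
    out = []
    for r, m in zip(far_end, mic):
        x = [r] + x[:-1]
        # Reception signal processed with the estimated transfer function.
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        # Subtractor: transmission signal minus processed reception signal.
        e = m - echo_est
        # NLMS update of the estimate (an assumption of this sketch).
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out, w

def simulate(n=4000, seed=0):
    """Generate a white far end signal and a microphone signal containing only echo."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    coupling = [0.6, -0.3, 0.1]  # hypothetical loudspeaker-to-microphone path
    mic = [
        sum(c * far_end[i - k] for k, c in enumerate(coupling) if i - k >= 0)
        for i in range(n)
    ]
    return far_end, mic, coupling
```

Calling `nlms_echo_canceller(*simulate()[:2])` yields a subtractor output whose power decays as the estimate approaches the simulated coupling. With real speech, background noise and a changing acoustic path the estimate is never exact, and what remains in the subtractor output is the residual echo addressed by the residual echo filter.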
According to a second aspect of the invention, this object is solved by an echo cancellation device (claim 14) for cancelling echoes caused by a coupling of a reception signal received by a receiving unit of a transceiver unit of a telecommunication system to a transmitting unit thereof, comprising a transfer function estimator adapted to estimate the coupling transfer function from the receiving unit to the transmitting unit and for processing the reception signal with said estimated coupling transfer function, a subtractor adapted to subtract from the transmission signal which includes an echo signal due to the coupling of the received signal to the transmitting unit the processed reception signal, and a residual echo suppression device for suppressing residual echoes in the subtractor output signal, wherein said residual echo suppression device comprises a residual echo filter having an adjustable filter function adapted to amplify in the subtractor output signal of the subtractor the spectral content of the background signal in the transmission signal transmitted by said transmitting unit.
According to a third aspect of the invention, this object is solved by an echo cancellation device (claim 19) for canceling echoes caused by a coupling of a reception signal received by a receiving unit of a transceiver unit of a telecommunication system (TELE) to a transmitting unit thereof, comprising a transfer function estimator adapted to estimate the coupling transfer function from the receiving unit to the transmitting unit and for processing the reception signal with said estimated coupling transfer function, a subtractor adapted to subtract from the transmission signal which includes an echo signal due to the coupling of the received signal to the transmitting unit the processed reception signal, and a residual echo suppression device for suppressing residual echoes in the subtractor output signal, wherein said residual echo suppression device comprises a residual echo filter having an adjustable filter function and a noise generation means adapted to add noise in the filter output signal in a spectral region relating to the reception signal for masking residual echoes.
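The idea of adding noise only in a spectral region relating to the reception signal can be pictured with a band-wise sketch. The representation of the signals as per-band powers, the threshold test and all names and default values below are assumptions made for illustration; the text above does not specify how the spectral region is determined or how the noise level is chosen.

```python
def masking_noise_profile(far_end_band_power, noise_power, threshold):
    """Noise power per band: non-zero only where the reception signal has energy."""
    return [noise_power if p > threshold else 0.0 for p in far_end_band_power]

def add_masking_noise(filter_out_band_power, far_end_band_power,
                      noise_power=0.01, threshold=0.05):
    """Add masking noise to the filter output, band by band.

    Powers are simply summed, which assumes that the added noise is
    uncorrelated with the filter output signal.
    """
    profile = masking_noise_profile(far_end_band_power, noise_power, threshold)
    return [y + n for y, n in zip(filter_out_band_power, profile)]
```

For example, with filter output band powers `[0.2, 0.0, 0.1]` and reception band powers `[0.5, 0.01, 0.0]`, only the first band exceeds the threshold and receives masking noise, while the other bands are left unchanged.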
Further Advantageous Embodiments
The above described aspects of the present invention can also be used in combination. That is, the first and second aspect, the first and third aspect, the second and third aspect and the first, second and third aspect may be combined. Further advantageous embodiments and improvements of the invention can be taken from the attached dependent claims. It should also be noted that the invention can comprise embodiments resulting from a combination of features separately claimed in the claims and/or described in the specification including the features described as background of the invention or prior art in the aforementioned introduction even if such prior art only refers to an internal state of the art of the applicant.
Hereinafter, the embodiments of the invention will be illustrated with reference to the attached drawings.