FIG. 1 is a block diagram showing the configuration of an echo suppressing apparatus of a first example of related art.
FIG. 1 shows an exemplary configuration of an echo suppressing apparatus for suppressing an echo generated in a hands-free phone.
In FIG. 1, an audio signal from the far-end speaker (hereinafter referred to as far-end signal) inputted to input terminal 10 is converted into far-end audio by loudspeaker 2. Microphone 1 picks up, for example, the voice of the near-end speaker (hereinafter referred to as near-end audio), but it also picks up the unnecessary far-end audio produced by loudspeaker 2. The sound that reaches microphone 1 from loudspeaker 2 is called an echo. The sound transfer system from the far-end signal to the output signal of microphone 1, which includes loudspeaker 2 and microphone 1, is called an echo path.
Ideally, only the near-end audio should be outputted as the near-end signal from output terminal 9 of the echo suppressing apparatus, and the unnecessary far-end audio contained in the near-end signal should be removed. In particular, when the near-end signal contains a large far-end audio component, the far-end speaker hears the delayed far-end audio as an echo, which makes conversation difficult. To address this problem, a method employed in related art uses a linear echo canceller to remove the echo from the near-end signal. A linear echo canceller is described, for example, in Non-Patent Document 1 (Eberhard HANSLER, “The hands-free telephone problem: an annotated bibliography update,” Annals of Telecommunications, 1994, pp. 360-367).
Linear echo canceller 3 estimates the transfer function of the echo path (echo path estimation), and uses the signal inputted to loudspeaker 2 (far-end signal) to produce a simulated signal (echo replica signal) of the echo inputted to microphone 1 based on the estimated transfer function.
The echo replica signal produced in linear echo canceller 3 is inputted to subtractor 4, which subtracts the echo replica signal from the output signal of microphone 1 to output the near-end signal.
Speech detector 5 receives the output signal of microphone 1, the output signal of linear echo canceller 3, the output signal of subtractor 4, and the far-end signal, uses these signals to detect whether or not the output signal of microphone 1 contains any near-end audio, and outputs the detection result to linear echo canceller 3.
To control the operation of linear echo canceller 3, speech detector 5 outputs “zero” or a very small value as the speech detection result when speech detector 5 has detected any near-end audio in the output signal of microphone 1, while outputting a large value when speech detector 5 has detected no near-end audio.
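How speech detector 5 reaches its decision is not specified here. As one illustration, the following Python sketch swaps in the well-known Geigel double-talk rule, which uses only the far-end signal and the output of microphone 1 (the detector described above also receives the outputs of linear echo canceller 3 and subtractor 4). The function name, window length and threshold are hypothetical; the output follows the convention above: 0.0 when near-end audio is detected and 1.0 otherwise.

```python
def geigel_detector(far_end, mic, window=64, threshold=0.5):
    """Hypothetical stand-in for speech detector 5 (Geigel rule).

    Declares near-end audio when |mic[n]| exceeds threshold times the
    largest |far_end| sample in the most recent `window` samples.
    Returns one control value per sample: 0.0 = near-end audio detected,
    1.0 = no near-end audio detected.
    """
    control = []
    for n in range(len(mic)):
        start = max(0, n - window + 1)
        recent_peak = max(abs(v) for v in far_end[start:n + 1])
        near_end_active = abs(mic[n]) > threshold * recent_peak
        control.append(0.0 if near_end_active else 1.0)
    return control
```

A microphone level well below the recent far-end peak is attributed to the echo alone; anything louder is taken as near-end audio, which then drives the control value to zero.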
FIG. 2 is a block diagram showing an exemplary configuration of the linear echo canceller shown in FIG. 1.
As shown in FIG. 2, linear echo canceller 3 includes adaptive filter 30, which is a linear filter, and multiplier 35. Adaptive filter 30 may be any of various types, such as an FIR filter, an IIR filter, or a lattice filter.
Adaptive filter 30 filters the far-end signal inputted to terminal 31 and outputs the result from terminal 32 to subtractor 4. Adaptive filter 30 updates its filter coefficients by a predetermined correlation operation so that the output signal of subtractor 4, inputted to terminal 33, is minimized. In doing so, adaptive filter 30 minimizes the component of the output signal of subtractor 4 that correlates with the far-end signal; that is, the echo (far-end audio) is removed from the output signal of subtractor 4.
If the filter coefficients are updated while the output signal of microphone 1 contains near-end audio, the near-end audio, which does not correlate with the far-end signal, disturbs the update, and the resulting change in the filter coefficients may reduce the echo removal capability of adaptive filter 30.
Multiplier 35 is provided to control the filter coefficient update operation performed by adaptive filter 30. Multiplier 35 multiplies the output signal of subtractor 4 by the output signal of speech detector 5 and outputs the computation result to adaptive filter 30. When the output signal of microphone 1 contains near-end audio, the output signal of speech detector 5 is either zero or a very small value as described above, so that the filter coefficient update operation performed by adaptive filter 30 is suppressed and hence the change in the filter coefficient is small. As a result, the echo removal capability is not greatly degraded.
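The gating performed by multiplier 35 can be sketched in Python. The “predetermined correlation operation” is assumed here to be the normalized LMS (NLMS) update of an FIR filter, which is one common choice rather than necessarily the one used in this apparatus; the function name, tap count and step size are hypothetical.

```python
def nlms_echo_canceller(far_end, mic, control, taps=16, mu=0.5, eps=1e-6):
    """Sketch of linear echo canceller 3 and subtractor 4 (NLMS assumed).

    far_end: signal sent to loudspeaker 2 (terminal 31)
    mic:     output signal of microphone 1
    control: per-sample output of speech detector 5 (0.0 = near-end audio)
    Returns the subtractor output and the final filter coefficients.
    """
    w = [0.0] * taps    # coefficients of adaptive filter 30 (FIR)
    buf = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for n in range(len(mic)):
        buf = [far_end[n]] + buf[:-1]
        replica = sum(wi * xi for wi, xi in zip(w, buf))  # echo replica signal
        e = mic[n] - replica                               # subtractor 4
        out.append(e)
        # Multiplier 35: scale the error by the detector output, so the
        # coefficient update stops (or slows) while near-end audio is present.
        gated = e * control[n]
        norm = eps + sum(xi * xi for xi in buf)
        w = [wi + mu * gated * xi / norm for wi, xi in zip(w, buf)]
    return out, w
```

With the detector output held at 1.0, the coefficients converge toward the echo path impulse response and the subtractor output approaches zero; with it held at 0.0, the coefficients never move, which is the behaviour multiplier 35 is meant to enforce during near-end speech.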
Thus the echo suppressing apparatus of the first example of related art uses the adaptive filter to remove the echo of the far-end signal.
Next, an echo suppressing apparatus of a second example of related art will be described.
The echo suppressing apparatus of the second example of related art modifies a pseudo echo (echo replica signal), which is used to suppress an echo, according to the angle of a hinge in a folding-type mobile phone. Such a configuration is described, for example, in Japanese Patent Laid-Open No. 8-9005.
The echo suppressing apparatus of the second example of related art includes a control signal generator that detects the angle of the hinge and outputs a control signal according to the angle, and an echo controller that suppresses an echo based on the control signal.
The echo controller includes: a coefficient selection circuit that holds a plurality of preset echo path tracking coefficients, used to produce a pseudo echo corresponding to the echo path as it varies with the angle of the hinge, and that selects one of these coefficients by using the control signal outputted from the control signal generator as an address signal; an adaptive control circuit that outputs a pseudo echo modification signal for modifying the pseudo echo based on the echo path tracking coefficient selected in the coefficient selection circuit; a pseudo echo generation circuit that generates the pseudo echo based on the pseudo echo modification signal; and a subtraction circuit that subtracts the generated pseudo echo from the output signal of an audio input unit (microphone).
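The address-based coefficient selection can be sketched in Python under several hypothetical assumptions: three preset coefficient sets, a control signal obtained by quantizing the hinge angle in 60-degree steps, and the selected set used directly as the taps of the pseudo echo filter. The adaptive control circuit is omitted because its modification rule is not detailed here; every name and value below is illustrative only.

```python
# Hypothetical preset table, addressed by the control signal.
PRESET_COEFFS = [
    [0.6, 0.2, 0.05],  # address 0: hinge nearly closed
    [0.4, 0.3, 0.10],  # address 1: hinge half open
    [0.2, 0.1, 0.05],  # address 2: hinge fully open
]

def control_signal(angle_deg, step_deg=60.0):
    """Control signal generator: quantize the hinge angle into a table address."""
    address = int(angle_deg // step_deg)
    return max(0, min(address, len(PRESET_COEFFS) - 1))

def suppress_echo(far_end, mic, angle_deg):
    """Select a preset coefficient set and subtract the resulting pseudo echo."""
    coeffs = PRESET_COEFFS[control_signal(angle_deg)]  # coefficient selection circuit
    out = []
    for n in range(len(mic)):
        # Pseudo echo generation: FIR filtering of the far-end signal.
        pseudo = sum(c * far_end[n - k] for k, c in enumerate(coeffs) if n >= k)
        out.append(mic[n] - pseudo)  # subtraction circuit
    return out
```

Because the coefficients are chosen from a table rather than learned, a change of hinge angle takes effect immediately, without waiting for an adaptive filter to reconverge.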
Next, an echo suppressing apparatus of a third example of related art will be described.
The echo suppressing apparatus of the third example of related art is configured, for example, as described in Japanese Patent Laid-Open No. 9-116469.
The echo suppressing apparatus of the third example of related art suppresses the effects of an echo and of surrounding noise that an adaptive filter alone cannot eliminate. It determines a gain coefficient based on estimated values of the power of the far-end signal and the power of the surrounding noise, subtracts an echo replica signal from the output signal of a microphone, and multiplies the signal obtained by the subtraction by the gain coefficient.
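The exact gain rule of the cited publication is not reproduced here. As an assumption, the Python sketch below uses a Wiener-style power-subtraction gain that, besides the far-end and noise power estimates, also uses the power of the subtraction result; the leak factor `leak`, the lower limit `g_min` and the function names are hypothetical.

```python
def frame_power(frame):
    """Mean power of one frame of samples."""
    return sum(v * v for v in frame) / len(frame)

def gain_suppress(far_frame, mic_frame, replica_frame, noise_power,
                  leak=0.01, g_min=0.1, eps=1e-12):
    """Subtract the echo replica, then scale the result by a gain coefficient.

    leak:  assumed ratio of residual-echo power to far-end power (hypothetical).
    g_min: lower limit on the gain (hypothetical).
    """
    residual = [m - r for m, r in zip(mic_frame, replica_frame)]
    p_res = frame_power(residual)
    p_far = frame_power(far_frame)
    # Estimated undesired power in the residual: residual echo plus noise.
    p_undesired = leak * p_far + noise_power
    gain = max(g_min, 1.0 - p_undesired / max(p_res, eps))
    return [gain * v for v in residual], gain
```

When the far end is silent and no noise is estimated, the gain is 1.0 and the residual passes unchanged; a loud far-end frame or a high noise estimate pulls the gain down toward `g_min`.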
Next, an echo suppressing apparatus of a fourth example of related art will be described.
The echo suppressing apparatus of the fourth example of related art is based on the technology described, for example, in Japanese Patent Laid-Open No. 2004-056453. The echo suppressing apparatus of the fourth example of related art uses either the output signal of a microphone (sound pickup device) or the signal obtained by subtracting the output signal of an echo canceller from the output signal of the sound pickup device as a first signal, and uses the output signal of the echo canceller as a second signal. Then, the echo suppressing apparatus estimates the amount of crosstalk of the second signal (far-end signal, echo) that leaks into the first signal (near-end signal), and corrects the first signal based on the estimation result.
The estimated amount of echo crosstalk is the ratio of a quantity based on the amplitude or power of the second signal, during periods in which no near-end audio is detected, to a quantity based on the amplitude or power of the first signal. In the echo suppressing apparatus of the fourth example of related art, the estimated amount of echo crosstalk is calculated from the first and second signals for each frequency component, and the first signal is corrected based on the calculated estimate.
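A per-frequency sketch in Python, under stated assumptions: the magnitude spectra of both signals are already available, the estimate is smoothed recursively with a hypothetical factor `alpha` and frozen whenever near-end audio is detected, and the correction subtracts the estimated echo magnitude from each bin. The sketch expresses the crosstalk as the first-signal magnitude relative to the second, the reciprocal of the ratio described above, so that it can scale the second signal directly; both forms carry the same information. All names are hypothetical.

```python
def update_crosstalk(crosstalk, first_mag, second_mag, near_end_detected,
                     alpha=0.9, eps=1e-12):
    """Update per-bin crosstalk estimates only while no near-end audio is detected.

    Crosstalk here = |first| / |second| per bin (reciprocal form, assumed).
    """
    if near_end_detected:
        return crosstalk  # freeze the estimate during near-end audio
    return [alpha * c + (1.0 - alpha) * (f / max(s, eps))
            for c, f, s in zip(crosstalk, first_mag, second_mag)]

def correct_first(first_mag, second_mag, crosstalk):
    """Subtract the estimated echo magnitude from each bin, clipped at zero."""
    return [max(f - c * s, 0.0)
            for f, s, c in zip(first_mag, second_mag, crosstalk)]
```

In echo-only frames the corrected first signal approaches zero once the estimate has converged; in frames with near-end audio, only the estimated echo share of each bin is removed.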
Although it is not a technology for suppressing an echo caused by acoustic coupling between a sound pickup device and a loudspeaker, a technology for removing noise contained in an input signal is described, for example, in Japanese Patent Laid-Open No. 2004-12884 (hereinafter referred to as fifth example of related art).
In the fifth example of related art, the input audio spectrum is used to estimate a noise spectrum for each predetermined frequency range, and the estimated noise spectrum is subtracted from the input audio spectrum. However, a predetermined flooring coefficient β keeps the amount of subtraction from becoming too large: the subtraction is limited so that its result does not fall below “β× input audio spectrum.”
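Subtraction with flooring can be sketched in Python. For simplicity the sketch works per frequency bin rather than per predetermined frequency range, and it estimates the noise spectrum by averaging frames assumed to contain only noise; that estimation method, the value of `beta` and the function names are assumptions.

```python
def estimate_noise(noise_frames):
    """Average magnitude per frequency bin over frames assumed to contain only noise."""
    n = len(noise_frames)
    return [sum(frame[k] for frame in noise_frames) / n
            for k in range(len(noise_frames[0]))]

def spectral_subtract(input_mag, noise_mag, beta=0.1):
    """Subtract the noise estimate, but never go below beta * input (flooring)."""
    return [max(x - nse, beta * x) for x, nse in zip(input_mag, noise_mag)]
```

For example, with input magnitudes `[1.0, 0.5, 2.0]`, a noise estimate `[0.5, 0.6, 0.1]` and `beta=0.1`, the second bin would become negative after plain subtraction, so flooring clamps it to 0.05; the result is `[0.5, 0.05, 1.9]`.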
The echo suppressing apparatuses of the first and second examples of related art described above can sufficiently suppress an echo when nonlinear components, such as distortion generated in the echo path, are small. In an actual apparatus, however, the loudspeaker, for example, introduces significant nonlinearity. An echo path that contains distortion has a nonlinear transfer function, which linear echo canceller 3 cannot model accurately. In particular, when a small loudspeaker such as those used in mobile phones produces sound at high volume, the large amount of distortion in the sound limits echo suppression to approximately 20 dB. In this case, the echo is transmitted as part of the near-end signal and is audible to the far-end speaker, making conversation difficult.
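To make the limitation of a linear canceller concrete, the following Python sketch runs an NLMS filter (an assumed update rule) on two simulated echo paths: one purely linear, and one in which a hypothetical tanh saturation stands in for an overdriven loudspeaker. The impulse response, drive level and all names are illustrative; the figures it produces depend entirely on this toy model and are not meant to reproduce the approximately 20 dB stated above.

```python
import math
import random

def nlms_residual(far_end, echo, taps=8, mu=0.5, eps=1e-9):
    """Residual echo after an NLMS linear echo canceller (assumed update rule)."""
    w = [0.0] * taps
    buf = [0.0] * taps
    residual = []
    for x, d in zip(far_end, echo):
        buf = [x] + buf[:-1]
        e = d - sum(wi * bi for wi, bi in zip(w, buf))
        residual.append(e)
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return residual

def echo_reduction_db(nonlinear, n=6000, tail=1000, seed=1):
    """Echo power divided by residual power over the last `tail` samples, in dB."""
    rng = random.Random(seed)
    h = [0.6, -0.3, 0.15, 0.05]  # hypothetical echo path impulse response
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Hypothetical loudspeaker model: tanh saturation when driven hard.
    driven = [math.tanh(3.0 * x) if nonlinear else x for x in far]
    echo = [sum(h[k] * driven[i - k] for k in range(len(h)) if i >= k)
            for i in range(n)]
    # The linear canceller only sees the undistorted far-end signal.
    res = nlms_residual(far, echo)
    p_echo = sum(v * v for v in echo[-tail:])
    p_res = sum(v * v for v in res[-tail:]) + 1e-30
    return 10.0 * math.log10(p_echo / p_res)
```

In this toy model, `echo_reduction_db(False)` cancels the linear echo almost to numerical precision, while `echo_reduction_db(True)` leaves a residual that no choice of linear coefficients can remove, because the distortion products are not a linear function of the far-end signal.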
In contrast, the third and fourth examples of related art, particularly the fourth, sufficiently suppress the echo even when the distortion generated in the echo path is large. However, in the echo suppressing apparatus of the fourth example of related art, when the amount of echo crosstalk cannot be estimated accurately because of near-end noise and the like, the first signal corrected on the basis of that estimate is degraded. That is, the echo is not sufficiently suppressed, or a large amount of distortion is generated in the near-end signal (near-end audio + near-end noise). When distortion is generated, the near-end signal sounds as if it were modulated by the far-end signal: it becomes muffled only while the amplitude of the far-end signal is large. For example, when the near-end signal is stationary noise, which sounds like “zhaa”, the noise is distorted and sounds like “zow zow” as if modulated by the far-end signal. When the near-end signal is audio, on the other hand, it likewise becomes muffled only while the amplitude of the far-end signal is large. In the latter case, since the near-end audio itself varies greatly, the sound modulated by the far-end signal (disturbing sound) is masked by the near-end audio and is hence less audible. In the former case, however, the stationary noise is modulated by the far-end signal and turned into a disturbing sound. In particular, when the echo suppressing apparatus of the fourth example of related art is used in an environment in which near-end audio is inputted together with high-level noise, speech detection errors are more likely, so that the amount of echo crosstalk is estimated less accurately, resulting in an even more disturbing sound.
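The modulation effect can be illustrated with a toy per-frame model in Python. It assumes, hypothetically, that magnitudes add (phase is ignored), that the second signal level equals the far-end level, and that near-end noise has caused the crosstalk estimate to be too large; all names and numbers are illustrative.

```python
def corrected_levels(noise_level, far_levels, true_leak, est_leak):
    """Per-frame level of the corrected first signal for stationary near-end noise.

    true_leak: actual echo leakage into the first signal (hypothetical)
    est_leak:  estimated crosstalk used for the correction (hypothetical)
    """
    out = []
    for far in far_levels:
        first = noise_level + true_leak * far  # near-end noise plus echo
        second = far                           # second signal tracks the far end
        out.append(max(first - est_leak * second, 0.0))
    return out
```

With `corrected_levels(0.5, [0.0, 1.0, 0.0, 1.0], true_leak=0.2, est_leak=0.6)`, the noise level alternates between 0.5 and about 0.1 in step with the far-end signal, which corresponds to the “zow zow” effect described above; with `est_leak=0.2` it stays at 0.5 in every frame.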