1. Field of the Invention
The present invention relates to an echo processor for reducing echoes generated in communication wires or generated by an echogenic environment between a speaker and microphone in a voice telecommunication system, a television conference system, and so on.
2. Description of Background Art
Echo processors (echo cancelers) are widely used for canceling acoustic echoes or electrical echoes generated in television conference systems, handsfree car telephones, or telecommunication lines. A typical echo canceler includes an adaptive filter for canceling echoes and an echo suppressor for suppressing the amplitudes of residual echoes that the adaptive filter cannot cancel out. However, typical echo suppressors suppress acoustic background noise in addition to residual echoes, so that the background noise seems to be interrupted and the communication quality deteriorates. In order to mitigate this sense of discontinuity, an echo canceler generates pseudo background noise components and mixes them with output signals.
An example of such echo processors is disclosed in JP-A-2000-224081 entitled “Echo Canceler Device.”
FIG. 12 is a block diagram illustrating a structure of a conventional echo processor disclosed in JP-A-2000-224081. As shown in FIG. 12, the echo processor includes an adaptive filter 100, a pseudo background noise generator 120, an AFB (Analysis Filter Bank) 131, a first suppressor 132, an adder 133, an SFB (Synthesis Filter Bank) 134, a second suppressor 135, a first level estimator 136, a second level estimator 137, and a detector 138.
Operations of the echo processor will be described next. The adaptive filter 100 partially cancels out an echo in an input signal S[t] and outputs an echo-canceled input signal U[t]. The echo-canceled input signal U[t] still includes residual echoes since the adaptive filter 100 cannot remove all echo components.
The AFB 131 divides the input signal U[t] according to frequency bands, thereby generating frequency-divided input signals U[t,j]. The AFB 131 supplies the input signals U[t,j] to the first suppressor 132 and the pseudo background noise generator 120. The suffix j denotes the number given to each frequency band. The first suppressor 132 gives a loss Loss1 to the residual echo components at each band to attenuate, suppress, or remove the echo. The loss Loss1 is calculated as follows:
First, the first suppressor 132 compares an average power Pow(Rin) of the far-end speech signal Rin with an average power Pow(S[j]) of the input signals U[t,j]. If the former is greater than the latter, the first suppressor 132 subtracts a constant μ from a previous loss component Loss1[j], whereby a new loss component Loss1[j] is obtained in accordance with formula (1).
Loss1[j] = Loss1[j] − μ  (1)
where μ is a constant, i.e., a step value of suppression amount (loss) Loss1.
On the contrary, if the average power Pow(Rin) is equal to or less than the average power Pow(S[j]), the first suppressor 132 adds the constant μ to the previous loss component Loss1[j], whereby a new loss component Loss1[j] is obtained in accordance with formula (2).
Loss1[j] = Loss1[j] + μ  (2)
In either event, the first suppressor 132 adjusts the loss component Loss1[j] to fall into a range represented in formula (3).
Loss(max) ≦ Loss1[j] ≦ 0 (dB)  (3)
where Loss(max) is the maximum loss that the first suppressor 132 can give to residual echo components.
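The update rule of formulas (1) through (3) can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of JP-A-2000-224081: the function name update_loss1 and the default values of mu and loss_max_db are hypothetical, and the loss is expressed in dB as a non-positive number.

```python
def update_loss1(loss1_db, pow_rin, pow_in, mu=0.5, loss_max_db=-30.0):
    """Return the new per-band loss Loss1[j] in dB.

    Hypothetical helper: mu and loss_max_db are illustrative values,
    not values taken from the source.
    """
    if pow_rin > pow_in:
        # Far-end power dominates: deepen the suppression, formula (1).
        loss1_db -= mu
    else:
        # Otherwise relax the suppression, formula (2).
        loss1_db += mu
    # Keep the loss within Loss(max) <= Loss1[j] <= 0 dB, formula (3).
    return max(loss_max_db, min(loss1_db, 0.0))
```

Calling this function once per band and per update step moves the loss by one step μ in either direction while the clamp keeps it inside the permitted range.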
Repeating this comparison and adjustment causes the loss Loss1 to converge to a value that depends upon the level of the residual echo. In applying the loss Loss1 to the residual echo, the first suppressor 132 not only suppresses or removes most of the residual echo components, but also suppresses acoustic background noise components mixed with the echo components, which accentuates a sense of speech interruptions for the far-end talker if no additional processing is applied.
The pseudo background noise generator 120 estimates the levels of the background noises of the frequency-divided input signals U[t,j] and generates pseudo background noises N[t,j] whose levels are the same as those of the background noises. The pseudo background noises N[t,j] are supplied to the adder 133, which adds the pseudo background noises N[t,j] to the input signals U[t,j] in which the echo components have been reduced by the first suppressor 132. The background noise levels after the addition may be adjusted to be equal to the pseudo background noise level.
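The generation of pseudo background noise at a given level for one band can be illustrated by the sketch below. The source does not state how the noise is synthesized, so the use of Gaussian white noise scaled to an estimated RMS level is an assumption, as are the function name pseudo_noise_band and its parameters.

```python
import random


def pseudo_noise_band(level_rms, n_samples, seed=None):
    """Return n_samples of pseudo background noise for one band.

    Assumption: unit-variance Gaussian noise scaled by level_rms,
    the estimated background noise level of that band.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    return [level_rms * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
```

Noise generated this way matches the estimated level of each band but carries a random phase, which is relevant to the drawback discussed at the end of this section.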
The output signals O[t,j] from the adder 133 divided in accordance with frequencies are supplied to the SFB 134 that synthesizes them into an output signal O[t]. The SFB 134 supplies the output signal O[t] to the second suppressor 135.
The second level estimator 137 measures the instantaneous levels of the frequency-divided output signals O[t,j]. The first level estimator 136 measures the instantaneous levels of the pseudo background noises N[t,j] at respective frequency bands. Comparing the measurements by the level estimators 136 and 137 leads to a decision as to whether near-end speech is actually present, as will be described next.
The measurements by both level estimators 136 and 137 are supplied to the detector 138, which detects sounding or silence (decides whether or not near-end speech is actually present) on the basis of the measurements. The detector 138 combines the sounding/silence detection results at respective frequency bands. If it is decided that there is near-end speech at one or more frequency bands, the detector 138 outputs a digital signal “1” that means sounding. If it is decided that there is no near-end speech at any frequency band, the detector 138 outputs a digital signal “0” that means silence.
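The combination of the per-band decisions can be sketched as follows. The source does not state how the two levels are compared within a band, so the criterion used here (the output level exceeding the pseudo background noise level by a ratio margin) is an assumption, as are the function name detect_near_end_speech and the default margin.

```python
def detect_near_end_speech(output_levels, noise_levels, margin=2.0):
    """Return 1 (sounding) or 0 (silence) for one frame.

    output_levels: instantaneous levels of O[t,j], one per band.
    noise_levels:  instantaneous levels of N[t,j], one per band.
    Assumption: a band counts as sounding when its output level
    exceeds margin times its pseudo background noise level.
    """
    for out_level, noise_level in zip(output_levels, noise_levels):
        if out_level > margin * noise_level:
            return 1  # near-end speech at one or more bands
    return 0  # no near-end speech at any band
```

The early return reflects the logical OR over the bands: a single sounding band is enough to output “1”.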
The digital signal output from the detector 138 is supplied to the second suppressor 135, which decides a suppression amount Loss2 on the basis of the output signal of the detector 138 in the manner described next, and gives the loss Loss2 to the signal O[t] for attenuating it.
If the decision by the detector 138 is zero (silence), the second suppressor 135 adds the constant μ′ to a previous loss Loss2, whereby a new loss Loss2 is obtained in accordance with formula (4).
Loss2 = Loss2 + μ′  (4)
where μ′ is a step value of suppression amount (loss) Loss2. μ′ is a positive constant of which the absolute value is sufficiently small, e.g., 0.01 through 0.1 dB.
On the contrary, if the decision by the detector 138 is one (sounding), the second suppressor 135 sets the loss Loss2 at zero in accordance with formula (5).
Loss2 = 0 (dB)  (5)
As will be understood from formula (4), when there is no actual sound, the second suppressor 135 increases the suppression amount Loss2 stepwise, so as to suppress the background noise only. On the contrary, when there is any speech component, the suppression amount Loss2 is set at 0 (dB) instantly in accordance with formula (5), thereby preventing the actual speech component from being suppressed.
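The behavior of formulas (4) and (5) can be sketched as follows. The function name update_loss2 and the default step mu_prime are hypothetical. The source gives no upper bound for Loss2, so none is applied here.

```python
def update_loss2(loss2_db, sounding, mu_prime=0.05):
    """Return the new suppression amount Loss2 in dB.

    sounding: detector output, 1 for sounding and 0 for silence.
    mu_prime: illustrative step within the 0.01 to 0.1 dB range.
    """
    if sounding:
        return 0.0  # formula (5): release the suppression at once
    return loss2_db + mu_prime  # formula (4): increase stepwise
```

Because the step is small, the suppression builds up gradually during silence, whereas a single sounding decision resets it immediately.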
As described above, the conventional echo processor divides the echo-canceled input signal into input signals at respective frequency bands by means of a band division filter, estimates the levels of background noises at respective bands, generates pseudo background noise components having an amplitude spectrum resembling that of the background noise, and mixes the pseudo background noise components with the signal suppressed by an NLP (non-linear process), thereby attempting to mitigate a sense of interruptions of background noise.
In the conventional echo processor, the amplitude spectrum of the pseudo background noise components to be mixed may be similar to that of the background noise within the input signal since the levels of the background noise components within the input signal are estimated at respective bands. However, the phase spectrum of the pseudo background noise components is different from that of the background noise within the input signal. Accordingly, although the pseudo background noise components are included in the final output signal, the final output signal still causes a sense of unnaturalness or strangeness.