1. Field of the Invention
The present invention relates to an echo canceller, and more particularly to an echo canceller which may, for example, be incorporated in a personal computer.
2. Description of the Background Art
In recent years, IP (Internet Protocol) phones have become popular, which are based on the VoIP (Voice over Internet Protocol) to switch and convey voice signals on an IP network such as the Internet. For example, a kind of IP phone referred to as a softphone which is installed and operates on a personal computer (PC) has come into widespread use. The softphone uses a sound device that is on board the personal computer and may include an analog-to-digital (A/D) converter or a digital-to-analog (D/A) converter to input or output voice signals.
The processing by a conventional echo canceller using such a sound device will now be described below with reference to FIG. 2. Voice data coming from a distal-end talker, not shown, is sent over the Internet 100 to a network interface 114 provided in a network terminal device 115, such as a personal computer. The voice data is then decoded by a decoder 101 into a voice waveform signal, referred to below as a voice signal. The voice signal is inputted to a voice output driver 102, which then delivers an output signal to a storage buffer 104 provided for buffering, or absorbing the impact of, a voice break on a sound board 103 included in the network terminal device 115.
The break absorbing buffer 104 outputs the voice signals in the order of storage to a D/A converter 105. The D/A converter 105 converts the signals into corresponding analog signals, and then outputs the converted analog signals via a loudspeaker terminal 116 to a loudspeaker 106 and also to an A/D converter 108. The voice radiated from the loudspeaker 106 may be caught by a microphone 107 as depicted with a solid line 120 and thence supplied to the A/D converter 108 in the form of electric signals. The signals supplied to the A/D converter 108 include a signal component directly sent from the D/A converter 105. The signal component is inputted as a signal r(k) via another break absorbing buffer 109 to an adaptive filter 111 in a voice input driver 110.
On the other hand, the voice signal component coming from the loudspeaker 106 and supplied to the A/D converter 108 is inputted via the break absorbing buffer 109 to an adder 112. The adaptive filter 111 has the signal r(k) and a signal f(k) inputted to produce a cancellation signal or pseudo echo signal s(k). For producing the pseudo signal s(k), an adaptive algorithm that minimizes the signal f(k) may be used, such as a well-known NLMS (Normalized Least Mean Squares) algorithm. However, any suitable one of a variety of other adaptive algorithms that will minimize the signal f(k) may also be used.
The adder 112 sets off, or cancels, the signal component coming from the loudspeaker 106, i.e. an echo signal, with the pseudo echo signal s(k) to output a resulting signal f(k) to the adaptive filter 111 and an encoder 113. In order to transmit the voice signal on the Internet 100, the encoder 113 assembles the voice data into packets and outputs the resulting packets via the network interface 114 to the Internet 100.
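By way of illustration only, the cooperation of the adaptive filter 111 and the adder 112 may be sketched as follows with the NLMS algorithm. The tap count, the step size mu, the regularizing constant eps and the synthetic echo path h are assumptions chosen for the example and do not appear in the description above.

```python
import random

def nlms_echo_cancel(r, d, taps=8, mu=0.5, eps=1e-8):
    """Return the residual f(k) for reference r(k) and echo-bearing input d(k).

    taps, mu and eps are illustrative assumptions, not values from the text.
    """
    w = [0.0] * taps          # adaptive filter coefficients
    x = [0.0] * taps          # most recent reference samples, newest first
    f = []
    for rk, dk in zip(r, d):
        x = [rk] + x[:-1]
        s = sum(wi * xi for wi, xi in zip(w, x))    # pseudo echo signal s(k)
        e = dk - s                                  # adder output f(k)
        norm = sum(xi * xi for xi in x) + eps       # reference power (normalization)
        g = mu * e / norm
        w = [wi + g * xi for wi, xi in zip(w, x)]   # NLMS coefficient update
        f.append(e)
    return f

random.seed(0)
r = [random.uniform(-1.0, 1.0) for _ in range(4000)]
h = [0.0, 0.0, 0.6, -0.3, 0.1]                      # hypothetical, time-invariant echo path
d = [sum(h[j] * r[k - j] for j in range(len(h)) if k - j >= 0)
     for k in range(len(r))]

f = nlms_echo_cancel(r, d)
early_power = sum(v * v for v in f[:200]) / 200
late_power = sum(v * v for v in f[-200:]) / 200
print(early_power, late_power)
```

In this sketch the echo path is fixed, so the residual power toward the end of the run is far smaller than at the start. The update relies on the delay between r(k) and the echo remaining constant.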
Meanwhile, if the network terminal device 115 is a processor device, such as a personal computer, that executes a variety of processing other than communication processing, the echo canceller may suffer from undesirable situations.
Specifically, a general-purpose device, such as a personal computer, may execute various application programs other than processing that requires real-time operation, such as communication, and those programs use a variety of computer resources, for example, a CPU (Central Processing Unit) or memories. Therefore, the voice output driver 102 or the voice input driver 110 may sometimes temporarily refrain from processing.
Usually, such a hold-back of the voice processing leads directly to interruptions of the voice signals. To cope with this situation, in the conventional practice, voice data are stored in the break absorbing buffers 104 and 109, and while the voice processing is held back the voice data thus stored in the buffers 104 and 109 are outputted to the D/A converter 105 or the A/D converter 108, thereby preventing the voice breaks from occurring. However, if the data stored in the buffers 104 and 109 become depleted during the interruption, the voice data can no longer be outputted.
Conversely, even when the voice output driver 102 or the voice input driver 110 is in operation, the outputting of the break absorbing buffers 104 and 109 may be held back for some reason. In this case, the buffers 104 and 109 still continue to receive data but can output no data, so that data continue to accumulate until the buffers are saturated. If the buffers 104 and 109 are supplied with further data during the saturation, they cannot store the further data, and the further data are discarded. Such data depletion or saturation in the break absorbing buffers 104 and 109 may lead to voice breaks in the voice signals.
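The depletion and saturation described above may be modeled, purely for illustration, by a bounded first-in first-out buffer. The class name BreakAbsorbingBuffer, its capacity and its counters are hypothetical and are not taken from any actual sound driver.

```python
from collections import deque

class BreakAbsorbingBuffer:
    """Hypothetical bounded FIFO model of the break absorbing buffers 104 and 109."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0      # samples discarded while saturated
        self.underruns = 0    # read attempts while depleted

    def write(self, sample):
        if len(self.queue) >= self.capacity:
            self.dropped += 1         # saturation: the further data are discarded
            return False
        self.queue.append(sample)
        return True

    def read(self):
        if not self.queue:
            self.underruns += 1       # depletion: nothing can be outputted
            return None
        return self.queue.popleft()

buf = BreakAbsorbingBuffer(capacity=4)

# Output side held back: six samples arrive, only four fit.
for k in range(6):
    buf.write(k)

# Input side held back: six reads are attempted, only four samples exist.
out = [buf.read() for _ in range(6)]
print(buf.dropped, buf.underruns, out)   # 2 2 [0, 1, 2, 3, None, None]
```

Either event removes samples from, or inserts a gap into, the signal stream, which is what appears to the listener as a voice break.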
In addition, such breaks in voice signals or fluctuations in storage volume of the break absorbing buffers 104 and 109 may lead to deterioration in performance of the echo reduction carried out with the use of the adaptive filter 111 for the following reasons.
Usually, the adaptive filter 111 for echo reduction, i.e. echo canceller, is installed so as to have its inputs interconnected to receive the outputs from the decoder 101 and the break absorbing buffer 109. Consequently, the signal path beginning with the voice output driver 102, passing through the break absorbing buffer 104, loudspeaker 106 and microphone 107, and extending through the break absorbing buffer 109 to the voice input driver 110 may be subject to fluctuation in storage volume or delay in the two break absorbing buffers 104 and 109, and to voice breaks caused by their depletion or saturation, resulting in temporal fluctuation of the echo path.
As is well known, the adaptive filter 111 performs to its full capability on the presupposition that the echo path is temporally invariant. Therefore, if the echo path fluctuates temporally, the performance of the adaptive filter is significantly deteriorated.
In the conventional echo canceller shown in FIG. 2, a signal outputted from the D/A converter 105 is directly supplied to the A/D converter 108, which in turn outputs a signal via the break absorbing buffer 109 to the adaptive filter 111. That is, the signal subjected to voice breaks, i.e. the buffer depletion or saturation, at the break absorbing buffer 109 is supplied to the adaptive filter as a reference input signal r(k). On the other hand, an echo signal, which has received the consequence of voice breaks, i.e. the buffer depletion or saturation, at the break absorbing buffer 109, is supplied to the adder 112.
In this way, the effect of the break absorbing buffer 109 is taken into the input of the adaptive filter 111 as well, thereby allowing the effects of the break absorbing buffer 109 on the two paths for transmission and reception to be apparently canceled out. Thus, the adaptive filter 111 is not subject to temporal fluctuations on the echo paths, thereby preventing the capability of echo reduction from being deteriorated.
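The apparent cancellation described above may be illustrated with a simplified model in which a saturated buffer discards a run of samples. The 8-sample acoustic delay, the 5-sample loss and the helper names delay_line, drop and best_lag are assumptions made only for this sketch.

```python
import random

def delay_line(x, d):
    """Delay x by d samples, padding the start with silence."""
    return [0.0] * d + x[:len(x) - d]

def drop(x, start, count):
    """Discard `count` samples from index `start`, as a saturated buffer would."""
    return x[:start] + x[start + count:]

def best_lag(ref, echo, k, max_lag=10, window=50):
    """Lag in 0..max_lag that best aligns echo[k:k+window] with ref."""
    def err(lag):
        return sum((echo[k + i] - ref[k + i - lag]) ** 2 for i in range(window))
    return min(range(max_lag + 1), key=err)

random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(400)]
echo = delay_line(far_end, 8)        # hypothetical 8-sample acoustic delay

# Case A: the reference is taken ahead of the buffer; only the echo loses samples.
echo_a = drop(echo, 200, 5)
lag_a_before = best_lag(far_end, echo_a, 100)
lag_a_after = best_lag(far_end, echo_a, 250)

# Case B: reference and echo both pass the same buffer and lose the same samples.
ref_b = drop(far_end, 200, 5)
echo_b = drop(echo, 200, 5)
lag_b_before = best_lag(ref_b, echo_b, 100)
lag_b_after = best_lag(ref_b, echo_b, 250)

print(lag_a_before, lag_a_after)     # 8 3: the delay seen by the filter jumps
print(lag_b_before, lag_b_after)     # 8 8: the delay seen by the filter is unchanged
```

In case B the loss affects both inputs of the adaptive filter identically, so the relative delay is preserved away from the point of loss.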
Incidentally, Japanese patent laid-open publication No. 2004-40589 discloses a hands-free talk device, wherein a reference signal outputted from a voice output driver is connected to an echo canceller via a buffer for received signal and a buffer for transmitting signal, and then a received signal passed through the buffers is acquired as the reference signal.
U.S. patent application publication No. US 2009/0129584 A1 to Aoyagi, et al., discloses an echo canceller, wherein delay time information on the delay characteristics of an echo path is obtained on the basis of a correlation between a smoothed received speech signal and a smoothed sending speech signal, update information indicating the execution or suspension of updating the tap coefficient of an adaptive filter is obtained on the basis of a received speech signal, a sending speech signal and the delay time information, and then, when the update information indicates the execution of the updating, an imitated echo generator performs the update of the tap coefficient and utilizes the delay time information as a reflection of the delay characteristic on the echo path.
Another Japanese patent laid-open publication No. 2000-295461 discloses a manner for calculating the highest power component from an input signal.
A conventional solution on an echo canceller may be referred to on the website, http://www.onosokki.co.jp/HP-WK/c_support/newreport/soundquality/soundquality—2.htm, Ono Sokki Technical Report “On Sound Quality Evaluation, Chapter 5. Concept Forming the Basis for Loudness Calculations”, which the applicant noticed on Jan. 31, 2009.
However, in the conventional solution as shown in FIG. 2, since it is necessary to provide an interconnection from the loudspeaker 106 to the microphone 107 for echo reduction, it is necessary to use the dedicated sound board 103. Accordingly, the device on the whole tends to be costly.
If it is desired to dispense with such a dedicated board, the output from the loudspeaker 106 on one of the channels operated stereophonically needs to be connected by electrical wiring to the microphone terminal on the other channel. This may lead to a deficiency that the inherent stereo sound cannot be used. In addition, a further modification is necessary in which a reference signal for the echo canceller is sampled from a signal on the microphone side. It is however extremely difficult for the user to make such a modification from the perspective of both the hardware and the software.
These inconveniences are attributable to an abrupt change on the echo path. In this consideration, the inventor of the present application, together with a co-inventor, has proposed in Aoyagi, et al., another solution to cope with these inconveniences. With the proposed solution, the gross characteristics of a reference input signal and an echo signal, that is, the envelopes of the power of the reference input signal and the echo signal, are used to drive an adaptive filter, and an initial delay is adaptively estimated and the taps of the echo canceller are decided in order to track temporal changes from the initial delay.
However, in using the envelope characteristics, there is a possibility that the correlation cannot accurately be calculated until a section of data has been secured that is long enough for the envelope of a voice signal to represent the characteristics of the voice waveform. This is because a time delay is unavoidably caused in calculating a voice envelope, which inherently fluctuates only moderately. Thus, if an abrupt change in delay often occurs on the echo path without any advance sign, as in the case of a buffer in a personal computer described above, the envelope correlation cannot track such a change, with the result that the echo cannot be removed.
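The envelope-based delay estimation discussed above may be sketched as follows. This is a simplified illustration only and is not the procedure of Aoyagi, et al.; the one-pole smoothing constant alpha, the search range max_lag, the use of a correlation coefficient and the synthetic burst signal are all assumptions made for the example.

```python
import math
import random

def power_envelope(x, alpha=0.05):
    """Smoothed power of x using a one-pole low-pass filter (alpha is assumed)."""
    env, acc = [], 0.0
    for v in x:
        acc += alpha * (v * v - acc)
        env.append(acc)
    return env

def pearson(a, b):
    """Correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def estimate_delay(ref, echo, max_lag=100):
    """Lag in 0..max_lag at which the two power envelopes correlate best."""
    er, ee = power_envelope(ref), power_envelope(echo)
    n = len(er)
    scores = [pearson(er[max_lag - lag:n - lag], ee[max_lag:n])
              for lag in range(max_lag + 1)]
    return max(range(max_lag + 1), key=scores.__getitem__)

random.seed(1)
# Synthetic reference: alternating loud and quiet noise bursts, 250 samples each.
ref = [random.uniform(-1.0, 1.0) * (1.0 if (k // 250) % 2 == 0 else 0.1)
       for k in range(3000)]
true_delay = 40                                       # hypothetical echo-path delay
echo = [0.0] * true_delay + [0.5 * v for v in ref[:-true_delay]]

estimated = estimate_delay(ref, echo)
print(estimated)
```

The estimate here draws on 3000 samples over which the delay is held constant. Because the envelope is a smoothed quantity, a shorter section or a delay that changes partway through gives the correlation less to work with, which is the limitation noted above.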