The present disclosure relates to an audio processor. More specifically, the disclosure relates to an audio processor of a loud speech communication system.
Known communication systems include hands-free telephone systems and video conference systems, each including a speaker and a microphone. In a system of that type, while communication is carried out between a far endpoint device and a near endpoint device that are connected across a network, audio received by a microphone of the far endpoint device is transmitted to the near endpoint device and is output from a speaker of the near endpoint device (unless otherwise specifically stated, the term “audio” is used hereinbelow to refer to either “voice” or “speech”). Likewise, audio received by a microphone of the near endpoint device is transmitted to the far endpoint device and is output from a speaker of the far endpoint device. Consequently, at each of the far endpoint and near endpoint devices, the audio output from the speaker of the device is input to the microphone of the same device. If no processing is carried out on that audio, it is transmitted again to the other device, so that a phenomenon called “echo” occurs, in which one's own utterance is heard from the speaker after some time delay. When the echo grows, it is again fed back to the microphone, thereby causing howl.
As a previously developed technique for addressing the above, there is known an echo canceller that serves as an audio processor to prevent such echo and howl. Generally, the echo canceller cancels echo in the following manner. An adaptive filter is used to learn an impulse response between a speaker and a microphone, and the learned impulse response is convolved with a reference signal output from the speaker to thereby generate a pseudo or artificial echo (“pseudo echo,” hereinbelow). The pseudo echo is regarded as the echo component of the audio input from the microphone and is subtracted from that audio, thereby implementing echo cancellation. Generally, the adaptive filter coefficients are judged to be better as the filter error (the difference between the microphone input signal and the pseudo echo) becomes smaller.
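By way of illustration only, the general echo cancellation procedure described above can be sketched as follows. The sketch assumes a normalized least-mean-square (NLMS) update, which is one common adaptive filter algorithm and is not stated in the present disclosure to be the algorithm of any particular echo canceller. The function names `simulate_echo` and `nlms_echo_cancel`, the tap count, the step size, and the impulse response values are all hypothetical.

```python
import random


def simulate_echo(reference, room_response):
    """Convolve the reference signal with a hypothetical speaker-to-microphone
    impulse response to obtain the echo picked up by the microphone."""
    history = [0.0] * len(room_response)  # history[0] is the newest sample
    echo = []
    for x in reference:
        history = [x] + history[:-1]
        echo.append(sum(h * s for h, s in zip(room_response, history)))
    return echo


def nlms_echo_cancel(reference, mic, taps=4, step_size=0.5, eps=1e-8):
    """Toy NLMS echo canceller (an assumed algorithm, for illustration).

    Returns the filter error signal and the learned coefficients.
    """
    coeffs = [0.0] * taps
    history = [0.0] * taps
    errors = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        # Pseudo echo: learned impulse response convolved with the reference.
        pseudo_echo = sum(c * h for c, h in zip(coeffs, history))
        # Filter error: microphone input minus pseudo echo.
        err = d - pseudo_echo
        # Normalize the update by the power of the recent reference samples.
        power = sum(h * h for h in history) + eps
        gain = step_size * err / power
        coeffs = [c + gain * h for c, h in zip(coeffs, history)]
        errors.append(err)
    return errors, coeffs


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    room = [0.6, -0.3, 0.1, 0.0]  # hypothetical impulse response
    mic = simulate_echo(reference, room)
    errors, coeffs = nlms_echo_cancel(reference, mic, taps=len(room))
    print("learned coefficients:", coeffs)
    print("final filter error:", errors[-1])
```

In this sketch the microphone signal contains only echo, so a shrinking filter error indicates that the learned coefficients are approaching the simulated impulse response.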
In impulse response learning such as described above, the adaptive filter coefficients fluctuate depending on, for example, positional and environmental variations of a near endpoint talker, so that the coefficients have to be learned continuously to follow the variations. Under these circumstances, echo cancellers of the type that controls the step size, for example, have been proposed. In this case, the step size, which is used in determining the adaptive filter coefficients, is controlled in accordance with the output signals and power of filters that pass only specific band components. For an example of such echo cancellers, refer to Japanese Unexamined Patent Application Publication No. 2004-357053 (par. Nos. 0032 to 0041, and FIG. 1).
According to such a previously proposed echo canceller, in a region where the audio signal-to-disturbance ratio is small, the adaptive filter coefficients are stabilized by reducing the step size. Conversely, in a state where the audio signal-to-disturbance ratio is large and the influence of disturbance is small, step size control is carried out such that the step size is increased to obtain a high convergence speed.
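The step size control described above can be illustrated with the following minimal sketch. The mapping from the signal-to-disturbance ratio to the step size used here (a smooth interpolation between a lower bound `mu_min` and an upper bound `mu_max`) is an assumption chosen for simplicity and is not the rule of the cited publication; the helper names `block_power` and `choose_step_size` and all numeric values are hypothetical.

```python
def block_power(samples):
    """Mean power of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def choose_step_size(signal_power, disturbance_power,
                     mu_min=0.01, mu_max=1.0, eps=1e-12):
    """Return a step size that grows with the signal-to-disturbance ratio.

    Assumed mapping: weight = ratio / (1 + ratio), which is 0 when the
    ratio is 0 and approaches 1 as the ratio becomes large.
    """
    ratio = signal_power / (disturbance_power + eps)
    weight = ratio / (1.0 + ratio)
    return mu_min + (mu_max - mu_min) * weight


if __name__ == "__main__":
    # Small ratio: small step size, favoring stable coefficients.
    print(choose_step_size(signal_power=1.0, disturbance_power=100.0))
    # Large ratio: large step size, favoring fast convergence.
    print(choose_step_size(signal_power=100.0, disturbance_power=1.0))
```

A step size obtained in this way would be supplied to the coefficient update of the adaptive filter for each block or sample.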
However, the echo cancellation process described above has a problem. The process is effective in a single talk state or mode, in which a far endpoint talker is absent in communication and only the reference signal is being output to a speaker. In a double talk mode, in which both near endpoint and far endpoint talkers are in communication, however, the cancellation process can run into the problem of degrading the adaptive filter coefficients.
More specifically, in the double talk mode, together with the speech of the near endpoint talker, the speaker output audio to which the speech of the far endpoint talker is added is input into the microphone. In other words, in the adaptive filter processing, not only echo components but also the speech of the far endpoint talker is input into the microphone as disturbance. As such, when the adaptive filter coefficients are set in accordance with the amount of error, there occurs the problem that incorrect filter coefficients, which cancel the speech of the far endpoint talker, are set. Consequently, the adaptive filter coefficients are degraded, thereby causing echo.
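The degradation described above can be reproduced in a small simulation, sketched below under stated assumptions. An NLMS update is assumed as the adaptive algorithm, and the talker speech that reaches the microphone as disturbance is modeled as uniform random noise; neither choice is taken from the present disclosure. The names `run_nlms`, `misalignment`, and `compare_single_and_double_talk` and all numeric values are hypothetical.

```python
import random


def run_nlms(reference, mic, taps, step_size=0.5, eps=1e-8):
    """Adapt filter coefficients with an assumed NLMS update; return them."""
    coeffs = [0.0] * taps
    history = [0.0] * taps  # history[0] is the newest reference sample
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        err = d - sum(c * h for c, h in zip(coeffs, history))
        power = sum(h * h for h in history) + eps
        coeffs = [c + step_size * err * h / power
                  for c, h in zip(coeffs, history)]
    return coeffs


def misalignment(coeffs, true_response):
    """Squared distance between learned and true impulse responses."""
    return sum((c - t) ** 2 for c, t in zip(coeffs, true_response))


def compare_single_and_double_talk(seed=0, length=2000):
    """Return (single_talk_misalignment, double_talk_misalignment)."""
    rng = random.Random(seed)
    true_response = [0.6, -0.3, 0.1, 0.0]  # hypothetical impulse response
    taps = len(true_response)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    # Echo: reference convolved with the true impulse response.
    history = [0.0] * taps
    echo = []
    for x in reference:
        history = [x] + history[:-1]
        echo.append(sum(t * h for t, h in zip(true_response, history)))
    # Disturbance: talker speech picked up directly by the microphone.
    talker = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    double_mic = [e + t for e, t in zip(echo, talker)]
    single = misalignment(run_nlms(reference, echo, taps), true_response)
    double = misalignment(run_nlms(reference, double_mic, taps), true_response)
    return single, double


if __name__ == "__main__":
    single, double = compare_single_and_double_talk()
    print("single talk misalignment:", single)
    print("double talk misalignment:", double)
```

Because the update is driven by the filter error, the disturbance contained in that error perturbs the coefficients at every sample, so the coefficients in the double talk case do not settle on the simulated impulse response.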
A similar problem occurs when the step size control is performed. Although the step size is minimized because the disturbance ratio is large in a double talk event, incorrect filter coefficients are nevertheless set as a result. Once inappropriate adaptive filter coefficients are set, recovery takes time, and the output audio is not stabilized until the recovery is complete.
Accordingly, it would be desirable to provide an audio processor capable of restraining the influence of disturbance due to fluctuations in double talk or the like, keeping the filter coefficients steady and less susceptible to corruption, and exhibiting high-speed convergence (high followability to system variations).