In the field of voice communications, a voice communication product (for example, a handset) receives a far-end signal from a network. When the signal is played through a speaker, an echo signal is generated in the acoustic path. The microphone collects the echo signal together with the near-end voice signal, and both are transmitted to the other end. To cancel the echo signal, the prior art provides an acoustic echo cancellation technology in which an adaptive filter simulates the echo path to obtain an estimated echo signal, and the estimated echo signal is subtracted from the near-end signal collected by the microphone so that the echo is cancelled.
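The adaptive-filter arrangement described above can be illustrated with a minimal sketch. It assumes a normalized least-mean-squares (NLMS) update, which is only one common choice and is not named in this description; the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
def nlms_echo_cancel(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller (an assumed algorithm, not the source's).

    far: far-end samples played through the speaker
    mic: samples collected by the microphone (echo plus any near-end voice)
    Returns (residual, weights): the echo-cancelled signal and the final
    filter coefficients, which approximate the echo path.
    """
    w = [0.0] * taps      # adaptive filter coefficients (echo path estimate)
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        # Estimated echo: far-end history filtered by the current coefficients.
        y = sum(wi * bi for wi, bi in zip(w, buf))
        # Remove the estimated echo from the microphone signal.
        e = d - y
        # Normalized coefficient update, driven by the residual.
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

Note that this sketch updates the coefficients on every sample. If near-end voice is present in `mic`, the residual `e` contains that voice and the update is driven away from the true echo path.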
In the acoustic echo cancellation technology, it is necessary to detect whether the near-end signal collected by the microphone includes a near-end voice signal. Such detection is called double-talk detection. In particular, it is necessary to detect whether the current voice call is in a double-talk state, where the near end and the far end are both talking, or in a single-talk state, where the near-end signal includes only the echo signal, so as to decide whether to update the adaptive filter coefficients.
With respect to double-talk detection, the prior art provides an energy based detection method, a signal correlation based detection method and a double filter based detection method.
The energy based detection method compares the transient power of the near-end signal with the transient power of the far-end signal so as to detect the current talk state. This method requires that the energy of the echo signal be lower than the energy of both the near-end signal and the far-end signal, and is therefore applicable only to scenarios where the energy of the echo signal is low. Because this method relies on the energy levels of the far-end signal and the echo signal, its error rate is high.
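As a rough illustration of an energy comparison of this kind, the following sketch declares double-talk when the short-term power of the microphone frame exceeds a fixed fraction of the far-end frame power. The function names and the `ratio_threshold` value of 0.25 (which assumes the echo path attenuates power by at least about 6 dB) are hypothetical and are not taken from any particular prior-art detector.

```python
def short_term_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def energy_double_talk(far_frame, mic_frame, ratio_threshold=0.25):
    """Return True if the frame is classified as double-talk.

    Assumption: pure echo never carries more than `ratio_threshold` times
    the far-end power, so anything louder is attributed to near-end voice.
    """
    p_far = short_term_power(far_frame)
    p_mic = short_term_power(mic_frame)
    return p_mic > ratio_threshold * p_far
```

With a weak echo (for example, an amplitude gain of 0.2) the rule separates single-talk from double-talk. With a strong echo (for example, a gain of 0.9, as may occur in hands-free use) the echo alone exceeds the threshold and the frame is wrongly classified as double-talk, which is the limitation noted above.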
The signal correlation based detection method calculates the correlation between the far-end signal and the near-end signal so as to detect the current talk state. The calculation is complex and the precision depends on the extent of signal distortion. When the echo signal is distorted, the precision of detection is lower.
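A correlation test of this kind might be sketched as follows. The sketch assumes a normalized cross-correlation searched over a small range of echo delays and a fixed decision threshold; the function names, `max_lag` and `threshold` are hypothetical choices made for illustration only.

```python
import math


def normalized_xcorr(far, mic, lag):
    """Normalized correlation between mic[n] and far[n - lag], in [-1, 1]."""
    pairs = [(far[n - lag], mic[n]) for n in range(lag, len(mic))]
    num = sum(x * d for x, d in pairs)
    e_far = sum(x * x for x, _ in pairs)
    e_mic = sum(d * d for _, d in pairs)
    if e_far == 0.0 or e_mic == 0.0:
        return 0.0
    return num / math.sqrt(e_far * e_mic)


def correlation_double_talk(far, mic, max_lag=16, threshold=0.8):
    """Return True if the frame is classified as double-talk.

    Assumption: when the microphone signal is pure echo, it is strongly
    correlated with the far-end signal at some delay up to `max_lag`;
    near-end voice is uncorrelated with the far end and lowers the peak.
    """
    peak = max(abs(normalized_xcorr(far, mic, k)) for k in range(max_lag + 1))
    return peak < threshold
```

The search over lags is what makes the calculation comparatively expensive: each frame costs on the order of `max_lag` times the frame length in multiplications. Because the rule relies on the echo resembling a scaled, delayed copy of the far-end signal, distortion of the echo lowers the correlation peak even in single-talk.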
The double filter based detection method calculates and compares two filtering results so as to detect the current talk state. Its precision also depends on the extent of signal distortion. When the echo signal is distorted, the adaptive filter is apt to diverge so that convergence is hard to achieve, and therefore the precision of detection is lower.
During the implementation of the present invention, the inventors found that the double-talk detection methods in the prior art are applicable only to scenarios where the extent of nonlinear distortion is small and the energy of the echo signal is low. In a practical environment, taking a handset as an example, because a handset speaker has a bandpass characteristic, the speaker unavoidably introduces nonlinear distortion into the echo signal. Furthermore, in hands-free mode, the energy of the echo signal is high. As a result, in practical environments, the double-talk detection methods in the prior art offer low detection precision and poor detection performance. Once a near-end voice signal is mistaken for an echo signal, adaptive filtering is activated and cancels the near-end voice signal as if it were an echo signal, so that the quality of voice communications is severely impacted.