With the continuous development of Internet and telecommunication technologies, telecommunication applications over the Internet have become more numerous. Recently, Voice over IP (VOIP) technology has developed rapidly. However, compared with conventional telephone technologies, VOIP technology is at a disadvantage in voice quality, mainly because of echoes.
According to how they are generated, echoes are classified into acoustic echo and electricity echo. As shown in FIG. 1, Sin represents a near-end input signal, Sout represents a near-end output signal, Rin represents a far-end input signal, and Rout represents a far-end output signal. Taking the near-end as an example, electricity echoes are generated as follows. When the near-end input signal Sin is transmitted in the Public Switched Telephone Network (PSTN), a mixing converter (hybrid) is needed to convert the two-wire line at the user end to the four-wire line in an exchange. During the conversion, part of the signal leaks from the near-end transmitting path to the near-end receiving path. This “leaked” part is transmitted back to the near-end, so the near-end user hears his or her own voice. This is electricity echo. Also taking the near-end as an example, acoustic echoes are generated as follows. The acoustic echo is caused by voice coupling between a voice playback device and a voice collection device. The far-end input voice signal Rin, after being transmitted to the near-end, becomes the far-end output voice signal Rout. The signal Rout is played back by the near-end voice playback device, such as a speaker, and is picked up by the near-end voice collection device, such as a microphone, either directly or via various reflecting paths. The picked-up signal is then transmitted back to the far-end, so the far-end user hears his or her own voice. This is acoustic echo.
Generally, echoes with a delay of typically 16 to 20 ms are called sidetone and are even desired, because the user feels comfortable hearing them during a call. However, echoes with a delay of more than 32 ms seriously degrade call quality. With the development of communication technologies, the call distance supported by VOIP technology is becoming longer and voice delay increases greatly, so the echo phenomenon is becoming much more serious. Therefore, echo cancellation is a problem that VOIP technology must overcome.
Presently, electricity echo cancellation is realized by an electricity echo canceller deployed on the network. Taking the near-end in FIG. 1 as an example, the operation principle of an electricity echo canceller is as follows. An electricity echo signal r in the near-end is generated from the near-end input signal Sin with a certain delay, and is returned to the near-end together with the far-end input signal Rin through the far-end output signal Rout. Therefore, when no voice signal is input at the far-end, that is, when Rin is not a voice signal, an electricity echo delay M can be estimated from the correlation between the far-end output signal Rout and the near-end input signal Sin. The near-end input signal Sin(n−M), that is, the signal at the time (n−M), which is earlier than the current time n by M, is then selected as the input signal of an adaptive filter, and an estimated electricity echo signal r′ is derived through filtering computation. The estimated electricity echo signal r′ is then subtracted from the far-end input signal Rin, which eliminates the electricity echo in the far-end output signal. During this process, the far-end output signal Rout is used as a correction signal to continuously update the coefficients of the adaptive filter, so that the estimated electricity echo signal approaches the actual electricity echo signal more closely.
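The delay estimation and adaptive filtering described above can be illustrated with a minimal sketch. It is not the canceller of any particular product. The normalized LMS (NLMS) update is an assumed choice, since the text only says "adaptive filter". The echo path, tap count, step size, and the names `s_in`, `returned`, `estimate_delay`, and `nlms_cancel` are hypothetical. Signal naming is simplified: `s_in` stands for Sin, and `returned` stands for the signal on the return path, which contains only the echo r while the far-end is silent.

```python
import random


def estimate_delay(reference, returned, max_delay):
    """Return the lag M in [0, max_delay] at which returned[n] correlates
    most strongly (in magnitude) with reference[n - M]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = abs(sum(returned[n] * reference[n - lag]
                        for n in range(lag, len(returned))))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def nlms_cancel(reference, returned, delay, taps=8, mu=0.5, eps=1e-8):
    """Estimate the echo from the delayed reference with an NLMS filter
    (assumed algorithm) and subtract it; return the residual signal."""
    weights = [0.0] * taps
    residual = []
    for n in range(len(returned)):
        # Filter input: reference samples starting at time n - delay.
        x = [reference[n - delay - k] if n - delay - k >= 0 else 0.0
             for k in range(taps)]
        echo_estimate = sum(w * xi for w, xi in zip(weights, x))
        error = returned[n] - echo_estimate  # residual after cancellation
        norm = eps + sum(xi * xi for xi in x)
        # The residual acts as the correction signal for the coefficients.
        weights = [w + mu * error * xi / norm for w, xi in zip(weights, x)]
        residual.append(error)
    return residual


# Synthetic test signal: white noise standing in for near-end speech.
random.seed(0)
s_in = [random.uniform(-1.0, 1.0) for _ in range(2000)]

# Hypothetical echo path: a 40-sample delay followed by a short leakage response.
true_delay = 40
echo_path = [0.5, 0.2, -0.1]
returned = [
    sum(h * s_in[n - true_delay - k]
        for k, h in enumerate(echo_path) if n - true_delay - k >= 0)
    for n in range(len(s_in))
]

estimated_delay = estimate_delay(s_in, returned, max_delay=100)
residual = nlms_cancel(s_in, returned, estimated_delay)

echo_energy = sum(v * v for v in returned[-500:])
residual_energy = sum(v * v for v in residual[-500:])
```

In this noise-free setting, comparing `residual_energy` with `echo_energy` over the last 500 samples shows how much of the echo remains after the filter has adapted. The correlation search costs on the order of `max_delay` times the signal length in multiplications, which gives a sense of why the delay estimate is computationally expensive.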
The principle of acoustic echo cancellation is similar to that of electricity echo cancellation except that an acoustic echo canceller (AEC) is generally deployed in a terminal.
The related art might bring about the following problems.
1. It is difficult for a conventional electricity echo canceller to ensure the overall electricity echo cancellation effect at the final user end. An electricity echo canceller can only eliminate electricity echo signals on the network where it is deployed. However, because an actual network is constructed by interconnecting sub-networks based on various network technologies, an electricity echo canceller deployed in one sub-network can only eliminate electricity echo signals on that sub-network and cannot ensure the electricity echo cancellation effect across the whole network.
2. The influence of network transmission performance on the electricity echo cancellation effect is not considered. The transmission performance of a voice transmission network may vary over time for various reasons, which affects echo signals, for example by distorting them. The conventional electricity echo cancellation method estimates the echo delay only from the correlation of media signals, that is, only in terms of media transmission. It does not address the inaccurate echo delay estimates that result when network transmission performance affects the echo signals, so the echo cancellation effect cannot be ensured. Further, because estimating an electricity echo delay from the correlation of media signals requires a large amount of computation, an electricity echo canceller usually has to be implemented with dedicated chips, and electricity echo cancellers have to be deployed at multiple points on the network, so the cost is high.
3. The electricity echo cancellation effect is restricted by hardware memory. To eliminate electricity echoes, terminal input signals from a previous time period need to be saved to serve as reference signals for estimating electricity echo signals. Because hardware memory is limited, when the transmission delay is long, the terminal input signal corresponding to the current electricity echo signal may already have been discarded from memory, so the electricity echo cancellation effect cannot be ensured.
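The memory restriction in the third problem can be made concrete with a small sketch of a fixed-size reference buffer. The class `ReferenceBuffer`, the function `max_covered_delay_ms`, the 512-sample capacity, and the 8000 Hz sampling rate are all illustrative assumptions and are not taken from the related art.

```python
from collections import deque


class ReferenceBuffer:
    """Fixed-size store of past input samples (hypothetical helper)."""

    def __init__(self, capacity_samples):
        # A deque with maxlen silently discards the oldest sample when full.
        self._samples = deque(maxlen=capacity_samples)

    def push(self, sample):
        self._samples.append(sample)

    def get_delayed(self, delay_samples):
        """Return the sample written delay_samples before the newest one,
        or None if it was never stored or has already been discarded."""
        if delay_samples < 0 or delay_samples >= len(self._samples):
            return None
        return self._samples[-1 - delay_samples]


def max_covered_delay_ms(capacity_samples, sample_rate_hz):
    """Longest echo delay, in milliseconds, whose reference sample is still held."""
    return 1000.0 * (capacity_samples - 1) / sample_rate_hz


buffer = ReferenceBuffer(capacity_samples=512)
for n in range(1000):
    buffer.push(n)  # the sample value equals its time index, for readability

short_delay_ref = buffer.get_delayed(100)  # still in memory
long_delay_ref = buffer.get_delayed(800)   # 100 ms at 8000 Hz: already discarded
covered_ms = max_covered_delay_ms(512, 8000)
```

Under these assumed numbers the buffer covers a little under 64 ms of delay, so the reference sample for an echo delayed by 100 ms is no longer available and the canceller has nothing to filter.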