Audio quality has always been a major problem in teleconferencing. In a two-party conversation, there are generally two channels between the conference terminals: a downlink reception channel, which sends the speech signals transmitted from the opposite terminal to a loudspeaker for playback after speech decoding, reconstruction and amplification; and an uplink sending channel, which collects the speech signals of a near-end user through a microphone and then sends them to the opposite end of the conference after speech processing and coding. In practice, since the two parties of the conference may need to speak at the same time, the loudspeaker and the microphone may work at the same time, as shown in FIG. 1. In a scenario where the two parties of the conference are speaking at the same time, owing to the limited size of the conference terminal, a far-end speech signal played by the loudspeaker may be collected by the near-end microphone and transmitted to the far-end caller through the uplink sending channel. Since transmitting the far-end speech signal back to the far-end through the uplink sending channel involves a certain time delay, the far-end caller may then hear his own speech, that is, an echo; the longer the time delay, the more obvious the echo. Therefore, the echo has the greatest impact on speech calls of the VoIP type.
In order to overcome the impact of the echo on speech calls, many kinds of echo elimination technologies exist at present; for example, the patent application files with the Patent Application Nos. CN200610114419.5, CN200710100270.X, CN200820213203.9, CN200880104273.3, CN201010225201.3, CN201010235614.X, CN201010240571.4, CN201110048861.3, etc. have disclosed echo elimination technologies. However, a common point of the echo elimination technologies disclosed in the above-mentioned patent application files is that the near-end eliminates the echo for the far-end user, which helps to prevent the voice of the far-end participant from being returned to the far-end as an echo. Although near-end echo elimination may be used at a near-end unit so that the far-end participant cannot hear his own voice, in some cases a far-end unit may have no available acoustic echo canceller to eliminate the far-end echo; in this case, owing to the acoustic coupling between the loudspeaker and the microphone at the far-end, the near-end participant will hear his own voice returned to the near-end. Therefore, near-end echo elimination may benefit the far-end participant, but has no effect on preventing the near-end participant from hearing near-end audio which is returned from the far-end as an echo.
The patent application file with the Patent Application No. CN201010287069.9 provides a method for suppressing, at the near-end, near-end interference returned by the far-end. The patent application file discloses detecting and suppressing the returned audio at the near-end, that is, detecting and suppressing, at the near-end of the conference, the audio which originates from the near-end, is acoustically coupled at the far-end and is returned to the near-end unit. The near-end unit determines a first energy output and a second energy output of each separated frequency band from, respectively, the near-end audio sent by the near-end unit and the far-end audio received by the near-end unit, compares the first energy output and the second energy output of each frequency band within a certain time delay range, and detects, on the basis of the comparison result, whether the sent near-end audio is returned in the received far-end audio. The comparison can use cross-correlation to obtain an estimated time delay for a further analysis of the near-end energy and the far-end energy. The near-end unit suppresses any detected returned near-end audio by eliminating or weakening the far-end audio output at its loudspeaker. The core idea thereof is: if a double-end talk detector unit does not detect the speech of the far-end participant at the near-end, the volume of the near-end loudspeaker is eliminated or weakened so as to suppress the returned near-end audio. Although this method can effectively suppress the returned near-end audio, it cannot do so adaptively and cannot control the volume of the near-end loudspeaker in a timely manner; in addition, since the method suppresses the returned near-end audio by eliminating or weakening the volume of the near-end loudspeaker, it may result in discontinuity of the far-end audio, thereby reducing the speech quality of the conference and degrading the user experience.
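By way of illustration only, the energy comparison within a time delay range described for the above-mentioned relevant art can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the cited application: the function names, the frame length, the lag range and the threshold are all hypothetical, and a single full-band frame energy is used in place of the per-frequency-band energies so as to keep the example short.

```python
import math


def frame_energies(samples, frame_len):
    """Energy (sum of squares) of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def normalized_xcorr(a, b, lag):
    """Pearson correlation between a[t] and b[t + lag] over their overlap."""
    n = min(len(a), len(b) - lag)
    if n < 2:
        return 0.0
    x = a[:n]
    y = b[lag:lag + n]
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((p - mx) * (q - my) for p, q in zip(x, y))
    den = math.sqrt(
        sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y)
    )
    return num / den if den > 0 else 0.0


def detect_returned_audio(near_sent, far_received,
                          frame_len=4, max_lag=10, threshold=0.8):
    """Search lags 0..max_lag (in frames) for the best energy correlation.

    Returns (detected, best_lag, best_corr). All default values are
    illustrative assumptions, not values taken from the cited application.
    """
    sent_e = frame_energies(near_sent, frame_len)
    recv_e = frame_energies(far_received, frame_len)
    best_lag, best_corr = 0, -1.0
    for lag in range(max_lag + 1):
        corr = normalized_xcorr(sent_e, recv_e, lag)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_corr >= threshold, best_lag, best_corr
```

In such a sketch, the lag giving the highest correlation serves as the estimated time delay, and a correlation above the threshold is treated as returned near-end audio; a per-band variant would repeat the same comparison on the energy sequence of each frequency band.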
Therefore, in the above-mentioned relevant art, when the returned near-end audio is suppressed, the far-end audio becomes discontinuous because the volume of the near-end loudspeaker cannot be adaptively eliminated or weakened, thus reducing the speech quality of the conference and degrading the user experience.