In speech recognition systems, especially long-distance (far-field) speech recognition systems such as those performing speech recognition on a television, the microphone mounted on the television is closer to the television's own loudspeaker than to the mouth of the person speaking, and the program sound emitted by the loudspeaker is generally louder than the person's voice. The television sound received by the microphone is therefore louder than the speech received from the person. To the microphone, the television sound (an echo signal) seriously interferes with the person's speech and seriously degrades the system's recognition of human speech.
Traditionally, the audio signal output by the television is obtained and used to cancel the television sound received by the microphone, so as to eliminate the television echo. However, because of the uneven frequency response and directivity of the loudspeaker itself, and because of reflection and diffraction by the room and the objects in it, the echo signal picked up by the microphone already differs considerably from the signal that drives the loudspeaker. The difference lies in the degree to which each frequency band is attenuated or reinforced, which appears as a further change in the frequency response. The effect of this kind of echo cancellation is therefore relatively limited.
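The limitation can be illustrated with a minimal numerical sketch. It is not taken from any particular product: the white-noise reference signal, the three-tap impulse response `room_ir`, and the helper `residual_energy_ratio` are all assumptions chosen only to show why subtracting the driving signal directly leaves a large residual once the room has reshaped the echo.

```python
import numpy as np


def residual_energy_ratio(reference, mic):
    """Subtract the driving signal from the microphone signal and return
    the leftover echo energy as a fraction of the microphone echo energy."""
    residual = mic - reference
    return float(np.sum(residual ** 2) / np.sum(mic ** 2))


rng = np.random.default_rng(0)

# Stand-in for the audio signal that drives the television loudspeaker.
reference = rng.standard_normal(16000)

# Hypothetical loudspeaker-plus-room impulse response (an assumption):
# a direct path followed by two reflections.
room_ir = np.zeros(64)
room_ir[0] = 1.0
room_ir[20] = 0.5
room_ir[45] = -0.3

# Idealized case: the microphone hears exactly the driving signal.
ideal_mic = reference.copy()

# More realistic case: the echo is the driving signal filtered by the room.
real_mic = np.convolve(reference, room_ir)[: len(reference)]

ratio_ideal = residual_energy_ratio(reference, ideal_mic)
ratio_real = residual_energy_ratio(reference, real_mic)

print(f"ideal residual ratio: {ratio_ideal:.3f}")
print(f"room-filtered residual ratio: {ratio_real:.3f}")
```

In the idealized case the subtraction removes the echo completely, whereas in the room-filtered case a substantial fraction of the echo energy remains, because the reflections are not present in the driving signal that is being subtracted.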