With the development of technology, deep learning techniques have been applied to speech, improving the performance of voice recognition, voiceprint recognition, and similar tasks. Man-machine interaction, as a more natural way of interacting, also faces higher requirements, especially in the wake-up scenario, where the machine is required to “understand” an instruction issued by the user while the machine itself is speaking. However, voice recognition and voiceprint recognition techniques, while achieving significant advances in recognition performance, impose a stringent requirement on the signal-to-noise ratio of the input signal: the sound emitted by the machine itself must be cancelled as fully as possible to improve the signal-to-noise ratio.