When a voice processing device captures or receives a voice signal, interference from various types of noise is unavoidable. In a practical voice communications system, common noise includes stationary noise and directional interfering sound sources. Such noise readily interferes with the target sound signal and severely reduces the acoustic comfort and speech intelligibility of the captured sound. Conventional noise estimation algorithms and conventional single-channel voice quality enhancement algorithms perform poorly at suppressing directional interference noise. Therefore, a system with an interference noise suppression capability needs to be designed for the actual use scenario, so that the target voice is picked up directionally and other noise is suppressed.
Most existing sound source positioning algorithms use a technology such as beamforming or delay-difference-based sound source positioning to determine the direction of a sound source in a sound field, and then use a fixed beam or an adaptive beam to attenuate interfering sound sources outside the beam, thereby implementing directional sound pickup.
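For illustration only, delay-difference-based positioning can be sketched as follows. The sketch assumes two microphones, a far-field source, and a plain time-domain cross-correlation; practical systems often use weighted variants instead. The function names `estimate_delay` and `delay_to_angle`, the microphone spacing, and the sample rate are hypothetical and do not come from any particular existing system.

```python
import math
import random


def estimate_delay(x, y, max_lag):
    """Return the lag in samples that maximizes the cross-correlation of x and y.

    A positive result means that y lags behind x by that many samples.
    """
    n = min(len(x), len(y))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Convert a delay in samples to an arrival angle in degrees.

    Assumes a far-field source, so that delay = mic_distance * sin(angle) / c,
    with the angle measured from the broadside of the microphone pair.
    """
    ratio = (lag / sample_rate) * speed_of_sound / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp to guard against noisy estimates
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Hypothetical test signal: white noise reaching the second microphone
    # three samples after it reaches the first one.
    rng = random.Random(0)
    mic1 = [rng.gauss(0.0, 1.0) for _ in range(256)]
    mic2 = [0.0] * 3 + mic1[:-3]
    lag = estimate_delay(mic1, mic2, max_lag=8)
    print(lag, delay_to_angle(lag, sample_rate=16000, mic_distance=0.1))
```

The sign of the estimated delay indicates on which side of the microphone pair the source lies. When noise is added to both channels, the correlation peak becomes less distinct, so the estimated direction is less reliable at a low signal-to-noise ratio.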
Consider a photographing scenario in which a user shoots with a camera of a terminal. With an existing delay-difference-based sound source positioning technology, when the signal-to-noise ratio is low, direction information of a target sound source (a sound source in the same direction as the photographing direction of the camera) is often mixed with direction information of a noise source (a sound source in the direction opposite to the photographing direction of the camera). As a result, pickup precision for the target sound source is low during video capturing, and a large amount of noise remains in the final captured content.