In many fields of artificial intelligence, data is crucial, and the quality of the data often plays a decisive role. In practice, however, data quality is usually uneven, and the data needs further processing. Generally, data processing removes “noise” from the data and retains the data that is actually required. In the field of voiceprint recognition, a voice sample of a voiceprint of a particular person obtained from the Internet is impure in most cases: in addition to noise such as non-human sounds, it usually also includes speech of another person. How to cleanse such data so as to remove the noise and the voice of the other person, and retain only the voice sample of the voiceprint of the particular person, is a main problem encountered at present.
Currently, a manual marking method is usually used to obtain a voice sample of a voiceprint of a particular person from voice data that includes noise and a voiceprint of another person. In a piece of voice data that includes the voiceprint of the particular person, the voiceprint of the other person, and noise, the specific voice sample belonging to the voiceprint of the particular person is manually recognized, and a voice sample that includes noise and the voiceprint of the other person is manually cut off. Cleansing voice data with such a manual marking method is time-consuming and laborious, and its efficiency is low.
No effective solution to the foregoing problem has been provided so far.