The background description provided herein is for generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art or suggestions of the prior art, by inclusion in this section.
Speaker recognition is the process of identifying or verifying a person based, for example, on the person's voice biometrics; the process may also be called voice recognition or speaker verification. Speaker recognition typically has two phases: an enrollment phase and a verification phase. During the enrollment phase, the speaker's voice is recorded and analyzed. Subsequently, a speaker model (also called a voiceprint or template) may be built to characterize the voice biometrics of the person. During verification, a speech sample (or utterance) may be compared against one or more previously created speaker models. As an example, the speech sample may be compared against multiple speaker models for identification purposes. As another example, the speech sample may be compared against a single speaker model for verification purposes, where that speaker model corresponds to a presumptive identification of the speaker.
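The following minimal Python sketch illustrates the two phases, and the difference between identification (one sample compared against many speaker models) and verification (one sample compared against the model of a presumed speaker). It assumes that each utterance has already been converted into a fixed-length feature vector (an "embedding") by some front end not described here. Averaging embeddings into a voiceprint, scoring with cosine similarity, and the 0.7 decision threshold are illustrative assumptions only, not the method of this disclosure. The function names `enroll`, `verify`, and `identify` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def enroll(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Enrollment: build a speaker model (voiceprint) from enrollment utterances.

    Assumption: the voiceprint is simply the element-wise mean of the
    per-utterance embeddings supplied by a hypothetical front end.
    """
    if not embeddings:
        raise ValueError("enrollment requires at least one embedding")
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample: Sequence[float], model: Sequence[float],
           threshold: float = 0.7) -> bool:
    """Verification (1:1): accept if the sample is close enough to the model
    of the presumed speaker. The 0.7 threshold is an arbitrary assumption."""
    return cosine(sample, model) >= threshold


def identify(sample: Sequence[float], models: Dict[str, Vector],
             threshold: float = 0.7) -> Optional[str]:
    """Identification (1:N): return the best-matching enrolled speaker,
    or None if no model scores at or above the (assumed) threshold."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, model in models.items():
        score = cosine(sample, model)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

For instance, with toy three-dimensional embeddings, `models = {"alice": enroll([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]), "bob": enroll([[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]])}` enrolls two speakers; a new sample `[0.95, 0.05, 0.05]` is then identified as `"alice"`, and verifying it against Bob's model is rejected. Practical systems typically use learned embeddings and calibrated scoring rather than this raw averaging.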
Speaker recognition systems generally fall into two categories: text-dependent speaker verification (TD-SV) and text-independent speaker verification (TI-SV). TD-SV generally requires that the speaker utter the same text for enrollment and verification. Compared to TD-SV, TI-SV systems generally require little, if any, cooperation from the speaker because there is no constraint on the speech content, and the speaker may speak freely to a TI-SV system. Advantageously, for TI-SV systems, the text used during enrollment and verification can be different. However, TI-SV systems generally require a long enrollment session, lasting at least several minutes, to achieve a reasonably acceptable error rate in verification sessions. Requiring users to explicitly read or talk for a long time for the sole purpose of enabling voice biometrics enrollment may lead to a poor user experience. Furthermore, explicit enrollment may be unsuitable for cases where the enrollment should happen without the user's knowledge, such as in forensic applications or stealthy operations.