Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech signals. It can be divided into speaker identification and speaker verification. Speaker identification determines which registered speaker provides a given utterance from amongst a set of known speakers. Speaker verification accepts or rejects the identity claim of a speaker to determine if they are who they say they are. Speaker verification can be used to control access to restricted services, for example, phone access to banking, database services, shopping or voice mail, and access to secure equipment.
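The difference between the two tasks can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a fixed-length feature vector and that cosine similarity is the comparison measure. The function names, the example vectors, and the 0.9 threshold are hypothetical choices for illustration, not part of any particular system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(utterance, enrolled):
    """Identification: pick the best match from the whole set of known speakers."""
    return max(enrolled, key=lambda name: cosine_similarity(utterance, enrolled[name]))

def verify(utterance, claimed, enrolled, threshold=0.9):
    """Verification: accept or reject one identity claim against one stored model."""
    if claimed not in enrolled:
        return False
    return cosine_similarity(utterance, enrolled[claimed]) >= threshold

# Hypothetical enrolled models and a new utterance (made-up feature values).
enrolled = {
    "alice": [1.0, 0.2, 0.1],
    "bob": [0.1, 1.0, 0.3],
}
sample = [0.9, 0.25, 0.1]

best_match = identify(sample, enrolled)           # compares against every speaker
alice_claim = verify(sample, "alice", enrolled)   # compares against one speaker only
bob_claim = verify(sample, "bob", enrolled)
```

Identification scales with the number of enrolled speakers because every model is scored, whereas verification scores only the claimed speaker's model and applies a decision threshold.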
Speaker verification systems typically use voice biometrics to verify that a given speaker is who they say they are. Voice biometrics works by digitizing a profile of a person's speech to produce a stored model voice print, or template. Biometric technology typically reduces each spoken word to segments composed of several dominant frequencies called formants. Each segment has several tones that can be captured in a digital format. Collectively, the tones identify the speaker's unique voice print. Voice prints are stored in databases in much the same way as fingerprints or other biometric data.
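As a rough illustration of capturing the dominant frequencies of one segment, the sketch below picks the strongest spectral peaks from a short synthetic signal using a naive discrete Fourier transform. This is a simplification: practical formant estimation usually relies on techniques such as linear predictive coding, and the sample rate, segment length, and tone frequencies here are assumed values chosen only so the example is easy to follow.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..n/2 (fine for short segments)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum

def dominant_frequencies(samples, sample_rate, count=3):
    """Return the `count` strongest spectral peaks, in Hz, sorted ascending."""
    mags = magnitude_spectrum(samples)
    n = len(samples)
    # A peak is a bin that is larger than its left neighbour and not smaller than its right one.
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * sample_rate / n for k in strongest)

# Synthetic 20 ms segment at an assumed 8 kHz sample rate, built from three tones.
SAMPLE_RATE = 8000
SEGMENT_LENGTH = 160
tones = [(700, 1.0), (1200, 0.6), (2600, 0.3)]  # (frequency in Hz, amplitude), made up
segment = [
    sum(amp * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for freq, amp in tones)
    for t in range(SEGMENT_LENGTH)
]

segment_peaks = dominant_frequencies(segment, SAMPLE_RATE)
```

A sequence of such per-segment frequency lists is one simple way to picture the digital representation from which a voice print could be built.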
In both speaker identification and speaker verification, an enrollment session is often required so that the system can collect speaker-specific training data and build speaker models. Enrollment is the procedure of obtaining a voice sample. To ensure a good-quality voice sample for speaker verification, a person usually recites some sort of text or pass phrase, which can be either a verbal phrase or a series of numbers. The text or phrase may be repeated several times before the sample is analyzed and accepted as a template or model in the database. When the user later attempts to gain access to the system, he or she speaks the assigned pass phrase, and voice features are extracted and compared with the previously stored template or model for that individual.
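The enrollment and access steps described above can be sketched as follows. The sketch assumes that feature extraction has already turned each recitation into a numeric vector, that the template is the element-wise average of the repetitions, and that a Euclidean distance threshold decides acceptance. The user ID, feature values, and the 0.5 threshold are hypothetical.

```python
import math

def enroll(repetitions):
    """Average the feature vectors of several recitations into one template."""
    count = len(repetitions)
    return [sum(values) / count for values in zip(*repetitions)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify_claim(features, template, max_distance=0.5):
    """Accept the attempt only if it is close enough to the stored template."""
    return distance(features, template) <= max_distance

# Enrollment: the same pass phrase recited three times (made-up features).
database = {}
database["user42"] = enroll([
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.0],
    [0.8, 2.2, 3.0],
])

# Access attempts: features extracted from newly spoken pass phrases.
genuine_attempt = verify_claim([1.1, 1.9, 3.1], database["user42"])
impostor_attempt = verify_claim([2.0, 1.0, 3.0], database["user42"])
```

Averaging several repetitions smooths out variation between individual recitations, which is one reason a phrase may be repeated before the template is accepted.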
Voice verification systems can be text-dependent, text-independent, or a combination of the two. Text-dependent systems require a person to speak a predetermined word or phrase. This information, known as a “pass phrase,” can be a piece of information such as a name, birth city, favorite color or a sequence of numbers. Text-independent systems recognize a speaker without requiring a predefined pass phrase. They typically operate on speech inputs of longer duration so that there is a greater opportunity to identify distinctive vocal characteristics (e.g., pitch, cadence, tone).