Automated systems for detecting and recognizing speech (e.g., words spoken in the form of audible signals or sounds) and for automatically recognizing speakers from that speech can be applied in a wide variety of contexts. For example, an interactive media response (IMR) system of a contact center may use automatic speaker recognition to confirm the identity of the customer with whom the IMR system is interacting before providing private information to that customer (e.g., “My voice is my passport. Verify me.”). Automatic speaker recognition may also be used to distinguish between different people who share the same phone number.
Speaker recognition generally includes three aspects: speaker detection, which relates to detecting whether there is a speaker in the audio; speaker identification, which relates to identifying whose voice it is; and speaker verification or authentication, which relates to verifying that a voice belongs to a claimed speaker. In circumstances where the set of possible speakers is closed (e.g., the audio must be from one of a set of enrolled speakers), speaker identification can be simplified to speaker classification. Some of the building blocks of speaker recognition systems include speaker segmentation, clustering, and diarization.
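The difference between closed-set identification (classification) and verification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not drawn from any particular system: the toy three-dimensional "voiceprint" vectors, the names `ENROLLED`, `classify_speaker`, and `verify_speaker`, the use of cosine similarity as the score, and the threshold of 0.8.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical enrolled speakers with toy 3-dimensional voiceprints.
# A real system would derive much larger vectors from audio.
ENROLLED = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}


def classify_speaker(embedding, enrolled=ENROLLED):
    """Closed-set identification: pick the best-matching enrolled speaker.

    Because the audio is assumed to come from one of the enrolled
    speakers, the answer is always one of the enrolled names.
    """
    return max(
        enrolled,
        key=lambda name: cosine_similarity(embedding, enrolled[name]),
    )


def verify_speaker(embedding, claimed, enrolled=ENROLLED, threshold=0.8):
    """Verification: accept or reject a single claimed identity.

    The threshold value is an illustrative assumption.
    """
    return cosine_similarity(embedding, enrolled[claimed]) >= threshold


sample = [0.9, 0.1, 0.0]
print(classify_speaker(sample))         # alice
print(verify_speaker(sample, "alice"))  # True
print(verify_speaker(sample, "bob"))    # False
```

Classification always returns one of the enrolled names, which is only appropriate when the set of possible speakers is closed. Verification instead answers a yes/no question about one claimed identity and can reject a voice that matches no one well.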