Automatic speaker recognition aims to recognize people from their voices. The two standard forms of speaker recognition are verification and identification. Verification is a 1-to-1 problem, in which a speaker's claimed identity is checked against samples previously provided by that speaker alone. Identification is a 1-to-N problem, in which a received test sample is compared against previously obtained models (also known as voiceprints or embeddings) of multiple speakers in order to determine which of the N known speakers produced it.
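The 1-to-1 versus 1-to-N distinction can be made concrete with a minimal Python sketch. It assumes each voiceprint is already available as a fixed-length embedding vector and scores pairs with cosine similarity, which is a common choice but is not specified here. The function names and the 0.7 threshold are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(test: Sequence[float], enrolled: Sequence[float], threshold: float = 0.7) -> bool:
    """1-to-1: accept the claimed identity if the test embedding is close enough
    to the voiceprint enrolled for that one speaker (hypothetical threshold)."""
    return cosine_similarity(test, enrolled) >= threshold


def identify(test: Sequence[float], voiceprints: Dict[str, Sequence[float]]) -> str:
    """1-to-N: return the enrolled speaker whose voiceprint scores highest."""
    return max(voiceprints, key=lambda name: cosine_similarity(test, voiceprints[name]))


# Toy 3-dimensional voiceprints; real embeddings typically have hundreds of dimensions.
voiceprints = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
test_sample = [0.9, 0.1, 0.2]
print(verify(test_sample, voiceprints["alice"]))  # True
print(verify(test_sample, voiceprints["bob"]))    # False
print(identify(test_sample, voiceprints))         # alice
```

Verification scores the sample against a single stored voiceprint and applies a threshold, whereas this identification function compares against all N voiceprints and always returns one of them.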
Speaker identification (the 1-to-N problem) can be addressed under closed-set or open-set conditions. Closed-set identification assumes the speaker is one of the N known speakers and selects the best match among them. Open-set identification, in contrast, must also determine whether the speaker is one of the N known speakers at all or is instead an unknown speaker. Both verification and identification are important in a call center.
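Open-set identification differs from closed-set identification mainly by a rejection step: if even the best-matching known speaker scores below a threshold, the speaker is treated as unknown. The minimal Python sketch below assumes cosine-scored embedding vectors; the function name, the 0.7 threshold, and the blacklist entries are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_open_set(
    test: Sequence[float],
    voiceprints: Dict[str, Sequence[float]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the best-matching known speaker, or None if even the best score
    falls below the threshold (the speaker is then treated as unknown)."""
    best_name: Optional[str] = None
    best_score = float("-inf")
    for name, voiceprint in voiceprints.items():
        score = cosine_similarity(test, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    # Rejection step: this is what distinguishes open-set from closed-set identification.
    return best_name if best_score >= threshold else None


# Hypothetical blacklist of fraudster voiceprints (toy 3-dimensional vectors).
blacklist = {"fraudster_1": [1.0, 0.0, 0.0], "fraudster_2": [0.0, 1.0, 0.0]}
print(identify_open_set([0.95, 0.05, 0.0], blacklist))  # fraudster_1
print(identify_open_set([0.0, 0.0, 1.0], blacklist))    # None
```

Screening callers against a blacklist is naturally an open-set problem, since most callers are not on the list and should map to None rather than to the nearest fraudster.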
Speaker verification can authenticate a genuine known user by comparing the user's current voice with the voiceprint on file for that user, while speaker identification can detect a potential fraudster whose voice appears on the call center's blacklist.
In some implementations, speaker recognition is text-dependent, which usually requires actively enrolling the speaker by asking the speaker to repeat a prompted passphrase; in others, it is text-independent, which is typical of passive recognition.
A less widely known use case of speaker recognition technology is anomaly detection, where the objective is to detect possible fraud by identifying inconsistencies between (a) voiceprint similarities and (b) available metadata such as automatic number identification (ANI) or account number (e.g., a caller's account at a bank serviced by the call center).
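One way to picture this kind of incoherence check is the minimal Python sketch below. It assumes each prior call is stored with a voiceprint, an ANI, and an account number (the `PriorCall` record, the function names, and the 0.7 threshold are all hypothetical). It flags two patterns: a known ANI or account heard with an unfamiliar voice, and a familiar voice calling from a different ANI about a different account.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PriorCall:
    """A previously observed call (hypothetical record layout)."""
    voiceprint: List[float]
    ani: str
    account: str


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_anomalies(
    voiceprint: Sequence[float],
    ani: str,
    account: str,
    history: List[PriorCall],
    threshold: float = 0.7,
) -> List[str]:
    """Return reasons where voice similarity and metadata disagree with prior calls."""
    reasons: List[str] = []
    for prior in history:
        same_voice = cosine_similarity(voiceprint, prior.voiceprint) >= threshold
        same_metadata = prior.ani == ani or prior.account == account
        if same_metadata and not same_voice:
            # Known ANI or account, but a voice not previously heard on it.
            reasons.append(f"new voice on known ANI/account ({prior.ani}, {prior.account})")
        elif same_voice and not same_metadata:
            # Familiar voice, but from a different ANI about a different account.
            reasons.append(f"known voice on other ANI/account ({prior.ani}, {prior.account})")
    return reasons


history = [
    PriorCall([1.0, 0.0, 0.0], ani="5551234", account="ACC1"),
    PriorCall([0.0, 1.0, 0.0], ani="5559999", account="ACC2"),
]
# Coherent call: ACC1's usual voice from ACC1's usual number.
print(detect_anomalies([1.0, 0.0, 0.0], "5551234", "ACC1", history))  # []
# Incoherent call: ACC2's voice claiming ACC1 from ACC1's number.
print(len(detect_anomalies([0.0, 1.0, 0.0], "5551234", "ACC1", history)))  # 2
```

Hard rules like these would misfire on, for example, a household where several people legitimately share one account; in practice such signals would more likely feed a risk score than trigger an outright fraud decision.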
The inventors have devised several methods to improve call center operations using speaker recognition in novel ways.