1. Technical Field
The present disclosure relates to speaker verification and more specifically to preventing automated or other attacks on speaker verification systems.
2. Introduction
A typical current scenario for speaker verification is to ask the user to say, for example, his or her account number and then to ask a follow-up question, such as “What is your mother's maiden name?” or “Please say your pass code.” The collected speech is used for text-dependent speaker verification (SV). One problem with this approach is that a thief can steal or surreptitiously record the user's speech for these few utterances or words. The thief can then play back the recorded speech to break into the system.
To ensure that a live person is speaking, current speaker verification systems usually challenge the user with a random sequence of digits or of alphanumeric characters. For example, the system might say, “Please say ‘7 3 1 5 6’” or “Please say ‘N 4 5 B 7 8’.” The main problem with this approach is that the whole vocabulary for the “liveness test” is only 10 digits and 26 letters, spoken mostly one at a time. It is therefore easy both to record this limited number of words without the user's knowledge and to play them back in any order. A thief does not need to be an expert in text-to-speech synthesis; a simple speech concatenation program is usually enough.
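The weakness of a small challenge vocabulary can be illustrated with a minimal Python sketch. It is an illustration only, not part of any described system: the helper names `make_challenge` and `replay_attack_covers` are hypothetical, and the vocabulary is assumed to be the 10 digits plus the 26 letters of the English alphabet. The sketch shows that once an attacker holds one recording per symbol, any random prompt can be assembled by concatenation.

```python
import secrets
import string

# Assumed challenge vocabulary: 10 digits plus 26 uppercase letters (36 symbols).
VOCABULARY = string.digits + string.ascii_uppercase


def make_challenge(length=6, vocabulary=VOCABULARY):
    """Return a random spoken prompt as space-separated symbols (hypothetical helper)."""
    return " ".join(secrets.choice(vocabulary) for _ in range(length))


def replay_attack_covers(challenge, recorded_symbols):
    """Return True if every symbol in the challenge exists among the attacker's recordings."""
    return all(symbol in recorded_symbols for symbol in challenge.split())


if __name__ == "__main__":
    # An attacker who has captured each symbol once holds only 36 short clips.
    stolen_recordings = set(VOCABULARY)
    challenge = make_challenge()
    print("Challenge:", challenge)
    print("Answerable by concatenation:", replay_attack_covers(challenge, stolen_recordings))
```

Under these assumptions the random ordering adds no protection, because the set of recordings needed to answer every possible prompt is fixed and small regardless of the prompt length.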
Another general problem with speaker verification systems is that the enrollment process is perceived as difficult or confusing, or users are simply unaware of the enrollment process or why they should enroll in speaker verification. These factors lead to a low adoption rate for speaker verification.