With advances in speech processing techniques, automatic user-machine interaction systems and services are becoming common across different fields. Speaker verification techniques are now employed as security measures in many computer systems. A speaker verification (SV) system verifies the identity of a user who speaks a known voice pass-phrase.
A simple and well-known way to attack such a system is the splicing method (splice attack), in which attackers collect various voice recordings of the target user. From those recordings, the attackers selectively cut out the words of the pass-phrase and paste them together (this is known as word splicing). The attackers then play the spliced sample to the SV system. This method is known to have a very high likelihood of deceiving speaker verification systems.
Currently there are no known methods for detecting splicing attacks. To make it more difficult for an attacker to use the splicing method, SV systems may use, for example, random pass-phrases. The accuracy of SV with a random pass-phrase, however, is lower than with a global or speaker-specific pass-phrase. Furthermore, even random pass-phrases may be spliced on the fly.
Another known approach for mitigating splice attacks combines a voice sample with at least one other type of biometric identification, such as face, fingerprint, or signature identification. This approach is less convenient for users and requires additional tools and procedures to capture the additional biometrics. Furthermore, since "the chain is only as strong as its weakest link," the security of the combined system is limited by its weakest component, so this approach is less than ideal.