1. Field of the Invention
The present invention relates to speaker verification and more specifically to attempts at speaker verification that use synthetic speech.
2. Introduction
Speaker and speech authentication systems are becoming more prevalent as speech recognition technology has improved and become available in cheaper, more reliable forms. As a biometric identification process, speech authentication is easy for users to interact with because there is nothing to forget or lose. Other biometric identification means exist, such as fingerprints or retinal scans, but hardware to accept such inputs is not widespread, while microphones capable of receiving a speech sample are very widespread and integrated into many devices.
While using speech as a means of identification can be convenient for businesses and users, speech synthesis technology has also improved as a corollary of speech recognition. Speech synthesis technology can be used to defeat or trick speech authentication systems, lessening their effectiveness. Technology for recording someone's voice saying a particular password has been available for decades, but that deceptive approach is simple enough to counter by requiring a different word to be spoken for each speech identification. The would-be deceiver then needs not only to record a speech sample, but also to predict which word will be required for authentication.
Speech recognition systems may require any word to be spoken, thereby defeating the traditional attack of a pre-recorded speech library. Speech synthesis systems, however, can replicate practically any voice and, presumably, any word or phrase. Speech recognition systems are unable to distinguish between original, authentic speech and synthetic speech, potentially leading to confusion and security breaches.
Accordingly, what is needed in the art is a way of detecting attempted breaches of speech recognition and authentication systems that are based on speech synthesis.