1. Field of the Invention
The present invention relates generally to the field of speaker authentication, and more particularly to a system and method for generating on-demand voiceprints using voice data descriptions.
2. Background of the Technology
Authentication systems and methods are used in almost every commercial transaction today. These methods range from simple manual signature verification or photo identification to more complicated biometric analyses, which authenticate an individual using one or more of that individual's biometric characteristics. A prevalent example of such a biometric system in today's commercial environment is a system which uses the characteristics of a person's speech to make an authentication decision (is this person, or is this person not, who he or she claims to be).
In a typical interactive voice response (IVR) application, a speaker will call into a system to access the system over the phone. The speaker will be asked a number of questions or given a number of options in order to navigate the application. As part of this process, the system will authenticate the speaker by comparing the speaker's real-time speech with a copy of the speaker's voiceprint (a model of the speaker's voice). Traditionally, this voiceprint is constructed in one or more enrollment sessions during which the speaker answers a series of questions. The characteristics of the speaker's speech are then used by the system to build the voiceprint associated with that person. This voiceprint is stored by the system for future use.
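The enrollment-then-verification flow described above can be sketched in simplified form. The sketch below is purely illustrative: the `extract_features` function here maps text to letter-frequency vectors as a stand-in, whereas a real speech engine would compute acoustic features (such as MFCCs) from recorded audio, and the names `enroll` and `verify` are hypothetical, not the API of any actual engine.

```python
import math

def extract_features(utterance):
    """Toy feature extractor: map an utterance (a string here, audio in a
    real system) to a normalized 26-dimensional letter-frequency vector."""
    vec = [0.0] * 26
    for ch in utterance.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def enroll(utterances):
    """Build a 'voiceprint' by averaging the feature vectors of the
    enrollment utterances. As in the process described above, only this
    model is retained; the raw utterances are typically discarded."""
    vectors = [extract_features(u) for u in utterances]
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def verify(voiceprint, utterance, threshold=0.8):
    """Compare a live utterance against the stored voiceprint via cosine
    similarity and accept the speaker if it exceeds the threshold."""
    probe = extract_features(utterance)
    score = sum(a * b for a, b in zip(voiceprint, probe))
    norm = math.sqrt(sum(a * a for a in voiceprint)) or 1.0
    return score / norm >= threshold
```

Note that the stored model is all the verifier ever sees: once enrollment is complete, authentication depends entirely on the voiceprint's internal representation, which is the root of the portability problems discussed next.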
This method of creating and storing the voiceprint during an enrollment process for future use is effective, and relatively straightforward to implement. Typically, a byproduct of this process is that the actual voice recordings used to construct the voiceprint are discarded (or not generally available for on-demand use). Though generally effective, this method suffers from a few fundamental drawbacks. In particular, this method, and the systems or applications that rely on it, is not technologically independent. That is, there is no standard format or encoding for voiceprints. This means that each voiceprint created by a system or for a particular application is integrally tied to the speech technology (“speech engine”) used to create that voiceprint. Indeed, the voiceprint may even be integrally tied to a particular version of the speech engine used to create it. The ultimate result is that a particular voiceprint cannot be used with a different speech engine, or perhaps not even with a different version of the same speech engine. This leads to significant difficulty for a system or application provider when switching speech engines, or more importantly, upgrading an existing speech engine. No one wants to go through an enrollment process to create a usable voiceprint with each and every speech engine upgrade.
Another difficulty with current systems is their inability to handle changes in a speaker's voice over time. Although some of the more sophisticated systems attempt to mitigate these changes by adapting their voiceprints with new data, the old data still remains a significant part of the model. A further problem with current systems and methods is the inability of multiple authentication applications to easily share voiceprints. For example, a first application may require voiceprints that contain utterances “a,” “b,” and “c,” and a second application may require voiceprints that contain utterances “b,” “c,” and “d.” Even though the underlying speech engine may be the same, a person would have to enroll twice, once for each application. This, or a combination of the above problems, can lead to substantial user frustration.
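The sharing problem above can be made concrete with a small sketch. Assuming, hypothetically, that per-utterance enrollment data were retained and shareable, a second application would only need to collect the utterances not already covered; with an opaque, engine-bound voiceprint, nothing carries over and the speaker must re-enroll from scratch. The function name below is illustrative only.

```python
def missing_utterances(enrolled, required):
    """Return, in sorted order, which required utterances the speaker has
    not yet provided, i.e., what a new enrollment session must collect."""
    return sorted(set(required) - set(enrolled))

# Utterance requirements of the two hypothetical applications:
app1_required = {"a", "b", "c"}
app2_required = {"b", "c", "d"}

# The speaker has already enrolled for the first application.
enrolled_for_app1 = app1_required

# If per-utterance data were reusable, only "d" would remain to collect:
print(missing_utterances(enrolled_for_app1, app2_required))  # ['d']

# But an opaque voiceprint shares nothing, forcing a full re-enrollment:
print(missing_utterances(set(), app2_required))  # ['b', 'c', 'd']
```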