1. Field of the Invention
The present invention relates generally to the field of claimant authentication, and more particularly to a system and method for improving the accuracy of speaker authentication by combining the results of multiple verification sources using statistical modeling.
2. Background of the Technology
The technology and business communities are continually searching for better ways to identify and authenticate people in various contexts and for various applications. A number of authentication methods have found their way into the marketplace, from the mundane, such as entering a personal identification number (PIN) on a keypad, to the futuristic, such as retinal scanning. Each method has its own strengths and weaknesses, ranging from relatively insecure to relatively foolproof, and from easy to use to extremely cumbersome and invasive.
As technology has advanced, and continues to advance, previously impracticable methods of identification and authentication have become routine and almost ubiquitous. One such method is the use of speech and speech-related applications to provide secure and non-intrusive individual authentication. Nearly everyone with a telephone has encountered an application that uses speech to navigate a series of menu options. What the caller may or may not know is that various aspects of the call, including their actual speech, are being analyzed to determine whether the caller is who they purport to be. A number of systems perform part of this analysis using various forms of verification technology applied to, or used in conjunction with, the caller's speech or speech characteristics. For example, a number of applications available in the marketplace have a caller "enroll" by answering a series of automated or recorded questions over the telephone. The system then creates a "voiceprint" of the enrollee and uses it in subsequent calls to verify that the voice of the caller actually matches the voice of the enrollee. This gives a business a non-intrusive way to validate that a caller is who they say they are, and then to authorize that person to perform a particular function or use a given application.
Although there are numerous examples of such applications, one leading application is the use of automatic speaker authentication for password reset. Companies devote substantial resources to password reset (i.e., handling a person who has lost or forgotten their password and must contact the company to reset it or choose a new one). Speech technology essentially automates this process, drastically reducing the associated costs. In this and many other applications, however, the company wants to ensure that the technology used to authenticate the caller is state of the art or, at a minimum, accurate enough to provide a high degree of confidence that the caller is an authorized user.
One concept for improving the accuracy of the ultimate authentication result is a system and method that combines a number of different sources of information to reach the authentication decision. For example, a simple combination system for an over-the-phone application may use two verification sources to authenticate a caller: knowledge verification and voice verification. For knowledge verification, the system asks the speaker one or more questions to which only the speaker (or a relatively small group of potential impostors) would know the answer. A prime example is asking the caller for their mother's maiden name. Speech recognition technology analyzes the caller's utterances (short segments of speech) to determine whether the caller has spoken the correct name. For voice verification, the characteristics of the caller's voice are compared with the caller's voiceprint (which may have been obtained during an enrollment process) to determine whether there is a match. Thus, in this particular application, the system has two distinct pieces of information with which to make an authentication decision: a match or no match on the mother's maiden name, and a match or no match on the voice.
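To make the two-source example concrete, the following is a minimal illustrative sketch, not the method of the invention. It assumes each source reduces to a single match/no-match flag and combines them with a naive hand-written rule; the names `VerificationResults` and `naive_accept` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerificationResults:
    """Hypothetical container for the two match/no-match outcomes of one call."""
    knowledge_match: bool  # recognizer's transcription matched the enrolled answer
    voice_match: bool      # caller's voice matched the enrolled voiceprint


def naive_accept(results: VerificationResults) -> bool:
    """Hypothetical ad-hoc rule: accept only if both sources report a match."""
    return results.knowledge_match and results.voice_match


# A genuine caller whose correct answer was misrecognized is still rejected.
print(naive_accept(VerificationResults(knowledge_match=False, voice_match=True)))  # False
```

The rule is easy to state, but it cannot distinguish a wrong answer from a correct answer that the speech recognizer misheard, which is the difficulty taken up next.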
The difficulty is that the underlying verification technology is not perfect, and the method of reaching the ultimate conclusion ("is the claimant who they purport to be?") can be very complicated when multiple sources of information are used. For example, making an accept/reject decision based on knowledge verification and voice verification is not as easy or straightforward as it may seem. In the case of knowledge verification, if the application asks the claimant a number of questions, how is the authentication affected if the claimant answers one of them incorrectly? More pervasive still, the underlying speech recognition technology that this exemplary system and method employs to understand what the claimant has said is not 100% accurate. How does one handle a situation in which the caller responds correctly but the speech recognizer does not recognize the speech correctly? Or, in the case of voice verification, how does one account for the fact that the underlying verification technology is not 100% accurate? How does one integrate these results into a single authentication decision that reflects the relative importance, or weight, of each input? Moreover, as more sources of information are added in an attempt to increase the accuracy of the authentication process, these problems become increasingly cumbersome and nearly insurmountable. Writing ad-hoc rules to govern every possible combination of outcomes as multiple verification sources are combined is of questionable practicality.
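The growth in ad-hoc rules can be illustrated with a short count. The sketch below assumes each verification source yields only a binary outcome (0 = no match, 1 = match) and enumerates every joint outcome that a hand-written rule table would have to cover; the function name `outcome_combinations` is hypothetical.

```python
from itertools import product


def outcome_combinations(num_sources: int, outcomes_per_source: int = 2) -> list:
    """Enumerate every joint outcome across all sources (hypothetical helper)."""
    return list(product(range(outcomes_per_source), repeat=num_sources))


# For example, 2 sources (one question + voice), 6 sources (five questions + voice).
for n in (2, 6, 10):
    print(f"{n} sources -> {len(outcome_combinations(n))} combinations")
```

With binary outcomes the table doubles with every added source (4, 64, and 1024 entries here), and this count does not yet account for recognizer or verifier errors behind any single outcome.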