Voice authentication systems are becoming increasingly popular for providing access control. For example, they are currently being utilised in telephone banking systems, automated proof of identity applications in call centre systems, automatic teller machines, building and office entry access systems, automated password reset, call-back verification for highly secure internet transactions, etc.
Voice authentication is typically conducted over a telecommunications network, as a two-stage process. The first stage, referred to as the enrolment stage, involves processing a sample of a person's voice presented to a voice authentication engine to generate an acoustic model or “voiceprint” that represents their unique voice characteristics. The second stage, or authentication stage, involves receiving a voice sample of a person to be authenticated (or identified) over the network. Again, the voice authentication engine generates an acoustic model of the sample and compares this with the stored voiceprint to derive an authentication score indicating how closely matched the two samples are (and therefore the likelihood that the person is, in fact, the same as that being claimed). This score is typically expressed as a numerical value, and the mathematical calculations used to derive it can vary from engine to engine.
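The two stages can be illustrated with a deliberately simplified sketch. It is not the method of any particular engine: the feature vectors, the averaging used to build a voiceprint, the cosine-similarity score and all function names (`make_voiceprint`, `enrol`, `authenticate`) are assumptions chosen only to make the enrol-then-compare flow concrete.

```python
import math

# Hypothetical in-memory store of voiceprints, keyed by person identifier.
enrolled = {}

def make_voiceprint(feature_frames):
    """Average per-frame feature vectors into one model vector (toy assumption)."""
    count = len(feature_frames)
    dim = len(feature_frames[0])
    return [sum(frame[i] for frame in feature_frames) / count for i in range(dim)]

def similarity(model_a, model_b):
    """Cosine similarity, standing in for an engine-specific scoring calculation."""
    dot = sum(a * b for a, b in zip(model_a, model_b))
    norm_a = math.sqrt(sum(a * a for a in model_a))
    norm_b = math.sqrt(sum(b * b for b in model_b))
    return dot / (norm_a * norm_b)

def enrol(person_id, feature_frames):
    """Enrolment stage: build and store the person's voiceprint."""
    enrolled[person_id] = make_voiceprint(feature_frames)

def authenticate(claimed_id, feature_frames):
    """Authentication stage: model the new sample and score it against the stored voiceprint."""
    sample_model = make_voiceprint(feature_frames)
    return similarity(sample_model, enrolled[claimed_id])

# Illustrative, made-up feature vectors.
enrol("alice", [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]])
same_speaker_score = authenticate("alice", [[1.1, 1.9, 0.6]])
other_speaker_score = authenticate("alice", [[0.2, 0.1, 2.0]])
print(same_speaker_score, other_speaker_score)
```

In this toy setting the sample resembling the enrolled data scores higher than the dissimilar one, which is the behaviour the score is intended to capture.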
In the case of the correct, or “legitimate”, person accessing the authentication system, the expectation is that their voiceprint (i.e. generated from their voice file) will closely match the voiceprint previously created for that person, resulting in a high score. If a fraudster (often referred to in the art as an “impostor”) is attempting to access the system using the legitimate person's information (e.g. speaking their account number, password, etc.), the expectation is that the impostor's voiceprint will not closely match the legitimate person's voiceprint, thus resulting in a low score even though the impostor is quoting the correct information.
Whether a person is subsequently deemed to be legitimate is typically dependent on a threshold set by the authentication system. To be granted access to the system, the score generated by the authentication system needs to exceed the threshold. If the threshold is set too high then there is a risk of rejecting large numbers of legitimate persons. The proportion of legitimate persons rejected is known as the false rejection rate (FRR). On the other hand, if the threshold is set too low there is a greater risk of allowing access to impostors. The proportion of impostors accepted is known as the false acceptance rate (FAR).
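The trade-off between the two error rates can be seen in a short sketch. The scores below are made-up values on an assumed 0 to 1 scale, and the function name `error_rates` is hypothetical; the only behaviour taken from the description above is that a score must exceed the threshold for access to be granted.

```python
def error_rates(legit_scores, impostor_scores, threshold):
    """Return (FRR, FAR) for a threshold; access is granted only if score > threshold."""
    false_rejections = sum(1 for s in legit_scores if s <= threshold)
    false_acceptances = sum(1 for s in impostor_scores if s > threshold)
    frr = false_rejections / len(legit_scores)
    far = false_acceptances / len(impostor_scores)
    return frr, far

# Illustrative scores only; real engines use their own scales.
legit_scores = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor_scores = [0.30, 0.42, 0.58, 0.71, 0.25]

for threshold in (0.4, 0.6, 0.8):
    frr, far = error_rates(legit_scores, impostor_scores, threshold)
    print(f"threshold={threshold}: FRR={frr:.2f}, FAR={far:.2f}")
```

With these sample values, raising the threshold from 0.4 to 0.8 drives the FAR from 0.60 down to 0.00 while the FRR rises from 0.00 to 0.40.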
As one would appreciate, therefore, selecting an appropriate threshold for an authentication system can be difficult. On the one hand, the threshold setting needs to be high enough that the business security requirements of the secure services utilising the authentication system are met. However, such settings can cause undue service issues, with too many legitimate persons being rejected. Similarly, if the threshold is set too low, while achieving good service levels, security may be put at risk. The problem of selecting appropriate threshold settings is compounded by the fact that different authentication engines utilise different attributes or characteristics for voiceprint comparison and as a result may produce a wide range of different scores based on the same type of content provided in the voice samples (e.g. numbers, phrases, etc.). What is more, a single engine will also produce quite different scores for voice samples of different content types, for example an account number compared to a date of birth, or a phrase.