Voice authentication systems are becoming increasingly popular for providing secure access control. For example, voice authentication systems are currently being utilised in telephone banking systems, automated proof of identity applications, call centre systems (e.g. those deployed in banking and financial services), building and office entry access systems, and the like.
Voice authentication (also commonly referred to as “verification”) is typically conducted over a telecommunications network, as a two-stage process. The first stage, referred to as the enrolment stage, involves processing a sample of a user's voice by a voice authentication engine to generate an acoustic model or “voiceprint” that represents acoustic parameters unique to their voice. The second stage, or authentication stage, involves receiving a voice sample of a user to be authenticated (or identified) over the network. Again, the voice authentication engine generates an acoustic model of the sample and compares the resultant parameters with parameters of the stored voiceprint to derive an authentication score indicating how closely matched the two samples are (and therefore the likelihood that the user is, in fact, who they are claiming to be). This score is typically expressed as a numerical value and is derived through mathematical calculations that can vary from engine to engine.
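By way of illustration only, the two-stage process described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular engine: the class `ToyEngine`, the functions `make_voiceprint` and `score`, the use of a mean feature vector as the “voiceprint”, and the use of cosine similarity as the authentication score are all hypothetical simplifications, and the feature values are invented for the example.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def make_voiceprint(frames: Sequence[Sequence[float]]) -> Vector:
    """Toy acoustic model (assumption): the mean of the sample's feature vectors."""
    if not frames:
        raise ValueError("voice sample has no feature frames")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def score(sample_model: Vector, enrolled_model: Vector) -> float:
    """Toy authentication score (assumption): cosine similarity of two models."""
    dot = sum(x * y for x, y in zip(sample_model, enrolled_model))
    norm = math.sqrt(sum(x * x for x in sample_model)) * math.sqrt(
        sum(y * y for y in enrolled_model)
    )
    return dot / norm if norm else 0.0


class ToyEngine:
    """Hypothetical stand-in for a voice authentication engine."""

    def __init__(self) -> None:
        self._voiceprints: Dict[str, Vector] = {}

    def enrol(self, user_id: str, frames: Sequence[Sequence[float]]) -> None:
        # Enrolment stage: build a voiceprint and store it against the user.
        self._voiceprints[user_id] = make_voiceprint(frames)

    def authenticate(self, user_id: str, frames: Sequence[Sequence[float]]) -> float:
        # Authentication stage: model the new sample and compare it with
        # the stored voiceprint to derive a numerical score.
        return score(make_voiceprint(frames), self._voiceprints[user_id])


engine = ToyEngine()
engine.enrol("alice", [[1.0, 0.0], [0.8, 0.2]])

legit = engine.authenticate("alice", [[0.9, 0.1]])      # sample close to the voiceprint
impostor = engine.authenticate("alice", [[0.1, 0.9]])   # sample far from the voiceprint
print(legit > impostor)
```

In this sketch the sample that resembles the enrolled voiceprint yields the higher score, mirroring the expectation that a legitimate user scores higher than an impostor.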
In the case of the correct, or “legitimate”, user accessing the authentication system, the expectation is that their voiceprint (i.e. generated from their voice sample) will closely match the voiceprint previously enrolled for that user, resulting in a high score. If a fraudster (often referred to in the art as an “impostor”) is attempting to access the system using the legitimate user's information (e.g. voicing their password, etc.), the expectation is that the impostor's voiceprint will not closely match the legitimate person's enrolled voiceprint, thus resulting in a low score even though the impostor is quoting the correct information.
It is not uncommon for important acoustic characteristics of a user's enrolment sample (i.e. the sample used to generate their voiceprint) to vary considerably from those of the sample subsequently provided for authentication purposes. For example, the acoustic environment when the user makes the authentication call may be quite different to that when they initially enrolled with the system, thus resulting in a mismatch in the acoustic parameters that are evaluated by the authentication engine. Other acoustic characteristics that may also vary considerably include the channel type (e.g. whether the call is made over a mobile phone network, public-switched network, IP based network or a combination thereof). Such mismatches can lead to a significant increase in both false rejection and false acceptance errors being made by the authentication system.
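The two error types mentioned above can be made concrete with a short sketch. It assumes, purely for illustration, that the system accepts a user when their score meets a fixed threshold; the function `error_rates`, the threshold of 0.7 and all score values are hypothetical and are not measurements from any real system.

```python
from typing import Sequence, Tuple


def error_rates(
    legit_scores: Sequence[float],
    impostor_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (false rejection rate, false acceptance rate) at a threshold."""
    # A legitimate user scoring below the threshold is falsely rejected.
    false_rejects = sum(1 for s in legit_scores if s < threshold)
    # An impostor scoring at or above the threshold is falsely accepted.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return false_rejects / len(legit_scores), false_accepts / len(impostor_scores)


# Invented scores for illustration only.
matched_legit = [0.91, 0.88, 0.95, 0.90]     # same conditions as enrolment
mismatched_legit = [0.91, 0.62, 0.95, 0.58]  # some calls made under different conditions
impostors = [0.30, 0.45, 0.72, 0.25]

frr_matched, far = error_rates(matched_legit, impostors, 0.7)
frr_mismatched, _ = error_rates(mismatched_legit, impostors, 0.7)
print(frr_matched, frr_mismatched, far)
```

With these invented figures, the mismatched calls push two legitimate scores below the threshold, so the false rejection rate rises even though the speaker is unchanged.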