Field of the Invention
The invention relates to the field of voice biometric systems encompassing speakers in communication networks, and particularly to the field of automated adaptation and improvement of speaker authentication accuracy in a communication network.
Discussion of the State of the Art
The field of voice biometrics has grown considerably with advances in speech recognition technology and computer processing capability. A speaker authentication system authenticates a speaker's identity using the acoustical elements of the speaker's voice. For example, an individual may wish to access her customer account using a telephone, while an enterprise handling the account may wish to ensure that only authorized individuals are able to access specific accounts. In these situations, the individual could authenticate her identity using her voice rather than (for example) inputting dual-tone multi-frequency (DTMF) digits on a telephone keypad to provide a personal identification number (PIN).
FIG. 4 illustrates a typical prior art architecture designed to support speaker authentication in a communication network. Speaker authentication system 401 performs two main functions, namely enrollment and authentication.
In the enrollment function, a speaker 410 speaks into the system through a voice interface 413 such as a telephone, microphone or other audio input mechanism. Speaker 410, whose identity has already been established by other means (for example, an account number and password entered as DTMF digits), is asked to repeat a collection of pre-configured phrases, which are recognized by speech recognition engine 420. By analyzing various components of the speaker's voice data, enrollment processor 422 learns the speaker's voice pattern and creates a voice reference model that is then stored in speaker database 426. The same procedure applies to each additional speaker, for example speaker 411 and speaker 412, who wishes to enroll in speaker authentication system 401.
In subsequent voice interactions with the system, a speaker 411, who has previously enrolled with the system as described in the previous paragraph, can now authenticate her identity by using just her voice. Authentication interface 430 prompts the speaker to speak her account number and/or other identifying information. For example, the account number is recognized by speech recognition engine 420 and the corresponding account is accessed. Authentication processor 431 retrieves the associated voice reference model for speaker 411 from the speaker database 426. The speech pattern is then compared to the voice reference model by the comparison function 432. The comparison is checked to see whether the resulting score satisfies some threshold condition as defined by scoring threshold definition 433 to qualify as authenticated; for example, speaker authentication may only be completed when a confidence threshold of 95% is achieved. A decision on whether or not to authenticate the speaker is then made by the decision function 434.
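The comparison, threshold, and decision steps described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a speech pattern and a voice reference model are each reduced to a fixed-length feature vector and that the comparison score is their cosine similarity; the names `cosine_score` and `authenticate` are hypothetical, and the scoring method actually used by comparison function 432 is not specified here.

```python
import math


def cosine_score(sample, reference):
    """Hypothetical comparison score: cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(sample, reference))
    norm = math.sqrt(sum(a * a for a in sample)) * math.sqrt(
        sum(b * b for b in reference)
    )
    # A zero-length vector carries no voice information, so score it as 0.
    return dot / norm if norm else 0.0


def authenticate(sample, reference, threshold=0.95):
    """Decision step: accept only if the score satisfies the threshold condition."""
    return cosine_score(sample, reference) >= threshold
```

In this sketch the threshold of 0.95 mirrors the 95% confidence example in the text; raising it makes the decision stricter, and lowering it makes the decision more permissive.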
Because an individual's voice in both the enrollment and authentication steps often contains noise elements (including but not limited to ambient noise, additive noise resulting from the characteristics of the communication network, and voice changes due to age, stress, or health) that obscure the speaker's true voice pattern, speaker authentication system 401 is apt to have reduced accuracy. Reduced accuracy can result in security and usability issues such as false accepts (i.e., impostors are authenticated), false rejects (i.e., genuine speakers are rejected), or other unintended system issues. To mitigate these issues, a speaker authentication system must undergo regular testing and tuning that uncovers and removes security and usability issues.
In a typical voice biometric testing environment, a set of test speakers 451 use a test set of spoken account numbers, or other identifying information, of known enrolled speakers 400 to test the accuracy of speaker authentication system 401. In one testing scenario, a test speaker 452 speaks the account number of a previously enrolled speaker 410. Authentication processor 431 uses the account number recognized by speech recognition engine 420 to retrieve the voice reference model associated with speaker 410 from speaker database 426, and comparison function 432 compares the speech of test speaker 452 to that model. The comparison is scored and checked against scoring threshold definition 433. A decision on whether or not to authenticate the speaker is then made by decision function 434. Because the tester knows that speaker 452 is an impostor, a decision to authenticate speaker 452 as speaker 410 reveals a security problem with the system.
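A test pass of this kind can be summarized by tallying false accepts and false rejects over a set of trials. The sketch below assumes each trial records whether the test speaker is genuinely the claimed speaker and the comparison score the system produced; the `Trial` record and the `error_rates` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    claimed_speaker: str  # enrolled speaker whose account number was spoken
    is_genuine: bool      # True if the test speaker really is the claimed speaker
    score: float          # comparison score produced for this trial


def error_rates(trials, threshold):
    """Return (false accept rate, false reject rate) at the given threshold."""
    impostor = [t for t in trials if not t.is_genuine]
    genuine = [t for t in trials if t.is_genuine]
    # An impostor scoring at or above the threshold is a false accept.
    false_accepts = sum(t.score >= threshold for t in impostor)
    # A genuine speaker scoring below the threshold is a false reject.
    false_rejects = sum(t.score < threshold for t in genuine)
    far = false_accepts / len(impostor) if impostor else 0.0
    frr = false_rejects / len(genuine) if genuine else 0.0
    return far, frr
```

Because both rates are ratios over the number of trials, their reliability depends on how many impostor and genuine samples are available.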
The current art of testing voice biometric systems, in which test speaker samples are created manually and run through the voice authentication system one by one, provides little improvement to the voice authentication system, because creating enough test samples to thoroughly exercise the system is not practical. Furthermore, the human labor required to create voice test samples in the current art is prohibitively expensive.
The problem with the current art is further compounded when speakers use various communication devices and networks of varying degrees of quality.
What is needed is the automatic creation of voice samples for testing as well as an automated way of presenting the test scenarios to the system in order to identify security and usability issues.