Considerable attention has been given in recent years to text-dependent, fixed-password speaker verification systems for network-based applications. An example of such a system is one in which customers are assigned unique digit string password utterances which are used for both identity claim and verification.
In such a system, the digit string password utterance is decoded by a speaker-independent, connected-digit recognizer to find a matching, valid customer account number. The utterance is then compared with the models for that customer and a matching score is generated. This score is compared with a rejection threshold to decide whether or not to authenticate the claim. The combined system, digit recognizer and speaker verifier, can be considered an (open-set) speaker identification system.
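By way of illustration only, the decode, look-up, score and threshold sequence described above can be sketched as follows. The function name `verify_claim`, the `decode_digits` callable, the `accounts` table and the convention that a higher score means a better match are all assumptions made for this sketch; they are not taken from any particular system.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical types for illustration: an utterance is a sequence of feature
# values, and a customer "model" is any callable that scores an utterance.
Utterance = Sequence[float]
Scorer = Callable[[Utterance], float]


def verify_claim(
    utterance: Utterance,
    decode_digits: Callable[[Utterance], str],
    accounts: Dict[str, Scorer],
    threshold: float,
) -> Optional[str]:
    """Return the account number if the claim is accepted, otherwise None."""
    # Step 1: speaker-independent decoding proposes an identity claim.
    account = decode_digits(utterance)

    # Step 2: the decoded string must be a valid customer account number.
    scorer = accounts.get(account)
    if scorer is None:
        return None

    # Step 3: score the utterance against that customer's models only.
    score = scorer(utterance)

    # Step 4: compare with the rejection threshold (assumed: higher is better).
    return account if score >= threshold else None
```

In this sketch an utterance is rejected either because the decoded string is not a valid account number or because the matching score falls below the threshold.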
In speaker identification a speech sample from an unknown speaker is processed and associated with the customer with the best matching models. "Open-set" refers to the possibility that the speech sample may be provided by a speaker outside the customer set (e.g., an imposter). Thus, the quality of the best match must be assessed to determine whether or not the match is valid. This constitutes verification or authentication.
Speaker identification is, in general, more difficult than speaker verification. In speaker verification the speech sample is compared with just one customer's models, whereas in speaker identification the sample must generally be compared with every customer's models, so that both the amount of processing and the error rate increase as the size of the customer population increases.
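The difference in processing can be made concrete with a minimal sketch of open-set identification, in which the sample is scored against every customer model and the best match is then tested against a threshold. The name `identify_open_set`, the representation of a model as a scoring callable and the higher-is-better score convention are assumptions of this sketch.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical types for illustration only.
Utterance = Sequence[float]
Scorer = Callable[[Utterance], float]


def identify_open_set(
    utterance: Utterance,
    models: Dict[str, Scorer],
    threshold: float,
) -> Tuple[Optional[str], int]:
    """Return (best-matching customer or None, number of model comparisons)."""
    best_id: Optional[str] = None
    best_score = float("-inf")
    comparisons = 0

    # Identification: every customer model must be evaluated, so the work
    # grows linearly with the size of the customer population.
    for customer_id, scorer in models.items():
        comparisons += 1
        score = scorer(utterance)
        if score > best_score:
            best_id, best_score = customer_id, score

    # Open-set check: the best match is accepted only if it is good enough,
    # since the speaker may be outside the customer set.
    if best_score < threshold:
        return None, comparisons
    return best_id, comparisons
```

With N customer models the loop performs N comparisons per sample, whereas verification of a single identity claim requires only one.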
What makes a digit password system practical is that each customer is assigned a unique account number for a password, so that the digit recognizer is able to propose an identity claim reasonably accurately and efficiently. After it has been determined that the decoded digit string is a valid account number, the speech input is compared only with the models associated with that account.
Customers may find it more convenient and comfortable to use familiar phrases in place of digit strings for passwords. One of the problems in moving from digit string passwords to general text passwords is that, while digit string passwords can be represented as concatenations of whole-word digit units, general text passwords, drawn from a vocabulary of arbitrary size, must be represented as concatenations of smaller units such as phones. Since phone units are smaller and more prone to segmentation and recognition errors than digit units, and are more numerous and more confusable, it is not unreasonable to expect that some degradation in performance will occur. Since decoding a password phrase into a string of phones is substantially more prone to error than decoding a digit string password into a string of digits, a lexicon containing phone transcriptions of user password phrases can be used to constrain the recognition and control such errors.
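One simplified way to picture the role of such a lexicon is shown below. This sketch does not constrain the recognizer's search itself; as a stand-in, it takes an unconstrained (and possibly erroneous) phone string and maps it to the nearest lexicon entry by edit distance, rejecting the input if no entry is close enough. The function names, the phone symbols and the `max_errors` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if phones match)
            ))
        prev = curr
    return prev[-1]


def constrain_to_lexicon(
    decoded: Sequence[str],
    lexicon: Dict[str, List[str]],
    max_errors: int,
) -> Optional[str]:
    """Return the closest password phrase, or None if none is within max_errors."""
    best_phrase: Optional[str] = None
    best_dist = max_errors + 1
    for phrase, phones in lexicon.items():
        dist = edit_distance(decoded, phones)
        if dist < best_dist:
            best_phrase, best_dist = phrase, dist
    return best_phrase
```

Under these assumptions, a single phone substitution in the decoded string still resolves to the intended password phrase, while a string that resembles no lexicon entry is rejected.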