1. Field of the Invention
The present invention is directed to access control systems, and, more particularly, to access control based on voiceprint identification.
2. Description of the Related Art
Access control systems are used to prevent unauthorized users from gaining access to protected resources, such as computers, buildings, automatic teller machines (ATMs), credit cards and voicemail systems. When a user attempts to access a protected resource, a typical access control system engages in one or more interactions with the user, such as prompting the user and requiring him to enter an identity of an authorized person (the user's purported identity) and a valid passcode (sometimes called a personal identification number or PIN). For example, a typical voicemail system requires a user to first enter his mailbox number and then a passcode by pressing keys on his telephone. Only if the entered passcode matches the passcode associated with the entered mailbox number is the user deemed to be a subscriber and allowed to further interact with the system, i.e., to access a restricted resource (a mailbox in this case) and to retrieve messages or send messages to other subscribers.
The advantages of using a voiceprint system over a standard PIN system are several. First, it is quicker and more convenient to speak than to punch codes into a numeric keypad. Second, a user who is required to enter her PIN by pressing telephone keys cannot do so if she is not using a touch-tone phone, or if she is using a phone that does not allow tone codes, such as a cellular or cordless phone. Third, a voiceprint system is more secure: unlike with a standard PIN system, even if an impostor obtains a subscriber's passcode, the impostor's voiceprint will not match and she will not gain access.
Voiceprint access control systems use discriminating characteristics of each authorized person's voice to ascertain whether a user is authorized to access a protected resource. The sound of a speaker's voice is influenced by, among other things, the speaker's physical characteristics, including such articulators as the tongue, vocal tract size, and speech production manner, such as place and rate of articulation. When a user attempts to access a protected resource, a typical voiceprint system samples an utterance produced by the user and then compares the voiceprint of the utterance to a previously stored voiceprint of the authorized person whom the user purports to be.
Voiceprint systems must be trained to recognize and differentiate each authorized person through her voice. This training involves sampling each authorized person's voice while she utters a predetermined word or phrase and then processing this speech sample to calculate a set of numeric parameters (commonly called the acoustic features in a “voice template” of the speaker's voice). This voice template is stored, along with other voice templates, in a database that is indexed (sometimes known as keyed) by the identity of the speaker.
The parameters of a voice template quantify certain biometric characteristics of the speaker's voice, such as amplitude, frequency spectrum and timing, while the speaker utters the predetermined word or phrase. A speaker's voice template is fairly unique, although not as unique as some other characteristics of the speaker, such as the speaker's fingerprint. For example, identical twins are likely to have nearly indistinguishable voices, because their vocal tracts are similarly shaped.
When a user attempts to gain access to a protected resource, the user enters his purported identity, and then a conventional voiceprint identification system uses this identity to index into the database and retrieves a single voice template, namely the voice template of the authorized person whom the user purports to be. The system prompts the user to speak a predetermined word or phrase and samples the user's voice to create a voice template from the user's utterance. The system then compares the user's voice template to the authorized person's voice template using one or more well-known statistical decision-theoretic techniques. This comparison produces a binary (match/no match) result. If the two voice templates are sufficiently similar, the voice templates are said to match (as that term is used hereinafter) and the user is deemed to be the authorized person; otherwise the voice templates are said to not match and the user is deemed to be an impostor.
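The conventional lookup-and-compare flow described above can be illustrated with a minimal sketch. It is not taken from any particular prior art system: the template values, the Euclidean distance measure, the `MATCH_THRESHOLD` value, and the names `TEMPLATES`, `distance` and `verify` are all assumptions chosen for illustration.

```python
import math

# Hypothetical stored templates, keyed by purported identity (values are invented).
TEMPLATES = {
    "alice": [0.82, 0.10, 0.45],
    "bob": [0.15, 0.77, 0.30],
}

MATCH_THRESHOLD = 0.25  # assumed maximum distance that still counts as a match


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(purported_identity, utterance_template):
    """Return True (match) or False (no match) for a purported identity."""
    stored = TEMPLATES.get(purported_identity)
    if stored is None:
        return False  # unknown identity: no template to compare against
    return distance(stored, utterance_template) <= MATCH_THRESHOLD


print(verify("alice", [0.80, 0.12, 0.47]))  # utterance close to alice's template
print(verify("alice", [0.15, 0.77, 0.30]))  # bob's voice claiming to be alice
```

Only one stored template is retrieved and compared per attempt, which is why the user must supply a purported identity before speaking.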
The statistical decision associated with hypothesis testing (match/no-match) is characterized by two types of errors: false rejection (Type I errors) and false acceptance (Type II errors). The algorithms used in the comparison are typically adjusted so that the likelihood of Type I errors is approximately equal to the likelihood of Type II errors.
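As a hedged illustration of adjusting a comparison so that the two error likelihoods are approximately equal, the sketch below picks a distance threshold at which the false rejection rate and false acceptance rate are closest. The score lists `GENUINE` and `IMPOSTOR` are invented, and the function names are hypothetical; a real system would estimate these rates from far larger trial sets.

```python
# Invented comparison distances: smaller means more similar.
GENUINE = [0.10, 0.15, 0.20, 0.35]   # authorized person vs. own template
IMPOSTOR = [0.30, 0.50, 0.60, 0.70]  # impostor vs. someone else's template


def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold."""
    # Type I: a genuine attempt is rejected because its distance exceeds the threshold.
    frr = sum(d > threshold for d in genuine) / len(genuine)
    # Type II: an impostor attempt is accepted because its distance is within the threshold.
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return frr, far


def balanced_threshold(genuine, impostor):
    """Pick the observed distance at which the two error rates are closest."""
    def gap(t):
        frr, far = error_rates(genuine, impostor, t)
        return abs(frr - far)

    return min(sorted(set(genuine) | set(impostor)), key=gap)


t = balanced_threshold(GENUINE, IMPOSTOR)
print(t, error_rates(GENUINE, IMPOSTOR, t))
```

Lowering the threshold below this point reduces false acceptances at the cost of more false rejections, and raising it does the opposite.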
There are two kinds of speech recognition technology packages. The first is Speaker-Independent (SI) speech recognition technology, which can recognize words and does not require training by the individual user. The disadvantage of SI technology is that the active vocabulary of words it can recognize is generally limited in order to reduce errors and calculation time. The second type of speech recognition technology is Speaker-Dependent (SD) technology, which requires training of each word by each individual user but has significantly higher accuracy for that user.
Speaker-Independent recognition focuses on common acoustic features of a sound, and attempts to match many instantiations of an utterance with one common “prototype” of that utterance (many-to-one mapping). Speaker-Dependent recognition focuses on acoustically differentiating the (possibly different) features so that one pattern can be selected from many similar patterns (one-from-many mapping). The pattern so selected is that of the “subscriber”, as described in this invention.
Prior art voiceprint identification is limited to confidently recognizing the unique acoustic pattern of the subscriber (one-from-one mapping). The invention permits more flexibility by combining the two automatic speech recognition (ASR) technologies so that SI ASR is used to identify a subset of subscribers (a cohort), and SD ASR is used to verify a particular member of the cohort (the subscriber).
As commonly used in the art, “identification” means ascertaining a user's purported identity, and “verification” means ascertaining if the user's voice matches the voice of the specified, e.g. identified, speaker.
Some prior art voiceprint identification systems assign a unique spoken passcode to each user, such as a random number or the user's social security number. Because each user has his own passcode, the voiceprint identification system can readily access the user's account once it identifies the user from his passcode.
Requiring each authorized person's password to be unique poses problems. For example, authorized persons cannot readily choose or change their passwords. Additionally, using preassigned numbers, such as a user's social security or telephone number, may pose security problems.
What is needed, therefore, is a voiceprint identification system that identifies and verifies a user from a single utterance, but that permits multiple authorized persons to have identical passwords. A system that simply derives a voice template of the user's voice sample and then exhaustively searches an entire database for a matching voice template would be slow, because this database stores a large quantity of data, associated with each and every authorized person. Furthermore, such a system would produce an unacceptably high rate of Type I or Type II errors. As the number of valid templates increases, the user's voice template is increasingly likely to be closer to one of the valid voice templates. Adjusting the comparison algorithms to reduce the likelihood of Type II (false acceptance) errors would raise the likelihood of Type I (false rejection) errors to an unacceptably high value.
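The growth of false acceptances with database size can be made concrete with a short calculation. The sketch assumes, purely for illustration, that each template comparison falsely accepts an impostor independently with the same probability; real comparisons are not strictly independent, and the 1% figure and the function name are hypothetical.

```python
def false_accept_probability(per_template_far, num_templates):
    """Probability that an impostor matches at least one of num_templates
    templates, assuming independent comparisons with equal false acceptance rate."""
    return 1.0 - (1.0 - per_template_far) ** num_templates


# Assumed 1% false acceptance rate per single comparison.
for n in (1, 10, 100, 1000):
    print(n, round(false_accept_probability(0.01, n), 4))
```

Under these assumptions, a per-comparison error rate that is tolerable when one template is checked approaches near-certain false acceptance when every template in a large database is searched.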
Objectives of the invention include implementing more secure, efficient and user friendly voice identification systems.
The above objectives can be attained by a system that identifies and verifies a user from voice data collected from a single utterance by the user. At least two signal processors process the voice data, and each signal processor operates with a different selection criterion. These selection criteria are used together to select at most one matching record of an authorized person from a database of authorized persons. Each individual selection criterion partitions the database into two subsets of records: a subset of selected records and a subset of non-selected records. The matching record is defined as the intersection of the subsets of selected records. If a single matching record is selected, the user is deemed to be identified and verified as the authorized person who corresponds to the matching record. On the other hand, if no matching record is selected, the user is deemed not to be an authorized person.
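The intersection of selected subsets can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the two criteria are taken to be a recognized passcode and a voice template distance, the records and threshold are invented, all names are hypothetical, and an intersection holding more than one record is treated here as no match.

```python
import math

# Hypothetical records: (record id, spoken passcode, voice template). Values are invented.
RECORDS = [
    ("r1", "open sesame", [0.82, 0.10, 0.45]),
    ("r2", "open sesame", [0.15, 0.77, 0.30]),  # same passcode as r1, different voice
    ("r3", "blue moon", [0.80, 0.12, 0.44]),    # voice similar to r1, different passcode
]
VOICE_THRESHOLD = 0.25  # assumed maximum template distance for selection


def select_by_passcode(recognized_words):
    """First criterion: records whose passcode equals the recognized words."""
    return {rid for rid, passcode, _ in RECORDS if passcode == recognized_words}


def select_by_voice(utterance_template):
    """Second criterion: records whose template is close to the utterance."""
    return {
        rid
        for rid, _, template in RECORDS
        if math.dist(template, utterance_template) <= VOICE_THRESHOLD
    }


def identify_and_verify(recognized_words, utterance_template):
    """Return the single matching record id, or None if there is not exactly one."""
    matches = select_by_passcode(recognized_words) & select_by_voice(utterance_template)
    return next(iter(matches)) if len(matches) == 1 else None


print(identify_and_verify("open sesame", [0.81, 0.11, 0.45]))  # selects r1
print(identify_and_verify("blue moon", [0.15, 0.77, 0.30]))    # no record selected
```

In the first call neither criterion alone is decisive: the passcode selects r1 and r2, the voice selects r1 and r3, and only their intersection isolates a single record.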
These together with other objectives and advantages, which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof.