The present invention relates to a voice authentication method and a system utilizing same and, more particularly, to a system and method which compare a voice print of a user with each of a plurality of stored voice prints of known individuals, and provide authentication only if the user voice print is most similar to a stored voice print of an individual the user claims to be among all other stored voice prints.
The use of various types of methods to secure systems from unauthorized access is common practice in financial institutions, banks, electronic commerce Internet sites, computer networks and the like.
Currently available physical authentication devices which are frequently used to access secure systems, such as crypto cards, limited access cards, or keys, provide low security protection, since such devices can be lost, stolen, loaned to an unauthorized individual and/or duplicated.
Another, more sophisticated approach to authentication, which is used to provide higher security protection, is known in the art as biometric authentication. Biometric authentication involves identification of unique body characteristics, such as fingerprints, retinal scans, facial recognition and voice pattern authentication.
Retinal scanning is based on the fact that retinal blood vessel patterns are unique and do not change over a lifetime. Although this feature provides a high degree of security, retinal scanning has limitations, since it is expensive and requires complicated hardware and software for implementation.
Fingerprinting and facial recognition also require expensive and complicated hardware and software for implementation.
Voice verification, which is also known as voice authentication, voice pattern authentication, speaker identity verification and voice print, is used to provide a speaker's identification. Voice pattern authentication differs from voice pattern recognition. In voice pattern recognition, or speech recognition, the speaker utters a phrase (e.g., a word, such as a password) and the system determines the spoken word by selecting from a pre-defined vocabulary. Therefore, voice recognition provides for the ability to recognize a spoken phrase and not the identity of the speaker.
The terms voice verification and voice authentication are interchangeably used hereinbelow. Techniques of voice verification have been extensively described in U.S. Pat. Nos. 5,502,759; 5,499,288; 5,414,755; 5,365,574; 5,297,194; 5,216,720; 5,142,565; 5,127,043; 5,054,083; 5,023,901; 4,468,204 and 4,100,370, all of which are incorporated by reference as if fully set forth herein. These patents describe numerous methods for voice verification.
Voice authentication seeks to identify the speaker based solely on the spoken utterance. For example, a speaker's presumed identity may be verified using feature extraction and pattern matching algorithms, wherein pattern matching is performed between features of a digitized incoming voice print and those of previously stored reference samples. Features used for speech processing include, for example, pitch frequency, power spectrum values, spectrum coefficients and linear predictive coding, see B. S. Atal (1976) Automatic recognition of speakers from their voice. Proc. IEEE, Vol. 64, pp. 460-475, which is incorporated by reference as if fully set forth herein.
Alternative techniques for voice authentication include, but are not limited to, neural network processing, comparison of a voice pattern with a reference set, password verification using selectively adjustable signal thresholds, and simultaneous voice recognition and verification.
State-of-the-art feature classification techniques are described in S. Furui (1991) Speaker dependent-feature extraction, recognition and processing techniques. Speech communications, Vol. 10, pp. 505-520, which is incorporated by reference as if fully set forth herein.
Text-dependent speaker recognition methods rely on analysis of a predetermined utterance, whereas text-independent methods do not rely on any specific spoken text. In both cases, however, a classifier produces the speaker's representing metric, which is thereafter compared with a preselected threshold. If the speaker's representing metric falls below the threshold, the speaker identity is confirmed, and if not, the speaker is declared an impostor.
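The threshold decision described above can be sketched as follows. This is only an illustration under stated assumptions: the Euclidean distance stands in for the unspecified classifier metric, and the names `distortion` and `verify_by_threshold` are hypothetical and do not come from any of the cited references.

```python
import math
from typing import Sequence


def distortion(features: Sequence[float], reference: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    if len(features) != len(reference):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, reference)))


def verify_by_threshold(
    features: Sequence[float], reference: Sequence[float], threshold: float
) -> bool:
    """Confirm the identity only when the metric falls below the threshold."""
    return distortion(features, reference) < threshold
```

A metric exactly equal to the threshold is treated as a rejection here, which is one possible reading of "falls below".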
The relatively low performance of voice verification technology has been one main reason for its cautious entry into the marketplace. The "Equal Error Rate" (EER) is a performance measure which involves two parameters: false acceptance (wrong access grant) and false rejection (allowed access denial). Both vary according to the degree of secured access required and, as shown below, exhibit a tradeoff therebetween; the EER is the error rate at the operating point where the two are equal. State-of-the-art voice verification algorithms (either text-dependent or text-independent) have EER values of about 2%.
When the threshold is varied to change the false rejection rate, the false acceptance rate changes as well, as graphically depicted in FIG. 1 of J. Guavain, L. Lamel and B. Prouts (March, 1995) LIMSI 1995 scientific report, which is incorporated by reference as if fully set forth herein. That figure presents five plots which correlate false rejection rates (abscissa) with the resulting false acceptance rates for voice verification algorithms characterized by EER values of 9.0%, 8.3%, 5.1%, 4.4% and 3.5%. As mentioned above, there is a tradeoff between false rejection and false acceptance rates, which renders all plots hyperbolic, wherein plots associated with lower EER values fall closer to the axes.
Thus, by setting the system for too low false rejection rate, the rate of false acceptance becomes too high and vice versa.
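The tradeoff can be made concrete with a small sketch that measures both error rates at a given threshold and scans for the threshold at which they are closest. It assumes that scores are distortion values (lower means more similar) and that access is granted when a score is below the threshold; the function names and the sample scores in the usage note are invented for illustration.

```python
from typing import Sequence, Tuple


def error_rates(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (false_acceptance_rate, false_rejection_rate) at a threshold."""
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    # A genuine speaker is falsely rejected when the score is not below the threshold.
    false_rejections = sum(1 for s in genuine if s >= threshold)
    # An impostor is falsely accepted when the score is below the threshold.
    false_acceptances = sum(1 for s in impostor if s < threshold)
    return false_acceptances / len(impostor), false_rejections / len(genuine)


def approximate_eer(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, far, frr) where the two rates are closest to each other."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, far, frr)
    _, threshold, far, frr = best
    return threshold, far, frr
```

With the made-up scores `genuine = [0.1, 0.2, 0.3, 0.6]` and `impostor = [0.4, 0.5, 0.7, 0.9]`, lowering the threshold to 0.35 removes all false acceptances at the cost of rejecting one genuine speaker in four, while raising it to 0.65 does the opposite.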
Various techniques for voice-based security systems are described in U.S. Pat. Nos. 5,265,191; 5,245,694; 4,864,642; 4,865,072; 4,821,027; 4,797,672; 4,590,604; 4,534,056; 4,020,285; 4,013,837; 3,991,271; all of which are incorporated by reference as if fully set forth herein. These patents describe implementation of various voice-security systems for different applications, such as telephone networks, computer networks, cars and elevators.
However, none of these techniques provides the required level of performance, since when a low rate of false rejection is set, the rate of false acceptance becomes unacceptably high and vice versa.
In an attempt to overcome the above mentioned limitation of prior art systems, U.S. Pat. No. 5,913,196, to the present inventors, describes a computerized system which includes at least two voice authentication algorithms. Each of the voice authentication algorithms is different from the others and serves for independently analyzing a voice of the speaker for obtaining an independent positive or negative authentication of the voice by each of the algorithms. If every one of the algorithms provides positive authentication, the speaker is positively identified, whereas, if at least one of the algorithms provides negative authentication, the speaker is negatively identified.
Although the authentication system and method described in U.S. Pat. No. 5,913,196 are considerably more accurate than other prior art voice authentication systems, they still suffer from limitations common to prior art systems, which limitations arise from signal distortion (due to, for example, channel mismatch), user error and random background noise.
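The decision rule of such a multi-algorithm system can be sketched in a few lines. The sketch assumes that each algorithm is available as a callable returning a positive or negative result; the `Authenticator` type and the function name are hypothetical and are not taken from U.S. Pat. No. 5,913,196.

```python
from typing import Callable, Iterable, Sequence

# Hypothetical interface: an algorithm maps voice data to a positive/negative result.
Authenticator = Callable[[Sequence[float]], bool]


def authenticate_with_all(
    voice: Sequence[float], algorithms: Iterable[Authenticator]
) -> bool:
    """Positive only if every independent algorithm authenticates the voice."""
    algorithm_list = list(algorithms)
    if len(algorithm_list) < 2:
        raise ValueError("at least two voice authentication algorithms are required")
    # A single negative result is enough to reject the speaker.
    return all(algorithm(voice) for algorithm in algorithm_list)
```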
There is thus a widely recognized need for, and it would be highly advantageous to have, a voice authentication system and method for authorizing or denying a user access to a secure site, which system and method are devoid of the above limitations.
According to one aspect of the present invention there is provided a system for authorizing a user access to a secure site, the system comprising (a) a memory unit being for storing information including a stored voice print and an identity of each of a plurality of individuals having access to the secured site, the stored voice print of each of the plurality of individuals being generated from a corresponding voice data thereof; (b) a first input device being for inputting user information, the user information being for verifying that a user identifies him- or herself as a specific individual among the plurality of individuals; (c) a second input device being for inputting temporary voice data of the user; (d) a first processing unit being for generating a temporary voice print of the user from the temporary voice data received from the second input device; and (e) a second processing unit being for comparing the temporary voice print received from the first processing unit to the stored voice print of each of at least a portion of the plurality of individuals, at least the portion of the plurality of individuals including the specific individual, such that the user is granted access to the secure site only if the temporary voice print is most similar to the stored voice print of the specific individual.
According to another aspect of the present invention there is provided a method of authorizing a user access to a secure site, the method comprising the steps of (a) providing a memory unit being for storing information including a stored voice print and an identity of each of a plurality of individuals, the stored voice print of each of the plurality of individuals being generated from corresponding voice data thereof; (b) collecting user information provided by a user, the user information being for verifying that the user identifies him- or herself as a specific individual among the plurality of individuals; (c) processing temporary voice data collected from the user into a temporary voice print; (d) comparing the temporary voice print with the stored voice print of each of at least a portion of the plurality of individuals, at least the portion of the plurality of individuals including the specific individual; and (e) granting the user with access to the secure site only if the temporary voice print is most similar to the stored voice print of the specific individual.
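Steps (d) and (e) of the method can be illustrated with the following minimal sketch. It assumes that voice prints are fixed-length feature vectors and that similarity is measured by squared Euclidean distance; the names `authorize` and `squared_distance`, and the choice to deny access on a tie, are assumptions made for the example.

```python
from typing import Callable, Mapping, Sequence

VoicePrint = Sequence[float]


def squared_distance(a: VoicePrint, b: VoicePrint) -> float:
    """Assumed similarity measure: a smaller value means more similar."""
    if len(a) != len(b):
        raise ValueError("voice prints must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def authorize(
    claimed_identity: str,
    temporary_print: VoicePrint,
    stored_prints: Mapping[str, VoicePrint],
    distance: Callable[[VoicePrint, VoicePrint], float] = squared_distance,
) -> bool:
    """Grant access only if the claimed individual's stored print is the closest one."""
    if claimed_identity not in stored_prints:
        return False
    claimed = distance(temporary_print, stored_prints[claimed_identity])
    # Strict comparison: a tie with any other individual denies access (assumption).
    # Note that with a single enrolled individual this check is trivially satisfied.
    return all(
        claimed < distance(temporary_print, stored)
        for identity, stored in stored_prints.items()
        if identity != claimed_identity
    )
```

No absolute threshold appears in this sketch: the decision depends only on how the temporary voice print ranks against the compared individuals.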
According to further features in preferred embodiments of the invention described below, the first input device is selected from the group consisting of a keypad and a microphone, thus, the user information is provided via an input device selected from the group consisting of a keypad and a microphone.
According to still further features in the described preferred embodiments the first input device communicates with the first processing unit via a communication mode selected from the group consisting of telephone communication, cellular telephone communication, computer network communication and radiofrequency communication, thus, the user information is provided via an input device selected from the group consisting of a telephone, a cellular telephone, a computer and radiofrequency communication device.
According to still further features in the described preferred embodiments the second input device includes a microphone, thus, the temporary voice data is collected by a microphone.
According to still further features in the described preferred embodiments the second input device communicates with the first processing unit via a communication mode selected from the group consisting of telephone communication, cellular telephone communication, computer network communication and radiofrequency communication, thus, the temporary voice data is collected by an input device selected from the group consisting of a telephone, a cellular telephone, a computer and radiofrequency communication device.
According to still further features in the described preferred embodiments the first input device and the second input device are integrated into a single input device, whereas the single input device includes a microphone, thus, the user information and the temporary voice data are collected by a single input device, a microphone.
According to still further features in the described preferred embodiments the temporary voice data includes the user information.
According to still further features in the described preferred embodiments the first processing unit and the second processing unit are integrated into a single processing unit, thus, steps (c) and (d) are effected by a single processing unit.
According to still further features in the described preferred embodiments the stored voice print of each of the plurality of individuals has been generated by the first processing unit.
According to still further features in the described preferred embodiments comparing the temporary voice print received from the first processing unit to the stored voice print of each of at least the portion of the plurality of individuals is effected by a voice authentication algorithm selected from the group consisting of a text-dependent and a text-independent voice authentication algorithm.
According to still further features in the described preferred embodiments the voice authentication algorithm is selected from the group consisting of feature extraction followed by pattern matching, a neural network algorithm, a dynamic time warping algorithm, the hidden Markov model algorithm and a vector quantization algorithm.
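As an illustration of one of the listed options, the following is a textbook dynamic time warping distance between two one-dimensional feature sequences. It is a generic sketch and not the specific algorithm of the described embodiments; the absolute difference used as the local cost is an assumption.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping cost between two non-empty sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # cost[i][j] is the minimal cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b holds its frame
                cost[i][j - 1],      # b advances while a holds its frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

Because frames may be repeated during alignment, the same utterance spoken at a different pace can still yield a low cost, which is why this technique suits text-dependent comparison.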
According to still further features in the described preferred embodiments the first processing unit processes the user information so as to validate that the user identifies him- or herself as a specific individual of the plurality of individuals prior to generating the temporary voice print.
According to still further features in the described preferred embodiments the plurality of individuals includes at least 10 individuals.
According to still further features in the described preferred embodiments the corresponding voice data of each of the plurality of individuals includes a plurality of independent voice data inputs.
According to still further features in the described preferred embodiments the stored voice print of each of the plurality of individuals is generated from at least one of the plurality of independent voice data inputs.
According to still further features in the described preferred embodiments access is granted if a distortion level between the temporary voice print and the most similar stored voice print of the specific individual is less than a distortion level between the temporary voice print and the stored voice print of all other individuals of at least the portion of the plurality of individuals, thus, step (d) of the method is effected by comparing a distortion level between the temporary voice print and the stored voice print of each of at least the portion of the plurality of individuals.
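One way to obtain such distortion levels, assuming a vector quantization representation in which each stored voice print is a codebook of codewords, is the average quantization distortion sketched below. The codebook representation, the squared-error measure and the function names are assumptions made for illustration.

```python
from typing import Dict, Mapping, Sequence

Frame = Sequence[float]


def vq_distortion(frames: Sequence[Frame], codebook: Sequence[Frame]) -> float:
    """Average squared error between each frame and its nearest codeword."""
    if not frames or not codebook:
        raise ValueError("frames and codebook must be non-empty")
    total = 0.0
    for frame in frames:
        total += min(
            sum((x - c) ** 2 for x, c in zip(frame, codeword))
            for codeword in codebook
        )
    return total / len(frames)


def distortion_levels(
    frames: Sequence[Frame], codebooks: Mapping[str, Sequence[Frame]]
) -> Dict[str, float]:
    """Distortion level of the temporary voice data against each stored codebook."""
    return {
        identity: vq_distortion(frames, codebook)
        for identity, codebook in codebooks.items()
    }
```

Access would then be granted only when the entry for the claimed individual is strictly the smallest value in the returned mapping.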
According to still further features in the described preferred embodiments the first processing unit also extracts at least one voice feature from the temporary voice data.
According to still further features in the described preferred embodiments the secure site is selected from the group consisting of a virtual site, and a physical site.
According to still further features in the described preferred embodiments the virtual site is a World Wide Web site.
The present invention successfully addresses the shortcomings of the presently known configurations by providing a system and a method which compare a voice print of a user with each of a plurality of stored voice prints of known individuals, and provide authentication only if the user voice print is most similar to a stored voice print of an individual the user claims to be among all other stored voice prints.
Implementation of the method and system of the present invention involves performing or completing selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.