Traditionally, a telecommunication connection, or a communication terminal device connected to a telecommunication network, is identified by means of appropriate identifiers transmitted in the various telecommunication networks. Such identifiers include, for example, CLI (calling line identification), ANI (automatic number identification), HLR (home location register), IP address, ID number, etc. Person-related differentiation among individual users, for instance among several users of one telecommunication connection, is possible, for example, by requesting a person-related identifier such as a PIN. Speech-processing and speaker-processing systems are likewise part of the state of the art. Whereas speech recognition systems recognize the content of a spoken utterance, speaker recognition systems, including systems for speaker verification, speaker identification, and speaker classification, are geared towards recognizing the characteristic speech features of the speaker. Since speaker recognition is similar in many ways to speech recognition, both functions are often combined in a single device for many types of applications. Algorithms that can recognize a large number of speech utterances in real time employ approaches based on probability theory. Their mode of operation essentially breaks down into three stages: preprocessing of an acquired speech signal in order to derive a pattern from it, teaching to generate a reference pattern, and recognition using computations that are usually based on probability theory.
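The three stages just named (preprocessing, teaching, recognition) can be sketched as follows. This is a deliberately simplified illustration, not the method of any particular system: the per-frame features (log energy and zero-crossing rate), the single diagonal Gaussian used as the reference pattern, and all function names are assumptions chosen for brevity. Practical recognizers typically use richer features and models such as Gaussian mixtures or hidden Markov models.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def preprocess(signal: Sequence[float], frame_len: int = 160) -> List[Vector]:
    """Stage 1: split the signal into frames and derive one feature vector per frame.

    The features (log energy, zero-crossing rate) are illustrative assumptions.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        frames.append([math.log(energy + 1e-10), crossings / frame_len])
    return frames


def teach(features: List[Vector]) -> Tuple[Vector, Vector]:
    """Stage 2: estimate a diagonal Gaussian (mean, variance) as the reference pattern."""
    if not features:
        raise ValueError("teaching needs at least one feature vector")
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    # A small variance floor keeps the model usable when a feature barely varies.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in features) / n, 1e-6) for d in range(dim)]
    return mean, var


def recognize(features: List[Vector], model: Tuple[Vector, Vector]) -> float:
    """Stage 3: average per-frame log-likelihood of the features under the reference model."""
    if not features:
        raise ValueError("recognition needs at least one feature vector")
    mean, var = model
    total = 0.0
    for f in features:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(features)
```

A higher score means the new utterance is more probable under the taught reference pattern; a decision rule would compare this score against a limit.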
Systems and methods are known that allow a user or calling subscriber who is connected via a communication terminal device to a telecommunication network to gain access to a desired telecommunication service, e.g., a speech service, once speaker verification has been successfully completed. In general, after an identifier has been ascertained, speech utterances of the calling subscriber are acquired as the current voice or speech sample. A pattern or biometric stored under the identifier as a reference pattern or reference biometric is then read out and compared to the current voice or speech sample in order to test whether their similarity lies within specified limits. For instance, a comparative value that becomes greater as the similarity of the current biometric voice or speech sample to the characteristics of the reference pattern increases can serve as the measure of similarity. If the measure of similarity exceeds a specified value, the biometric voice or speech sample is acknowledged as “accepted”. Generally speaking, the higher the security requirements, the more stringent the so-called limit or threshold value defined for the measure of similarity. Such a method is described, for example, in EP 1 249 016 B1, which is hereby incorporated by reference as if set forth in its entirety.
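The accept/reject decision described above can be sketched as follows, assuming that the biometric features have already been reduced to fixed-length numeric vectors and using cosine similarity as the comparative value. Both choices are illustrative assumptions; the text does not prescribe a particular similarity measure. A stricter security requirement corresponds to a higher `threshold`.

```python
import math
from typing import Sequence


def similarity(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Cosine similarity: grows as the sample's features approach the reference's."""
    dot = sum(a * b for a, b in zip(sample, reference))
    norm = math.sqrt(sum(a * a for a in sample)) * math.sqrt(sum(b * b for b in reference))
    return dot / norm if norm else 0.0  # an all-zero vector is treated as dissimilar


def verify(sample: Sequence[float], reference: Sequence[float], threshold: float) -> bool:
    """Accept the sample only if the measure of similarity exceeds the threshold."""
    return similarity(sample, reference) > threshold
```

With the reference `[1.0, 0.5, 0.2]` and the sample `[0.9, 0.6, 0.25]`, the similarity is about 0.991, so `verify` accepts at a threshold of 0.95 but rejects at 0.995.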
In the description herein, the term “verification” designates a procedure that serves to confirm whether an assumed or alleged condition is true. The term “authentication” designates the verification of an identity that has been assumed, for example on the basis of a transmitted identifier, by checking whether the actual identity conforms to the identity assumed or claimed through the transmission of the identifier. The term “biometric sample” generally designates a usually digitized recording of at least one biometric characteristic of a person; accordingly, the term “voice or speech sample” designates a digitized and/or further processed recording of a speech utterance.
In a complex process, such a recording is then used to detect a structure of the at least one biometric characteristic, a so-called “characteristic structure”, for instance a voice profile obtained from at least one digitized recording of a speech utterance, usually of limited duration. A “biometric” is a pattern of recurring structures of biometric characteristics, computed in a complex process from at least one biometric sample of exactly one person. The biometric is stored and can be used as a reference biometric to compute the similarity, in particular with respect to specified characteristics, to at least one new biometric sample. Accordingly, a “voiceprint” is a pattern of usually recurring characteristic structures of a voice, generated in a complex process from at least one voice sample of a particular person; this voiceprint is stored and can then be used as a “reference voiceprint” to compute the similarity to at least one new voice sample.
To obtain a biometric such as a voiceprint, as schematically shown in FIG. 6, the user 8 or calling subscriber using a communication terminal device is generally requested, in a dialog during the login to use the service, to provide one or more speech samples 12 for the teaching “T”; these samples serve to create the biometric, in this case the voiceprint according to FIG. 6. Using an implemented algorithm, characteristic structures are detected 13 from the speech samples 12 during the initialization or teaching dialog “T”, and these characteristic structures then serve to compute 14a a voiceprint associated with the user or calling subscriber 8.
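The teaching “T” of FIG. 6 can be sketched as follows. The sketch assumes that each speech sample 12 has already been converted into a list of per-frame feature vectors, and it uses simple averaging both for detecting a characteristic structure (step 13) and for computing the voiceprint (step 14a). The function names and the averaging rule are hypothetical; the point is only that the result is a computed data record rather than a copy of the samples.

```python
from typing import List, Sequence

Vector = List[float]


def detect_structure(frames: Sequence[Sequence[float]]) -> Vector:
    """Step 13 (illustrative): summarize one speech sample's frame features as their mean."""
    if not frames:
        raise ValueError("a speech sample must contain at least one frame")
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def compute_voiceprint(samples: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Step 14a (illustrative): combine the structures of all teaching samples into one voiceprint."""
    if not samples:
        raise ValueError("teaching needs at least one speech sample")
    structures = [detect_structure(s) for s in samples]
    return [sum(column) / len(structures) for column in zip(*structures)]
```

For two samples whose structures are `[2.0, 3.0]` and `[5.0, 6.0]`, the resulting voiceprint is `[3.5, 4.5]`.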
In this process, the user 8 is identified by an automatically determined identifier 9, e.g., a CLI, ANI, HLR, IP address, or ID number, or else by an identifier 9 provided upon request. The ascertained voiceprint is associated with this identifier 9. As defined above, this voiceprint is not a copy of the biometric voice samples but a data record computed by means of a specific algorithm from the voice samples obtained during the teaching “T”. The ascertained voiceprint 20 is usually stored in a separate data area of a memory 3, also referred to as a repository, indexed by the identifier 9. The memory 3 stores the biometrics of various users.
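A repository such as memory 3, in which each voiceprint 20 is indexed by the identifier 9, can be modeled minimally as a dictionary keyed by the identifier. The class and method names below, and the example identifier, are hypothetical.

```python
from typing import Dict, List, Optional


class VoiceprintRepository:
    """Illustrative stand-in for memory 3: voiceprints 20 indexed by identifier 9."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def store(self, identifier: str, voiceprint: List[float]) -> None:
        # Keep a private copy so later changes by the caller do not alter the stored record.
        self._store[identifier] = list(voiceprint)

    def lookup(self, identifier: str) -> Optional[List[float]]:
        # Return a copy, or None if nothing has been taught for this identifier.
        voiceprint = self._store.get(identifier)
        return None if voiceprint is None else list(voiceprint)
```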
The voiceprint 20 thus stored can then be employed at a later contact “K” of the user 8 with the system in order to perform an authentication procedure that verifies the identity of the user 8 assumed on the basis of the transmitted identifier 9. For this purpose, the user 8 is again requested to provide at least one voice or speech sample 12, for example, in the form of a spoken sequence of numbers. The characteristic structure of this sample 12 is detected 13 and then compared 10 to the characteristics of the voiceprint 20 stored in the system under that identifier. During the comparison, a measure of similarity 25 is ascertained that reflects how similar the characteristics of the obtained biometric voice sample are to the reference voiceprint. If, for example, the measure of similarity 25 exceeds a previously specified limit or threshold value 11, it is presumed that the user who provided the current voice sample is identical to the user who performed the teaching for the voiceprint used as the reference in the comparison.
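The authentication at a later contact “K” can be sketched as follows: the voiceprint is looked up under the transmitted identifier, compared with the characteristic structure of the new sample, and accepted only if the measure of similarity exceeds the limit value. The cosine similarity, the plain dictionary standing in for memory 3, and all names are illustrative assumptions, and the characteristic structure is assumed to have been detected already.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(identifier: str, structure: Sequence[float],
                 repository: Dict[str, List[float]], threshold: float) -> bool:
    """Contact "K": check the identity claimed by the transmitted identifier 9."""
    reference = repository.get(identifier)   # voiceprint 20 stored under the identifier
    if reference is None:
        return False                         # no voiceprint has been taught for it
    score = cosine(structure, reference)     # measure of similarity 25 (comparison 10)
    return score > threshold                 # limit or threshold value 11
```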
Moreover, as can be seen in FIG. 6, in case of a successful authentication, the biometric sample provided during the authentication process can, if so desired, be used not only for verification/authentication but also to further adapt 14b the biometric stored for this user or calling subscriber. The biometric voiceprint 20 employed for the comparison and the accepted characteristic structure 18 of the currently obtained speech sample 12 are used as the biometric 14c to be adapted. The biometric voiceprint 20 serves as the basis for creating a new, adapted voiceprint 14d, which is then stored as the voiceprint 20.
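One possible form of the adaptation 14b to 14d is a weighted average that moves the stored voiceprint 20 a small step toward the accepted characteristic structure 18. The text does not specify the adaptation algorithm, so this moving-average rule and its `weight` parameter are assumptions made only for illustration; the function is meant to be called only after a successful authentication.

```python
from typing import List, Sequence


def adapt_voiceprint(voiceprint: Sequence[float], accepted_structure: Sequence[float],
                     weight: float = 0.1) -> List[float]:
    """Steps 14b to 14d (illustrative): blend an accepted structure 18 into voiceprint 20.

    `weight` is a hypothetical adaptation rate: 0 keeps the voiceprint unchanged,
    1 replaces it entirely with the accepted structure.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [(1.0 - weight) * v + weight * s for v, s in zip(voiceprint, accepted_structure)]
```

For example, adapting `[1.0, 0.0]` toward `[0.0, 1.0]` with `weight=0.25` yields `[0.75, 0.25]`, which would then be stored as the new voiceprint 20.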
If the reference biometric according to FIG. 6 is provided only for the use of a single service, or of several services having the same security requirements, and thus only for use with one and the same limit value 11 for the measure of similarity, then adapting the biometric with such a procedure over a prolonged period of time often raises no concerns.
However, if the user or calling subscriber 8 has to authenticate himself with a communication terminal device for several services having different security requirements, and thus on the basis of different limit values 11a to 11e for the measure of similarity 25, as schematically shown in FIG. 7, then, depending on the security level, he has to teach different service-specific voiceprints 3a to 3e in several teaching dialogs or sessions “T”. On the one hand, these different limit values 11a to 11e for the measure of similarity 25, each depending on the applicable security level, also apply to the adaptation of the respective voiceprint 3a to 3e after a successful authentication. Otherwise, if only a single voiceprint is employed per user, characteristic patterns 13 detected from speech samples 12 that achieve only a lower measure of similarity 25, and are accepted under the less demanding limit value of a less security-relevant service, would be incorporated into that voiceprint; over time, its quality would thus drift down to the level suited only to the services having the lowest security requirements.
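The drift described above can be illustrated with a toy simulation. It assumes two-dimensional feature vectors, cosine similarity, a fixed adaptation weight, and a worst case in which every new sample deviates by the same angle from the current voiceprint. Under these assumptions, a single voiceprint that is adapted whenever the least demanding limit value is met wanders steadily away from the originally taught voiceprint, whereas under a strict limit value no adaptation takes place. All names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rotate(v: Sequence[float], degrees: float) -> List[float]:
    """Rotate a 2-D vector; models a sample only moderately similar to v."""
    r = math.radians(degrees)
    return [v[0] * math.cos(r) - v[1] * math.sin(r),
            v[0] * math.sin(r) + v[1] * math.cos(r)]


def similarity_after_adaptation(limit: float, contacts: int,
                                weight: float = 0.5, offset_deg: float = 20.0) -> float:
    """Return the similarity of the adapted voiceprint to the originally taught one."""
    original = [1.0, 0.0]
    voiceprint = list(original)
    for _ in range(contacts):
        sample = rotate(voiceprint, offset_deg)
        if cosine(sample, voiceprint) > limit:  # accepted, so used for adaptation
            voiceprint = [(1 - weight) * v + weight * s for v, s in zip(voiceprint, sample)]
    return cosine(original, voiceprint)
```

Each sample has a similarity of about 0.94 to the current voiceprint. With a permissive limit of 0.90, every sample is accepted and each adaptation turns the voiceprint by 10 degrees, so after five contacts its similarity to the taught voiceprint is only about 0.64. With a strict limit of 0.99, no sample is accepted and the voiceprint stays unchanged.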
As shown in FIG. 7, the voiceprints 3a to 3e correspond to different security levels. On the other hand, for different services having different security requirements, it is usually necessary to provide a separate data area for each service or group of services sharing the same security level. This also means that a separate teaching cycle “T” has to be carried out for each such service or service group in order to generate the voiceprints 3a to 3e, so that the same user has to undergo this procedure several times.
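The arrangement of FIG. 7, with a separate data area and a separate teaching cycle “T” per security level, can be sketched as follows. The level labels "a" to "e" mirror the voiceprints 3a to 3e, but the threshold values and all names are hypothetical.

```python
from typing import Dict, List

# Hypothetical limit values for security levels "a" (lowest) to "e" (highest).
LEVEL_THRESHOLDS: Dict[str, float] = {"a": 0.80, "b": 0.85, "c": 0.90, "d": 0.95, "e": 0.98}


class PerLevelRepository:
    """One separate data area per security level, each holding its own voiceprints."""

    def __init__(self) -> None:
        self.areas: Dict[str, Dict[str, List[float]]] = {lvl: {} for lvl in LEVEL_THRESHOLDS}

    def threshold(self, level: str) -> float:
        return LEVEL_THRESHOLDS[level]

    def teach(self, identifier: str, level: str, voiceprint: List[float]) -> None:
        # Each level needs its own teaching cycle "T"; voiceprints are not shared across levels.
        self.areas[level][identifier] = list(voiceprint)

    def pending_teaching(self, identifier: str, levels: List[str]) -> List[str]:
        """Security levels for which this user still has to run a teaching cycle."""
        return sorted({lvl for lvl in levels if identifier not in self.areas[lvl]})
```

A user whose services span three distinct security levels therefore starts with three pending teaching cycles, which is the repeated burden described above.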