Recently, as commercial transactions and service use over networks have become widespread, attention has turned to biometric authentication as a method of protection against impersonation. Biometric authentication verifies a person's identity by utilizing information peculiar to the individual, such as physical characteristics, instead of a password or a personal identification number. One example of such biometric authentication is voice authentication.
Further, as one of the voice authentication methods, a so-called text-dependent voice authentication method is known, in which a person's identity is verified by utilizing the same speech content at the time of registration and at the time of authentication. This method is also called the “password method”, “keyword method”, or “key phrase method”. In text-dependent voice authentication, a voice uttering either a predetermined speech content (keyword) or a speech content that an authorized user has freely chosen is registered in a system. When a person who intends to be authenticated inputs the keyword by voice, identity is verified according to whether or not the characteristics of the input voice match the characteristics of the voice registered for that person. In some cases, whether or not the speech content of the input voice matches the registered speech content is also checked.
For instance, as a conventional example of text-dependent voice authentication, JP-2002-304379-A discloses a personal authentication system configured as follows. For each person who intends to be authenticated, a plurality of words and a set of voiceprint data obtained by having that person utter those words are stored in a memory medium in advance. When a person who intends to be authenticated inputs ID data, one of the plurality of words corresponding to the ID data is selected together with its voiceprint data, and the word is presented to the person so as to prompt him/her to utter it. The voiceprint of the utterance is analyzed and matched against the voiceprint data stored in advance. Personal authentication of the person who intends to be authenticated is thereby performed.
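By way of illustration only, the word selection step described above may be sketched as follows. The class name `ChallengeStore`, its methods, and the representation of voiceprint data as a list of numbers are assumptions introduced for this sketch and do not appear in the cited publication. The sketch covers only the storage of word and voiceprint pairs per ID and the selection of one word to present; voiceprint analysis and matching are omitted.

```python
import random
from typing import Dict, List, Optional, Tuple

Voiceprint = List[float]  # assumed representation of stored voiceprint data


class ChallengeStore:
    """Hypothetical store of (word, voiceprint) pairs per personal ID."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._entries: Dict[str, Dict[str, Voiceprint]] = {}
        self._rng = rng if rng is not None else random.Random()

    def enroll(self, personal_id: str, word: str, voiceprint: Voiceprint) -> None:
        # Store the voiceprint obtained when the person uttered `word`.
        self._entries.setdefault(personal_id, {})[word] = list(voiceprint)

    def pick_challenge(self, personal_id: str) -> Tuple[str, Voiceprint]:
        # Select one of the enrolled words for this ID, together with its
        # voiceprint data; the word would then be presented to the person.
        words = self._entries[personal_id]  # KeyError if the ID is unknown
        word = self._rng.choice(sorted(words))
        return word, words[word]


store = ChallengeStore()
store.enroll("M", "apple", [0.1, 0.2, 0.3])
store.enroll("M", "river", [0.4, 0.5, 0.6])
challenge_word, reference_voiceprint = store.pick_challenge("M")
```

Because the word is chosen at authentication time, the person cannot know in advance which of the enrolled words will be requested.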
Here, the conventional voice authentication method is described with reference to FIGS. 5 and 6.
FIG. 5 illustrates a schematic configuration of a conventional voice registration system for generating and registering standard templates for voice authentication. As shown in FIG. 5, in the conventional voice registration system 104, when a user 101 to be registered as an authorized user utters an input voice toward a voice input part 106, a feature extraction part 107 converts the input voice into a time series of feature parameters, and a standard template generation part 108 generates a standard template from the time series. The generated standard template is stored in a standard template storage part 109 in association with the personal identification data (ID) allocated to the user (#M in the example shown in FIG. 5).
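By way of illustration only, the registration flow of FIG. 5 may be sketched as follows. The function and class names, the choice of per-frame log energy and zero-crossing count as feature parameters, and the use of the feature time series itself as the standard template are all assumptions made for this sketch; the conventional system does not specify them.

```python
import math
from typing import Dict, List, Sequence

FeatureSeries = List[List[float]]  # time series of feature vectors


def extract_features(samples: Sequence[float], frame_len: int = 4) -> FeatureSeries:
    """Toy stand-in for feature extraction part 107.

    Splits the signal into non-overlapping frames and returns, per frame,
    the log energy and the number of zero crossings (assumed features).
    """
    series: FeatureSeries = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        series.append([math.log(energy + 1e-9), float(crossings)])
    return series


def generate_template(series: FeatureSeries) -> FeatureSeries:
    """Stand-in for template generation part 108: copies the series as-is."""
    return [list(frame) for frame in series]


class TemplateStore:
    """Stand-in for storage part 109: templates keyed by personal ID."""

    def __init__(self) -> None:
        self._templates: Dict[str, FeatureSeries] = {}

    def register(self, personal_id: str, template: FeatureSeries) -> None:
        self._templates[personal_id] = template

    def select(self, personal_id: str) -> FeatureSeries:
        return self._templates[personal_id]


def register_user(store: TemplateStore, personal_id: str,
                  samples: Sequence[float]) -> None:
    series = extract_features(samples)
    if not series:
        raise ValueError("input voice is too short to yield any frame")
    store.register(personal_id, generate_template(series))


store = TemplateStore()
register_user(store, "M", [0.1, -0.2, 0.3, -0.4, 0.5, 0.4, 0.3, 0.2])
```

A practical system would typically derive the template from spectral features and from several utterances; the single-utterance copy here only makes the data flow visible.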
FIG. 6 illustrates a schematic configuration of a conventional voice authentication system that utilizes the voices of authorized users registered by the voice registration system shown in FIG. 5. A standard template storage part 207 in the authentication system 204 stores the data of the standard template storage part 109 of the voice registration system 104, i.e., the standard templates associated respectively with the personal IDs of the authorized users.
As shown in FIG. 6, at the time of authentication, a user 201 intending to be authenticated inputs a personal ID (#M in the example shown in FIG. 6) to a personal ID input part 205 of the authentication system 204, and inputs a voice to a voice input part 208. The voice thus inputted is converted into a time series of feature parameters by a feature extraction part 209. The personal ID thus inputted is sent to a standard template selection part 206. The standard template selection part 206 selects the standard template corresponding to the inputted personal ID from the standard template storage part 207, and sends it to a similarity calculation part 210.
The similarity calculation part 210 calculates a similarity of the time series of the feature parameters obtained by the feature extraction part 209 to the standard template selected by the standard template selection part 206. A determination part 211 compares the calculated similarity with a preliminarily set threshold value, so as to determine whether to accept the user 201 by confirming his/her identity or to reject the user 201 as another person, and outputs the determination result.
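By way of illustration only, the authentication flow of FIG. 6 may be sketched as follows. The conventional system does not state how the similarity is calculated; this sketch assumes a dynamic time warping (DTW) distance between the two time series, mapped to a similarity in the range (0, 1]. The function names and the threshold value of 0.5 are likewise assumptions.

```python
import math
from typing import Dict, List

FeatureSeries = List[List[float]]  # time series of feature vectors


def dtw_distance(a: FeatureSeries, b: FeatureSeries) -> float:
    """Length-normalized DTW distance (assumed similarity basis)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both time series must be non-empty")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def similarity(series: FeatureSeries, template: FeatureSeries) -> float:
    """Stand-in for similarity calculation part 210: 1.0 means identical."""
    return 1.0 / (1.0 + dtw_distance(series, template))


def authenticate(templates: Dict[str, FeatureSeries], personal_id: str,
                 series: FeatureSeries, threshold: float = 0.5) -> bool:
    """Stand-ins for selection part 206 and determination part 211."""
    template = templates.get(personal_id)
    if template is None:
        return False  # no template registered for this ID
    return similarity(series, template) >= threshold


templates = {"M": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]}
accepted = authenticate(templates, "M", [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
rejected = authenticate(templates, "M", [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]])
```

The threshold sets the trade-off between rejecting an authorized user and accepting another person, so in practice it would be tuned on recorded data rather than fixed in advance.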
In the case where the text-dependent voice authentication method is adopted, at the time of registering a voice in the voice registration system 104, either the voice registration system or the user determines a keyword, and the user utters the keyword so that it is registered. Then, at the time of authentication, the user utters the keyword that he/she has memorized, so as to be authenticated.
However, in the case where time has passed between the voice registration in the voice registration system and the actual use of the authentication system, a change of utterance may have occurred even in the registered person him/herself. The change of utterance refers to changes in information such as the pitch frequency, intonation, power, speaking rate, and spectrum of a voice. The change of utterance causes the similarity calculated by the similarity calculation part 210 to decrease, and consequently it frequently happens that a person who should be accepted as the authorized user is falsely determined to be another person. Thus, the authentication precision deteriorates because the utterance of the keyword at the time of authentication differs from the utterance of the keyword at the time of registration, and this has been a long-standing problem of the text-dependent voice authentication method.
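By way of illustration only, the effect of a change of utterance on the determination may be sketched with hypothetical numbers. The pitch values, the 5% pitch shift, the frame-by-frame similarity measure, and the threshold below are all assumptions chosen for this sketch and are not measured data.

```python
from typing import List


def mean_abs_difference(a: List[float], b: List[float]) -> float:
    """Frame-by-frame mean absolute difference over the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both contours must be non-empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def similarity(a: List[float], b: List[float]) -> float:
    """Assumed similarity measure: 1.0 for identical contours."""
    return 1.0 / (1.0 + mean_abs_difference(a, b))


def is_accepted(sim: float, threshold: float) -> bool:
    return sim >= threshold


THRESHOLD = 0.5  # hypothetical, preliminarily set threshold value

# Hypothetical pitch contours in Hz for the same keyword and the same user.
registered_pitch = [120.0, 125.0, 130.0, 128.0, 122.0]
same_day_pitch = [120.5, 125.5, 129.5, 128.5, 121.5]
later_pitch = [p * 1.05 for p in registered_pitch]  # pitch 5% higher

sim_same_day = similarity(registered_pitch, same_day_pitch)
sim_later = similarity(registered_pitch, later_pitch)
```

With these numbers the same-day utterance clears the threshold while the later utterance by the same user does not, which corresponds to the false rejection described above.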