Speaker recognition relates to recognizing a speaker by the characteristics of his or her voice. There are typically two phases in speaker recognition: in a first phase (enrollment), a speaker whose identity is typically known is enrolled, and a voice print of the speaker is stored. In a second phase, speaker recognition takes place, i.e. characteristics of a speaker's voice are extracted (derived) from a speaker recognition testing signal, which is an audio signal, and compared to one or more stored voice prints.
Speaker recognition may refer to speaker verification, i.e. confirming or rejecting that a person who is speaking is who he or she claims to be or is assumed to be. In speaker verification, a voice print of one speaker is compared to information based on a speaker recognition testing signal received in the recognition phase.
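The comparison in speaker verification can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the voice print and the information derived from the testing signal are fixed-length feature vectors, that they are compared by cosine similarity, and that the claim is accepted above a fixed threshold. The names `cosine_similarity` and `verify`, the vector representation, and the threshold value are hypothetical and are not taken from any particular system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def verify(voice_print, test_features, threshold=0.8):
    """Accept the claimed identity if the similarity reaches the threshold.

    The threshold value is an assumption chosen for illustration; a real
    system would tune it on held-out data.
    """
    return cosine_similarity(voice_print, test_features) >= threshold


# One enrolled voice print compared against two testing signals.
enrolled = [1.0, 0.0]
accepted = verify(enrolled, [1.0, 0.0])   # same direction as the voice print
rejected = verify(enrolled, [0.0, 1.0])   # orthogonal to the voice print
```

Only one voice print is consulted here, which is what distinguishes verification from identification.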
Speaker recognition may also mean speaker identification, i.e. determining which of a number of enrolled persons is speaking. In such a speaker identification system, there are typically N enrolled speakers (N may be equal to 1, 2, 3, or more) and for each enrolled speaker n (n ∈ [1; N]) a voice print is present.
Then, information based on a speaker recognition testing signal is compared against the voice print of each enrolled speaker to determine which speaker is most likely the speaker of the speaker recognition testing signal. For example, it may be determined which voice print has the best match (or which voice prints have the best matches) with the information based on the speaker recognition testing signal. Then, an optional speaker verification may be done to confirm or reject that the speaker(s) whose voice print(s) had the best match(es) with the information based on the speaker recognition testing signal is (are) actually the speaker.
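The best-match search described above can be sketched as follows. The sketch assumes, for illustration only, that voice prints and test information are feature vectors and that a smaller Euclidean distance means a better match; the function name `identify`, the parameter `top_k`, and the example speakers are hypothetical.

```python
import math


def identify(voice_prints, test_features, top_k=1):
    """Return the names of the top_k enrolled speakers with the best match.

    voice_prints maps a speaker name to that speaker's feature vector.
    Matching by Euclidean distance is an assumption of this sketch.
    """
    ranked = sorted(
        voice_prints,
        key=lambda name: math.dist(voice_prints[name], test_features),
    )
    return ranked[:top_k]


# N = 3 enrolled speakers, one voice print each.
enrolled = {
    "alice": [0.0, 0.0],
    "bob": [1.0, 1.0],
    "carol": [5.0, 5.0],
}
test_features = [0.9, 1.2]

best = identify(enrolled, test_features)               # single best match
best_two = identify(enrolled, test_features, top_k=2)  # best matches
```

The returned candidate(s) could then be passed to an optional verification step, as the text describes.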
In speaker identification, an open set of speakers may be used, i.e. it is possible that the speaker who is speaking is not among the N enrolled speakers. Alternatively, a closed set of speakers may be used if the speaker is always one of the enrolled speakers. For example, in closed-set speaker identification, the previously mentioned optional speaker verification may not be necessary, as the enrolled speaker whose voice print has the best match with the information based on the speaker recognition testing signal is typically the speaker of the speaker recognition testing signal.
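The difference between closed-set and open-set identification can be illustrated with a small sketch. It assumes that voice prints are feature vectors matched by Euclidean distance, and that the open-set case rejects the best candidate when its distance exceeds a fixed limit. The function names and the value of `max_distance` are hypothetical and chosen only for illustration.

```python
import math


def best_match(voice_prints, test_features):
    """Return (name, distance) of the closest enrolled voice print."""
    name = min(
        voice_prints,
        key=lambda n: math.dist(voice_prints[n], test_features),
    )
    return name, math.dist(voice_prints[name], test_features)


def identify_closed_set(voice_prints, test_features):
    """Closed set: the speaker is assumed to be enrolled, so the best
    match is returned without a further verification step."""
    name, _ = best_match(voice_prints, test_features)
    return name


def identify_open_set(voice_prints, test_features, max_distance=1.0):
    """Open set: the speaker may be unknown, so the best match is only
    accepted if it is close enough; otherwise None is returned."""
    name, distance = best_match(voice_prints, test_features)
    return name if distance <= max_distance else None


enrolled = {"alice": [0.0, 0.0], "bob": [3.0, 4.0]}
far_away = [10.0, 10.0]  # resembles neither enrolled speaker

closed_result = identify_closed_set(enrolled, far_away)
open_result = identify_open_set(enrolled, far_away)
```

For the same distant test vector, the closed-set variant still names the nearest enrolled speaker, while the open-set variant reports that no enrolled speaker matches.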
Speaker recognition uses biometrical data. Unlike a password, which may be replaced easily, biometrical data cannot be safely reused once it has become known to the public. Thus, such biometrical data, e.g. a voice print, needs to be protected against unauthorized access, i.e. its confidentiality must be maintained.
A speaker verification system using a smart card and its hardware implementation is known from the article “SVM-Based Speaker Verification System for Match-on-Card and Its Hardware Implementation” by Woo-Yong Choi et al. published in the ETRI Journal, Vol. 28, No. 3, June 2006. In this system, the data representing biometrical information is stored inside the smart card and the matching of the information comprised in a testing signal with the biometrical information is done inside the smart card. Thus, the biometrical information is protected from being accessed.
However, this system has several disadvantages: it can only perform its analysis using a support vector machine, it is only usable for speaker verification, it is only applicable to text-dependent speaker verification, i.e. speaker verification for which the same lexical content has to be spoken during recognition as during enrollment, and it does not comprise safety measures against attacks.