Different speakers can be distinguished by the characteristics of their utterances, so a speaker's identity can be authenticated from his or her speech. Three common speaker recognition engine techniques, HMM (Hidden Markov Model), DTW (Dynamic Time Warping) and VQ (Vector Quantization), are introduced in K. Yu, J. Mason, J. Oglesby, “Speaker recognition using hidden Markov models, dynamic time warping and vector quantisation” (Vision, Image and Signal Processing, IEE Proceedings, Vol. 142, October 1995, pp. 313-318).
Usually, the speaker authentication process includes an enrollment phase and a verification phase. In the enrollment phase, a speaker template is generated for a speaker (user) from an utterance containing a password spoken by that speaker; in the verification phase, the speaker template is used to determine whether a test utterance contains the same password spoken by the same speaker. The quality of the speaker template is therefore critical to the whole authentication process.
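The two phases can be illustrated with a DTW template matcher. This is a minimal sketch under stated assumptions, not the method of any particular system: each utterance is already a sequence of feature frames (lists of floats), several enrollment utterances are available, the template is their medoid, and the accept threshold is a hypothetical `margin` times the largest enrollment distance. The names `dtw_distance`, `enroll`, `verify` and `margin` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]        # one feature vector (e.g. cepstral coefficients)
Utterance = Sequence[Frame]    # frames x dims

def dtw_distance(a: Utterance, b: Utterance) -> float:
    """DTW distance between two feature sequences.

    Local cost is the Euclidean distance between frames; allowed steps are
    diagonal, horizontal and vertical. The accumulated cost is divided by
    len(a) + len(b) so utterances of different lengths stay comparable.
    """
    n, m = len(a), len(b)
    prev = [0.0] + [math.inf] * m          # accumulated costs for row i - 1
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)         # column 0 is unreachable for i >= 1
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m] / (n + m)

def enroll(utterances: List[Utterance], margin: float = 1.5) -> Tuple[Utterance, float]:
    """Enrollment phase (hypothetical policy): keep the medoid utterance as the
    template and set the threshold to `margin` times its largest DTW distance
    to the other enrollment utterances."""
    k = len(utterances)
    if k < 2:
        raise ValueError("need at least two enrollment utterances")
    dist = [[dtw_distance(utterances[i], utterances[j]) for j in range(k)] for i in range(k)]
    medoid = min(range(k), key=lambda i: sum(dist[i]))
    threshold = margin * max(dist[medoid][j] for j in range(k) if j != medoid)
    return utterances[medoid], threshold

def verify(template: Utterance, threshold: float, test: Utterance) -> bool:
    """Verification phase: accept if the test utterance is close to the template."""
    return dtw_distance(template, test) <= threshold
```

Because every later decision is made against the stored template and threshold, a poor enrollment utterance (for example, a mispronounced password) degrades all subsequent verifications, which is why template quality matters.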
For a DTW-based speaker verification system, a number of features per frame are required as input for reliable performance. In general, these features are extracted in the same way for all speakers, and the characteristics specific to each speaker are neglected. Some schemes have been proposed that customize an optimal feature set for each speaker by choosing a proper feature subset from the acoustic feature set. This can improve verification performance while also reducing the memory required for the template. However, finding an effective criterion for feature selection remains difficult, especially when the available information is limited.
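The memory saving from a speaker-specific feature subset can be seen in a small sketch. It assumes each frame is a fixed-length feature vector and that the subset for the speaker has already been chosen by some criterion; the class `SubsetTemplate` and its methods are hypothetical names. Only the selected columns are stored, and test frames are projected onto the same columns before the DTW comparison.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class SubsetTemplate:
    """Speaker template that keeps only a speaker-specific subset of feature dims."""
    subset: Tuple[int, ...]                  # indices into the full per-frame feature vector
    frames: Tuple[Tuple[float, ...], ...]    # template frames restricted to `subset`

    @classmethod
    def from_utterance(cls, utterance: Sequence[Sequence[float]],
                       subset: Sequence[int]) -> "SubsetTemplate":
        idx = tuple(sorted(set(subset)))     # de-duplicate and fix the column order
        return cls(idx, tuple(tuple(frame[k] for k in idx) for frame in utterance))

    def stored_values(self) -> int:
        """Number of floats kept in memory for this template."""
        return len(self.frames) * len(self.subset)

    def distance(self, test: Sequence[Sequence[float]]) -> float:
        """Length-normalized DTW distance after projecting test frames onto the subset."""
        a = self.frames
        b = [tuple(frame[k] for k in self.subset) for frame in test]
        n, m = len(a), len(b)
        prev = [0.0] + [math.inf] * m
        for i in range(1, n + 1):
            cur = [math.inf] * (m + 1)
            for j in range(1, m + 1):
                cur[j] = math.dist(a[i - 1], b[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
            prev = cur
        return prev[m] / (n + m)
```

With, say, 3 of 13 coefficients selected, the stored template shrinks to roughly a quarter of its full size; the test utterance is still analyzed with the full front end, but only the selected columns enter the comparison.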
A known optimization method can be specified in terms of two components: a performance criterion and a search procedure. For the first component, the usual performance criterion requires an impostor database; e.g., the False Accept Rate (FAR) is used as the performance criterion in B. Sabac (2002), “Speaker recognition using discriminative features selection”, ICSLP-2002, pp. 2321-2324. That is to say, we need to test the performance of different feature subsets on a large number of client trials and impostor trials to find the optimal one. However, impostor data are seldom available in a password-selectable speaker verification system.
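The two components can be made concrete with a short sketch that shows why impostor trials are indispensable to such a criterion. Several simplifications are assumptions, not taken from the cited work: each trial is reduced to a pair of fixed-length utterance-level vectors (e.g. time-averaged features) instead of a DTW score, the criterion is FAR at the threshold where no client trial is rejected (FRR = 0), and the search procedure is plain sequential forward selection. The names `far_at_zero_frr` and `forward_select` are hypothetical.

```python
import math
from typing import Callable, FrozenSet, Sequence, Tuple

Vector = Sequence[float]
Trial = Tuple[Vector, Vector]    # (enrollment vector, test vector)

def subset_distance(x: Vector, y: Vector, subset: FrozenSet[int]) -> float:
    """Euclidean distance using only the dimensions in `subset`."""
    return math.sqrt(sum((x[k] - y[k]) ** 2 for k in subset))

def far_at_zero_frr(client: Sequence[Trial], impostor: Sequence[Trial],
                    subset: FrozenSet[int]) -> float:
    """Performance criterion: FAR when the threshold equals the largest client
    distance, i.e. no client trial is rejected. Needs impostor trials."""
    threshold = max(subset_distance(x, y, subset) for x, y in client)
    accepted = sum(subset_distance(x, y, subset) <= threshold for x, y in impostor)
    return accepted / len(impostor)

def forward_select(n_features: int, criterion: Callable[[FrozenSet[int]], float],
                   max_size: int) -> Tuple[FrozenSet[int], float]:
    """Search procedure: sequential forward selection. Start from the empty set,
    add the single feature that lowers the criterion most, and stop when no
    addition strictly improves it or `max_size` is reached."""
    selected: FrozenSet[int] = frozenset()
    best = math.inf
    while len(selected) < max_size:
        candidates = [selected | {f} for f in range(n_features) if f not in selected]
        if not candidates:
            break
        score, cand = min(((criterion(c), c) for c in candidates), key=lambda t: t[0])
        if score >= best:
            break
        selected, best = cand, score
    return selected, best
```

Every call to the criterion scores all impostor trials, so without an impostor database, as in a password-selectable system, this kind of search has nothing to optimize against.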