At the most fundamental level, a speech signal contains two important pieces of information: information about the content of the speech and information about the speaker. Information about the speaker can be used in speaker identification, of which there are two types. In the first, the speaker does not claim to be a particular person, but the computer identifies that person from their speech characteristics. This is known simply as speaker identification. The person being identified may or may not be aware of the identification. In the second type, the speaker identifies himself in some manner and the computer must verify that identity through the speech characteristics of the speaker. This is defined as speaker verification. Speaker verification is commonly associated with security-related access, and the person is usually aware of the verification process.
In speaker verification, error rates depend on the selection of a decision threshold, which is in turn affected by the similarity of feature parameters among speakers. Like other speech applications, a speaker verification system accumulates errors through algorithms, processing, approximations, noisy data, etc. Speaker verification makes a binary decision after comparing data collected from a speaker to a training set of data previously collected from that speaker. Each speaker has a training set, a group of feature vector templates, which are recalled when an identity claim is made. The feature vectors are parameters extracted from the speech data. The templates are compared with the current feature vectors extracted from a test utterance. The verification system must decide whether to accept or reject the identity claim based on a comparison between the test feature vector and the template feature vectors.
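The accept/reject decision described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from any particular system: the Euclidean distance measure, the rule of comparing the test vector against the closest template, and the function names `verify_claim` and `error_rates`. A practical system may use a different distance measure or scoring rule.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify_claim(test_vector, templates, threshold):
    """Accept the identity claim if the test vector lies within
    `threshold` of the closest stored template for the claimed speaker."""
    if not templates:
        raise ValueError("claimed speaker has no stored templates")
    best = min(euclidean_distance(test_vector, t) for t in templates)
    return best <= threshold


def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (false_accept_rate, false_reject_rate) for a given threshold.

    genuine_distances: distances from true-speaker trials.
    impostor_distances: distances from impostor trials.
    """
    false_reject = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    false_accept = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    return false_accept, false_reject
```

Under these assumptions, raising the threshold reduces false rejections of the true speaker at the cost of more false acceptances of impostors, and the more similar the feature parameters of different speakers are, the more the two distance distributions overlap and the harder it becomes to find a threshold that keeps both error rates low.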
Prior speaker verification systems relied exclusively on acoustic data collected from a speaker. A microphone captured the speaker's voice and algorithms converted the acoustic data to acoustic feature vectors, or acoustic parameters. One serious problem with an all-acoustic speaker verification system is that it is very susceptible to noise. Errors in verification rise dramatically in the presence of noise either during test feature vector creation, or during verification when the speaker repeats a previously recorded test sentence.
In order to reduce reliance on exclusively acoustic data, equipment has been developed to collect non-acoustic data for use in speaker verification. Low-power electromagnetic radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications. For example, see Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1) 622 (1998). Electromagnetic micropower speech sensors were developed to characterize the real-time physical movements of a speaker's vocal articulation during speech. For example, see Burnett, G. B., University of California, Davis, “The physiological basis of Glottal Electromagnetic Micropower Sensors (GEMS) and their use in defining an excitation function for the human vocal tract,” Ph.D. Dissertation, 1999. Some work has also been done to improve the extraction of traditional speech parameters, such as pitch, by using EM data; for example, see Burnett, G. B., Gable, T. J., Ng, L. C., and Holzrichter, J. F., “Accurate and noise-robust pitch extraction using low power electromagnetic sensors,” 1998.