Verification (also known as authentication) is the process of confirming that a user is who they claim to be. The goal of verification is to determine whether the user is the authentic enrolled user or an impostor. Generally, verification includes five stages: capturing input; filtering unwanted input such as noise; transforming the input to extract a set of feature vectors; generating a statistical representation of the feature vectors; and performing a comparison against information previously gathered during an enrollment procedure.
Speaker verification systems (also known as voice verification systems) attempt to match the voice of a speaker whose identity is undergoing verification with a known voice. Speaker verification systems help to provide a means for ensuring secure access by using speech utterances. A claimant seeking access through a speaker recognition and/or speaker verification system provides a verbal submission of a word or phrase, or simply a sample of the claimant speaking a randomly selected word or phrase. An authentic claimant is one whose utterance matches known characteristics associated with the claimed identity.
To verify a claimant, a speaker verification system typically requires the claimant to provide a speech sample or speech utterance, which is scored against a model corresponding to the claimant's claimed identity. The resulting claimant score is then used to confirm that the claimant is in fact the claimed identity.
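The scoring step described above can be sketched as follows. This is only an illustrative sketch, not the method of any particular system: the enrolled model is assumed to be a single mean feature vector, the claimant score is assumed to be the negative Euclidean distance between the utterance mean and that model, and the names `claimant_score`, `verify`, and the threshold value are hypothetical.

```python
import math

def utterance_mean(frames):
    """Average the feature vectors of an utterance, one dimension at a time."""
    dims = len(frames[0])
    return tuple(sum(f[d] for f in frames) / len(frames) for d in range(dims))

def claimant_score(frames, enrolled_mean):
    """Assumed score: negative distance to the enrolled model (higher is a better match)."""
    return -math.dist(utterance_mean(frames), enrolled_mean)

def verify(frames, enrolled_mean, threshold):
    """Accept the claimed identity only if the score reaches the threshold."""
    return claimant_score(frames, enrolled_mean) >= threshold

# Hypothetical 2-D feature vectors; real systems use higher-dimensional features.
enrolled = (1.0, 2.0)
genuine = [(1.0, 2.0), (1.0, 2.0)]
impostor = [(4.0, 6.0), (4.0, 6.0)]

print(verify(genuine, enrolled, threshold=-0.5))   # True
print(verify(impostor, enrolled, threshold=-0.5))  # False
```

In practice the threshold trades false acceptances against false rejections, so it would be tuned on enrollment data rather than fixed by hand as it is here.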
Conventional speaker verification systems typically suffer from relatively large memory requirements, undesirably high complexity, and unreliability. For example, in many speaker verification systems, Hidden Markov Models (HMMs) are used to model a speaker's voice characteristics. Hidden Markov Models, however, may be very expensive in terms of computation resources and memory usage, making them less suitable for use in resource-constrained or limited systems.
Speaker verification systems implementing vector quantization (VQ) schemes, on the other hand, may have low computation and memory usage requirements. Unfortunately, vector quantization schemes often suffer from the drawback of not taking into account the variation of a speaker's voice over time, because typical vector quantization schemes represent a “static snapshot” of a person's voice over the period of an utterance.
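The “static snapshot” limitation can be illustrated with a minimal sketch of vector quantization scoring. The codebook, the feature values, and the function names below are hypothetical, and average nearest-codeword distance is assumed as the distortion measure. Because each frame is matched to its nearest codeword independently, reordering the frames of an utterance leaves the distortion unchanged, so temporal variation in the voice is not captured.

```python
import math

def nearest_distance(frame, codebook):
    """Euclidean distance from one feature vector to its closest codeword."""
    return min(math.dist(frame, codeword) for codeword in codebook)

def average_distortion(frames, codebook):
    """Mean quantization distortion of an utterance against a speaker codebook."""
    return sum(nearest_distance(f, codebook) for f in frames) / len(frames)

# Hypothetical two-entry speaker codebook and a three-frame utterance.
codebook = [(0.0, 0.0), (1.0, 1.0)]
utterance = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

forward = average_distortion(utterance, codebook)
backward = average_distortion(list(reversed(utterance)), codebook)

# The same frames in reverse order give the same distortion.
print(forward == backward)  # True
```

A model that tracks how features evolve over time, such as a Hidden Markov Model, would distinguish the two orderings, at the cost of the additional computation and memory noted above.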
Another challenge posed by real-life operating environments is that noise and background sounds may be detected and treated as part of the utterance, thereby becoming a source of performance degradation in speaker verification systems. Noise may occur as a result of differences in the transducers, acoustic environment, and communication channel characteristics used in a given verification system.