Speaker verification systems utilize a spoken voice password, or a sequence of words forming a phrase, to determine whether the person uttering it is actually the registered person. (Herein, the term "pass phrase" is used to include a single password, a sequence of passwords, or a pass phrase.) In known systems, the registered person typically must utter the pass phrase during a registration process in which the registered person's identity is verified utilizing a driver's license, passport, or some other acceptable form of identification. The registered person's utterance is then stored as a reference utterance. Typically, the reference utterance is stored as an analog waveform, or a digital representation of an analog waveform, received from the microphone circuit (including appropriate amplifiers) into which the registered person uttered the pass phrase.
Later, when a speaker claims to be the registered person, the speaker is prompted to utter the voice pass phrase into a microphone. The analog waveform, or digital representation of the analog waveform, from the microphone is then compared to the waveform of the reference utterance, and a comparison algorithm is utilized to calculate a value representing the dissimilarity between the two waveforms. If the dissimilarity is within a predetermined threshold, the speaker verification system concludes that the speaker is the registered person.
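The comparison step described above can be sketched as follows. This is a minimal illustration only: the text does not specify the comparison algorithm, so a root-mean-square difference between equal-length sample sequences is assumed here as a stand-in, and the names `dissimilarity`, `verify_speaker`, and the threshold value are hypothetical. A practical system would also need to align and normalize the two utterances before comparing them.

```python
import math


def dissimilarity(reference, candidate):
    """Return the root-mean-square difference between two sample sequences.

    Assumption: RMS difference stands in for the unspecified comparison
    algorithm; both sequences must already be the same length.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    squared_error = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    return math.sqrt(squared_error / len(reference))


def verify_speaker(reference, candidate, threshold):
    """Accept the speaker when the dissimilarity is within the threshold."""
    return dissimilarity(reference, candidate) <= threshold


# Hypothetical sample values, chosen only to exercise the functions.
reference = [0.0, 0.5, 1.0, 0.5, 0.0]
close_match = [0.0, 0.45, 0.95, 0.5, 0.05]
distorted = [0.0, 0.1, 0.3, 0.1, 0.0]

accepted = verify_speaker(reference, close_match, threshold=0.1)
rejected = verify_speaker(reference, distorted, threshold=0.1)
```

With these sample values the close match falls within the threshold and is accepted, while the heavily attenuated sequence exceeds it and is rejected, which is the false negative behavior that results when a genuine utterance is distorted.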
While speaker verification systems are useful for verifying the claimed identity of a person over a telephone, the analog waveform of the uttered pass phrase can be distorted by its transmission over traditional telephone lines to the server performing the verification. Such distortions tend to generate false negative errors (i.e., an utterance that should match is determined to be a non-match). While transmission of a digital representation of the analog waveform may eliminate such distortions, the bandwidth required for transmission is significantly increased.
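To give a rough sense of the bandwidth involved, the data rate of an uncompressed digital representation is the product of the sample rate, the bits per sample, and the number of channels. The sketch below assumes linear PCM encoding; the 8 kHz sample rate, 16-bit sample size, and 3-second utterance length are illustrative assumptions, not values taken from any particular system, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second needed to carry uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_size_bytes(duration_s, sample_rate_hz, bits_per_sample, channels=1):
    """Total bytes needed for an uncompressed utterance of the given length."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) * duration_s / 8


# Assumed figures: 8 kHz, 16-bit, mono, 3-second pass phrase.
bit_rate = pcm_bit_rate(8000, 16)
utterance_bytes = pcm_size_bytes(3, 8000, 16)
```

Under these assumptions a single pass phrase requires 128,000 bits per second, or 48,000 bytes for the 3-second utterance.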
Known voice compression algorithms are used to compress spoken audio data for transmission to a remote location over packet switched networks. However, because of the distortion caused by compression and decompression, the resulting waveforms would again yield a significant number of false negatives if utilized for speaker verification.
Accordingly, there is a need in the art for a speaker verification system and method for verifying the identity of a remote speaker that does not suffer from the disadvantages of known systems.