The present invention is generally directed to speaker verification, and more particularly, to a method of accommodating variability among different types of telephone handsets, in order to improve the accuracy of speaker verification.
Speaker Verification (SV) is a speaker-dependent pattern-matching process in which a subscriber's speech sample presented for verification is processed to produce a verification pattern. This verification pattern is compared to an SV reference pattern that is typically produced from speech samples previously provided in the course of a so-called registration session. A "match" between the verification and reference patterns occurs when their characteristics are substantially similar. Otherwise, a "mismatch" is said to have occurred.
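By way of illustration only, the match/mismatch decision described above can be sketched as a distance comparison against a threshold. The representation of each pattern as a fixed-length feature vector, the use of Euclidean distance, and the MATCH_THRESHOLD value are all assumptions made for this sketch; they are not details taken from the present description.

```python
import math

# Hypothetical decision threshold; a practical SV system would tune this value.
MATCH_THRESHOLD = 1.0


def distance(verification, reference):
    """Euclidean distance between two equal-length feature vectors (assumed representation)."""
    if len(verification) != len(reference):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((v - r) ** 2 for v, r in zip(verification, reference)))


def verify(verification, reference, threshold=MATCH_THRESHOLD):
    """Return "match" when the patterns are substantially similar, else "mismatch"."""
    return "match" if distance(verification, reference) <= threshold else "mismatch"


if __name__ == "__main__":
    reference_pattern = [1.0, 2.0]  # hypothetical pattern from a registration session
    print(verify([1.2, 2.1], reference_pattern))  # close to the reference
    print(verify([3.0, 2.0], reference_pattern))  # far from the reference
```

Here "substantially similar" is modeled as "within a fixed distance", which is only one of many possible similarity criteria.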
A typical application of SV is a telephony-based security system. A subscriber "registers" with the system by providing speech samples over a telephone link, from which an SV reference pattern is produced. Subsequently, a caller seeking access to, for example, a service or some secure data calls the system and presents his/her speech sample for verification as described above. If a match occurs, the desired access is granted. If there is a mismatch, it is presumed that the caller was a so-called imposter, i.e., someone pretending to be a subscriber, and access is denied.
SV is often complicated by the fact that the verification pattern differs from the SV reference pattern because of circumstances such as the use of different types of telephone handset microphones, e.g., linear (such as electret) and non-linear (such as carbon) microphones. Other examples include differing background noise and differing speaking levels. Such differences can cause the speech sample provided during registration and the speech sample provided during a particular SV verification session to have different characteristics. The corresponding patterns will then also differ, possibly resulting in an incorrect "mismatch" determination.
In particular, an electret microphone performs a fairly linear transformation on incoming speech samples and, as such, minimally distorts them. A carbon microphone, on the other hand, performs a non-linear transformation on the speech samples by, for example, compressing high-volume speech levels and suppressing low background noise levels, the latter often being referred to in the art as "enhancement." As such, the carbon microphone distorts the speech samples to a significant extent. Because of the variability in the effects that these different types of microphones have on the samples, it is difficult to discriminate between a mismatch caused by using different types of microphones and a mismatch caused by comparing an SV reference pattern to a verification pattern generated from a speech sample provided by an imposter.
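The contrast between the two microphone types can be sketched with toy transfer functions. The models below are hypothetical simplifications: the electret microphone is modeled as a pure gain, and the carbon microphone as a noise gate (zeroing samples below an assumed noise_floor, standing in for "enhancement") followed by tanh soft compression with an assumed saturation level. Neither is offered as an accurate physical model of a real handset.

```python
import math


def electret(samples, gain=1.0):
    """Toy linear microphone model: every sample is scaled by the same gain."""
    return [gain * s for s in samples]


def carbon(samples, noise_floor=0.05, saturation=0.5):
    """Toy non-linear microphone model with hypothetical parameters.

    Samples whose magnitude is below noise_floor are zeroed (suppressing low-level
    background noise), and the remaining samples are soft-compressed with tanh so
    that high-volume speech saturates near +/- saturation.
    """
    out = []
    for s in samples:
        if abs(s) < noise_floor:
            out.append(0.0)  # suppress low-level background noise ("enhancement")
        else:
            out.append(saturation * math.tanh(s / saturation))  # compress loud speech
    return out


if __name__ == "__main__":
    speech = [0.01, -0.02, 0.3, -0.9, 0.6]  # hypothetical normalized samples
    print("electret:", electret(speech))
    print("carbon:  ", [round(x, 4) for x in carbon(speech)])
```

Running the same samples through both models shows the quiet samples removed and the loud samples compressed only by the carbon model, which is the kind of difference that can later appear as a pattern mismatch.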
Thus, a subscriber who registers using one type of telephone handset microphone and attempts to be "verified" using another type of handset microphone is more likely to be denied access than one who registers and attempts to be verified using the same type of handset microphone.