Communication over a telephonic network typically involves different handset types. Exemplary handset types may include land handsets, cellular handsets, headsets, internet telephony microphones, and still other user communications devices connectable to the network. Differences among handsets may significantly affect the quality of voice transmitted over a network using a given handset. For example, cellular phones are often optimized for use in outdoor (or otherwise noisier) environments, compared to the indoor (or otherwise quieter) environment of land phones. Thus, a cellular phone may be designed to reject weaker or background noises, which can cause a cell phone to perform poorly with speakers who do not speak directly into the mouthpiece. At the same time, cellular phone mouthpieces may fall short of many users' mouths, due to the desire for a smaller phone that fits readily in pockets or purses. Or, cellular phones may have small microphones that are prone to being inconsistently located in front of the mouth during use, resulting in more noise in the transmitted voice. These and other factors result in performance variations among different handsets, and associated difficulties in speech processing. This is particularly significant in speaker verification or other processes involving identification of an individual user (i.e., identifying not only the words spoken, but also the speaker's characteristic vocal patterns). One technique for reducing the error in speech processing caused by variations in handsets is to identify (or classify) the handset type being used to transmit voice. For example, once a handset is identified, a handset-specific model may be used in speaker verification processes to more accurately identify a given speaker.
An existing handset identifier uses a “maximum likelihood” (ML) classification. ML classification typically separates multiple classes of handsets based on parametric models (e.g., Gaussian probabilistic models, see “Speaker Verification Using Adapted Gaussian Mixture Models,” D. Reynolds, et al., Digital Signal Processing 10, pgs. 19–41 (2000)). One disadvantage of the Gaussian probabilistic models is that they assume normal distributions. Most data to be processed do not have a normal distribution, so these models typically do not represent the distribution of the training data well. ML classification may also use non-parametric models (e.g., histogramming), where the accuracy of handset identification is limited by the number and size of bins used to construct the histogram models (see “Pattern Classification and Scene Analysis,” R. Duda and P. Hart, Wiley, 1993). Further, ML classification assumes that each handset type is equally likely to be used, which is generally not an accurate assumption. For example, ML classification assumes that a user having three types of handsets (e.g., land phone, cell phone, and headset) has a ⅓ likelihood of using each type of handset.
Another handset identifier uses a “maximum a posteriori” (MAP) classification. Like ML classification, MAP classification employs both parametric and non-parametric models. Thus, MAP classification has the same model-related disadvantages described above for ML classification. However, MAP classification is able to account for differences in handset usage probability, and is thus superior in that regard.
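The difference between the two decision rules can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: a single scalar feature, one Gaussian per handset type, and the particular means, standard deviations, and usage priors. None of these values come from an actual handset identifier.

```python
import math

def gaussian_log_likelihood(x, mean, std):
    """Log density of a 1-D normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

# Hypothetical per-handset models of one scalar feature: (mean, std).
MODELS = {"land": (0.0, 1.0), "cell": (2.0, 1.0), "headset": (4.0, 1.0)}

# Hypothetical usage priors for one user (must sum to 1).
PRIORS = {"land": 0.7, "cell": 0.2, "headset": 0.1}

def classify_ml(x):
    """ML rule: pick the handset whose model gives x the highest likelihood.
    This implicitly treats every handset type as equally probable."""
    return max(MODELS, key=lambda h: gaussian_log_likelihood(x, *MODELS[h]))

def classify_map(x, priors):
    """MAP rule: weight each likelihood by the prior probability of that handset."""
    return max(
        MODELS,
        key=lambda h: gaussian_log_likelihood(x, *MODELS[h]) + math.log(priors[h]),
    )

if __name__ == "__main__":
    x = 1.2  # hypothetical observed feature value
    print("ML: ", classify_ml(x))
    print("MAP:", classify_map(x, PRIORS))
```

With uniform priors the MAP rule reduces to the ML rule; the two differ only when the usage probabilities are unequal, as in the assumed priors above.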
Another family of classifiers, used outside the handset identification space, is known as “support vector machines” (SVMs). For example, SVMs are often used in pattern recognition (e.g., see “The Nature of Statistical Learning Theory,” V. Vapnik, Springer Verlag, 1995; “Support Vector Networks,” C. Cortes and V. Vapnik, Machine Learning, 20: 1–25, 1995; and “A Tutorial on Support Vector Machines for Pattern Recognition,” Christopher J. C. Burges, Bell Laboratories, Lucent Technologies). SVMs generally do not rely on probabilistic models or estimations of probabilities. Instead, SVMs perform binary pattern classification by determining an optimal decision surface (i.e., a hyperplane) in a domain that separates the training data into two classes (e.g., a positive class and a negative class). Once trained, the SVM can classify inputted data (“test data”) received via an appropriate interface as belonging to either the positive or negative class by determining which side of the decision surface the test data fall on. Depending on the environment, the interface could be, for example, a PSTN interface (e.g., from Dialogic Corporation), a radio cell, etc. SVMs have not been applied to identify or classify handsets because SVMs are: (i) a relatively new technology; (ii) more complex than existing handset classification techniques; and (iii) generally regarded as being limited to binary classification (whereas handset classification requires n-ary classification).
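The binary decision step described above, deciding which side of the hyperplane a test vector falls on, can be sketched as follows. This is only an illustration of the classification step for a linear decision surface; the weight vector and bias are hypothetical values chosen by hand, not the result of SVM training, and the training procedure itself is not shown.

```python
def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign x to the positive or negative class by the sign of the score."""
    return "positive" if decision_value(w, b, x) >= 0 else "negative"

# Hypothetical hyperplane parameters (assumed for illustration, not trained).
W = (1.0, -1.0)
B = 0.0

if __name__ == "__main__":
    for test_vector in [(2.0, 0.5), (0.5, 2.0)]:
        print(test_vector, classify(W, B, test_vector))
```

Because the rule yields only one of two labels, it shows why such a classifier is regarded as binary, whereas a handset identifier must distinguish among more than two handset types.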