This invention relates generally to signal processing, and more particularly, to methods and systems for splitting a digital signal using prosodic features included in the signal.
Users are required to prove that they are who they claim to be during authentication transactions conducted under many different circumstances. For example, users may be required to prove their identity to passport control during an authentication transaction conducted in person at an airport, or may be requested to prove their identity to a merchant while attempting to remotely purchase a product from the merchant's system over the Internet. Claims of identity may be proven during authentication transactions based on voice biometric data captured from the user.
It is known for users to speak or utter a known phrase during voice biometric authentication transactions, and to process the utterance in accordance with text-dependent speaker verification methods. Text-dependent speaker verification methods use a speech transcription module to identify the phonetic content of the utterance, and compare the uttered phonetic content against speaker dependent and speaker independent phonetic models. Such text-dependent speaker verification methods allow comparing the uttered phonetic content against recitations of the same known phrase captured from the user during enrollment in an authentication system.
Other known methods of conducting voice biometric authentication transactions do not use direct knowledge of the phonetic content within the known phrase. Instead, such other methods use unsupervised learning techniques to model the known phrase. Specifically, such other methods estimate a speaker independent model of the known phrase using utterances of the known phrase as spoken by different speakers. A speaker dependent model of the known phrase may also be estimated from several utterances of the known phrase uttered by the user. Such speaker dependent models may be seen as a specialization of the speaker independent model given user utterances. Voice biometric data captured during authentication transactions is used to estimate a matching score, S1, against the speaker independent model and a matching score, S2, against the speaker dependent model. When the difference between the matching scores, S2 minus S1, is greater than a threshold, the identity of the user is proven.
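The score comparison described above may be illustrated with a minimal sketch. The function name, the parameter names, and the example score values are hypothetical and chosen only for illustration; the manner in which the matching scores S1 and S2 are estimated is not specified here and is left outside the sketch.

```python
def is_identity_proven(s1: float, s2: float, threshold: float) -> bool:
    """Apply the decision rule: identity is proven when S2 - S1 > threshold.

    s1: matching score against the speaker independent model (hypothetical input)
    s2: matching score against the speaker dependent model (hypothetical input)
    threshold: decision threshold (hypothetical value chosen by the system)
    """
    # The comparison is strict: a difference equal to the threshold
    # does not prove the identity.
    return (s2 - s1) > threshold


if __name__ == "__main__":
    # Illustrative values only; real scores depend on the models used.
    print(is_identity_proven(s1=-12.0, s2=-9.5, threshold=2.0))
    print(is_identity_proven(s1=-12.0, s2=-11.0, threshold=2.0))
```

In this sketch, a larger margin of S2 over S1 indicates that the captured voice biometric data fits the speaker dependent model better than the speaker independent model, which is why only the difference, and not either score alone, is compared against the threshold.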
Yet other known methods of conducting voice biometric authentication transactions use phone sequences in speech to account for the dynamic aspects of speech. However, such methods require strong prior knowledge similar to that required for speech transcription systems.