Large organizations, such as commercial organizations, financial institutions, government agencies or public safety organizations, conduct numerous interactions (i.e., communication sessions) with customers, users, suppliers and the like on a daily basis. Many of these interactions are vocal or at least comprise a vocal or audio component, for example, voices of parties to a telephone call or the audio portion of a video or a face-to-face interaction. A significant part of these interactions takes place between a customer and a representative of the organization, e.g., an agent in a contact center.
Communication sessions can involve exchanging sensitive information, for example, financial data, transactions and personal medical data. The agent is therefore required to authenticate the identity of the customer before offering the customer any assistance or services. When a communication session begins, the system or an agent first identifies the customer, for example based on the customer's name, telephone number, ID number, Social Security number or Postal Index Number (PIN) code, and later authenticates the identity of the customer. Traditional systems and methods use knowledge-based information, also known as Know Your Client (KYC) information, such as personal information known only to the client that was previously stored in the organization database (e.g., the name of your pet, your old school, the marriage date of your parents, etc.). Some organizations use secret pass key(s) or even physical characteristics of the person, for example, fingerprints and voice prints, to authenticate the customer identity.
Voice prints or voice biometric data, also known as spectrograms, spectral waterfalls, sonograms, or voicegrams, are time-varying spectral representations of sounds or voices. Digital voice prints can be created from any digital audio recording of voices, for example, audio recordings of communication sessions between agents and customers. A voice print can be generated by applying a short-time Fourier transform (STFT) to various (preferably overlapping) segments of the audio recording. A three-dimensional image of the voice print can present measurements of magnitude versus frequency for each moment in time. A speaker's voice is extremely difficult to forge for biometric comparison purposes, since a myriad of qualities are measured, ranging from dialect and speaking style to pitch, spectral magnitudes, and formant frequencies. The vibration of a user's vocal cords and the patterns created by the physical components resulting in human speech are as distinctive as fingerprints. Voice prints of two individuals can differ from each other at about one hundred (100) different points.
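As an illustration of the STFT step described above, the following is a minimal sketch in Python using only the standard library. It is not the method of any particular system: the function names (`hann`, `dft_magnitudes`, `stft_magnitudes`), the frame length of 64 samples, the hop of 32 samples (50% overlap), and the synthetic 1 kHz tone standing in for recorded speech are all assumptions chosen for the example.

```python
import cmath
import math


def hann(n):
    """Hann window of length n; tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def stft_magnitudes(samples, frame_len=64, hop=32):
    """Slide an overlapping window over the samples.

    Returns one magnitude spectrum per frame, i.e. magnitude versus
    frequency (inner index) for each moment in time (outer index).
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames


# Hypothetical input: a 1 kHz tone sampled at 8 kHz, in place of real speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spectrogram = stft_magnitudes(tone)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spectrogram` corresponds to one time frame and each column to one frequency bin, which is the time-varying spectral representation the paragraph refers to. A production system would more likely rely on an FFT library than on this direct DFT, which is written out here only to keep the sketch self-contained.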
Enrolling a user's voice prints can require a text-dependent enrollment (e.g., capturing a particular passphrase recited by a user). Some systems can require that a user repeat a particular passphrase multiple times (e.g., 3-5 times). For example, a system can prompt a user as follows: 1) please enter your ID followed by #; 2) please enter your PIN followed by #; 3) to enroll, please say “I like using voice biometrics”; 4) please say again “I like using voice biometrics”; and 5) please say again “I like using voice biometrics”. This can cause low user enrollment because many users do not want to actively enroll.
Some systems can enroll a user's voice print via a text-independent process (e.g., capturing voice utterance(s) of the user that are not a particular passphrase). For example, a user can state the reason for their call (e.g., “I would like to set up an account”). The system can capture the user's utterance and use that utterance for the enrollment of the user's voice print. While text-independent enrollment of voice prints can be less burdensome for the user, identifying the user with a voice print created through text-independent enrollment can be less accurate than identifying the user with a voice print created through text-dependent enrollment.
Therefore, it can be desirable to maintain the accuracy of text-dependent enrollment while retaining the reduced complexity of text-independent enrollment.