Large organizations, such as commercial organizations, financial institutions, government agencies or public safety organizations, conduct numerous interactions (i.e., communication sessions) with customers, users, suppliers and the like on a daily basis. Many of these interactions are vocal or at least comprise a vocal or audio component, for example, voices of parties to a telephone call or the audio portion of a video or a face-to-face interaction. A significant part of these interactions takes place between a customer and a representative of the organization, e.g., an agent in a contact center.
Communication sessions can involve exchanging sensitive information, for example, financial data, transactions and personal medical data. The agent is therefore required to authenticate the identity of the customer before offering the customer any assistance or services. When a communication session begins, the system or an agent first identifies the customer, for example based on the customer's name, telephone number, ID number, Social Security number or Personal Identification Number (PIN) code, and later authenticates the identity of the customer. Traditional systems and methods use knowledge-based information, also known as Know Your Client (KYC) information, such as personal information known only to the client that was previously stored in the organization database (e.g., the name of your pet, your old school, the marriage date of your parents, etc.). Some organizations use secret pass key(s) or even physical characteristics of the person, for example, fingerprints and voice prints, to authenticate the customer identity.
Voice prints or voice biometric data, also known as spectrograms, spectral waterfalls, sonograms, or voicegrams, are time-varying spectral representations of sounds or voices. Digital voice prints can be created from any digital audio recording of voices, for example, audio recordings of communication sessions between agents and customers. A voice print can be generated by applying a short-time Fourier transform (STFT) to various (preferably overlapping) segments of the audio recording. A three-dimensional image of the voice print can present measurements of magnitude versus frequency for a specific moment in time. A speaker's voice is extremely difficult to forge for biometric comparison purposes, since a myriad of qualities are measured, ranging from dialect and speaking style to pitch, spectral magnitudes, and formant frequencies. The vibration of a user's vocal cords and the patterns created by the physical components resulting in human speech are as distinctive as fingerprints. Voice prints of two individuals can differ from each other at about one hundred (100) different points.
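As a rough illustration of the STFT step described above, the following minimal sketch computes a magnitude spectrogram from overlapping, windowed segments of an audio signal. It is not taken from any particular system: the function name stft_magnitude, the frame length of 400 samples, the hop of 160 samples, the Hann window and the synthetic 440 Hz test tone are all assumptions chosen for the example, and NumPy is assumed to be available.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, frame_len=400, hop=160):
    """Magnitude STFT over overlapping frames (illustrative parameters).

    Returns (times, freqs, magnitudes), where magnitudes[i, k] is the
    spectral magnitude at freqs[k] for the frame centered at times[i].
    """
    window = np.hanning(frame_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    # hop < frame_len, so consecutive frames overlap
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, magnitudes


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz,
# standing in for a real audio recording of a communication session.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

times, freqs, mags = stft_magnitude(tone, sample_rate)
# Frequency bin with the largest average magnitude across all frames
peak_hz = freqs[mags.mean(axis=0).argmax()]
```

Each row of mags is the magnitude-versus-frequency measurement for one moment in time, so stacking the rows along the time axis gives the three-dimensional (time, frequency, magnitude) picture mentioned above. A production system would typically derive further features from such a representation rather than compare raw spectrograms directly.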
Voice prints can be used to authenticate a user (e.g., customer). In some systems a passphrase is used to authenticate the user. For example, when a user is enrolled in a system, the system can prompt the user to input a passphrase (e.g., answer a specific question or repeat a particular phrase, for example, “my voice is my password”). A text-dependent voice print of an audio response of the user can be created, such that after enrollment, upon subsequent authentication, the user is prompted with the passphrase for authentication. One difficulty with the current approach is that a fraudster knowing the passphrase can obtain a recording of the user repeating the passphrase, and play the recording to authenticate and obtain access to the user's account.
Therefore, it can be desirable to prevent a fraudster from stealing a user's passphrase.