Large organizations, such as commercial organizations, financial institutions, government agencies or public safety organizations, conduct communication sessions, also known as interactions, with individuals such as customers, suppliers and the like on a daily basis.
Communication sessions between parties may involve exchanging sensitive information, for example, financial data, transactions and personal medical data. Thus, in communication sessions with individuals, it may be necessary to authenticate the individual, for example before offering the individual any information or services. When a communication session begins, a system or agent on behalf of one party may first identify the individual. Some organizations use voice prints to authenticate the identity of individuals.
The term “voice print” as used herein is intended to encompass voice biometric data. Voice prints are also known by various other names including but not limited to spectrograms, spectral waterfalls, sonograms, and voicegrams. Voice prints may take many forms and may indicate both physical and behavioral characteristics of an individual. One type of voice print is in the form of time-varying spectral representations of sounds or voices. Voice prints may be in digital form and may be created from any digital audio recordings of voices, for example but not limited to audio recordings of communication sessions between call center agents and customers. A voice print can be generated in many ways known to those skilled in the art including but not limited to applying short-time Fourier transform (STFT) on various (preferably overlapping) audio streams of a particular voice such as an audio recording. For example, each stream may be a segment or fraction of a complete communication session or corresponding recording. A three-dimensional image of the voice print may present measurements of magnitude versus frequency for a specific moment in time.
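By way of illustration only, the STFT-based approach mentioned above may be sketched as follows. This is a minimal sketch and not a definitive implementation of any particular voice biometric system. The function name `stft_magnitude`, the frame length of 256 samples, the hop of 128 samples (giving 50% overlap), the Hann window and the 8 kHz sample rate are all assumptions chosen for the example, and a synthetic tone stands in for a recorded voice.

```python
import numpy as np


def stft_magnitude(samples, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) array of spectral magnitudes.

    Each row is the magnitude spectrum of one windowed frame, so the array
    as a whole is a time-varying spectral representation of the audio.
    frame_len and hop are illustrative assumptions; hop < frame_len makes
    consecutive frames overlap.
    """
    samples = np.asarray(samples, dtype=float)
    if len(samples) < frame_len:
        raise ValueError("audio sample is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([
        samples[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # Real-input FFT of every frame; keep magnitude only.
    return np.abs(np.fft.rfft(frames, axis=1))


# Usage: a one-second synthetic 440 Hz tone stands in for recorded speech.
sample_rate = 8000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(tone)
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 256  # bin index to frequency in Hz
print(spec.shape, peak_hz)
```

Each row of the resulting array holds magnitude versus frequency for one moment in time, and stacking the rows along the time axis gives the three-dimensional view referred to above. A practical system would typically derive further features from such a representation before storing or comparing voice prints.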
A speaker's voice may be extremely difficult to forge for biometric comparison purposes, since a myriad of qualities may be measured, ranging from dialect and speaking style to pitch, spectral magnitudes, and formant frequencies. The vibration of an individual's vocal cords and the patterns created by the physical components resulting in human speech are as distinctive as fingerprints. Depending on how they are created, voice prints of two individuals may differ from each other at about one hundred (100) different points.
It should be noted that known methods for the generation of voice prints do not depend on what words are spoken by the individual for whom the voice print is being created. They simply require a sample of speech of an individual from which to generate the voice print. The larger the sample, the more information may be included in the voice print. As such those methods may be said to be “text-independent”.
Voice prints may be used to authenticate individuals in any communication session that includes a voice element by at least one party. Such communication sessions are referred to herein as voice communication sessions and include but are not limited to communications between an individual, e.g., a human, and apparatus or machinery such as an Automatic Voice Response (AVR) unit or an Interactive Voice Response (IVR) unit, telephone communications, Voice over IP (VoIP) communications, and video conferences. It should be noted that in voice communications the voice element may be no more than a short speech such as the utterance of a particular phrase, with the remainder of the communication by both parties taking place by other means such as email, instant messaging or any means using a man-machine interface.