1. Field of the Invention
The present invention relates to voice analysis in general, and to a method and apparatus for identifying an unknown speaker, in particular.
2. Discussion of the Related Art
Traditional lawful interception relies mainly on intercepting phone calls of known targets, for which warrants have been issued. Modern lawful interception also comprises intercepting interactions made through additional communication means used by the known targets, including computerized sources such as e-mails, chats, web browsing, VoIP communications and others. The process of monitoring a target includes analyzing the captured information and related metadata using a variety of technologies, displaying different data sources on the same platform, and managing the entire workflow of one or more investigators. In the common scenario, one of the parties to an intercepted phone call or another vocal communication, such as the audio part of a video conference, is usually known to the investigators, while the other party is not necessarily known. It is also possible that multiple parties are unknown, for example in a conference call, when speakers on any side change during the communication exchange, or when another person is using the communication device associated with a person under surveillance. However, the unknown party may be involved in other cases investigated by the same or another law enforcement agency, or may otherwise be known to such an agency. In such cases it would be desirable to identify the unknown speaker or speakers, so that additional relevant information can be associated and processed with the interaction or with other information related to the target, i.e. the person whose interactions are being intercepted.
Unlike speaker verification problems, in which it is required to verify whether a given voice matches a specific stored voice representation, voice print or voice sample, in speaker identification problems it is required to identify the speaker from a collection typically having between tens and hundreds of thousands of voices. An alternative scenario arises in a call center, a trading floor or another organizational unit participating in vocal interactions. In such calls, one side of the call, being the agent or another representative of the organization, is known, while the other side is a priori unknown. When the unknown speaker identifies himself or herself, it is possible to verify the claimed identity. However, if the verification fails, it is desirable to know the real identity, or at least to receive additional information related to the speaker. Identifying the caller may assist in preventing fraudulent actions and other crimes.
Speaker identification is optionally performed by generating, for each known or available speaker, a representation of the speaker, being or including a mathematical entity such as a statistical model that represents the characteristics of the speaker's voice, and storing the representation. The characteristics may include acoustic as well as non-acoustic characteristics. It is also possible to store features, such as samples of the voice or features extracted from the voice, as part of the model associated with the speaker. As an example, such a representation can be a statistical model such as a Gaussian Mixture Model (GMM), an adaptive GMM (AGMM), a vector of features or the like. Then, when a voice sample to be identified is given, the sample, itself preferably represented as a parameterized representation of the voice, is tested against the stored representations. If the caller is identified with one or more representations, he or she is assigned to be the speaker, or one of a list of speakers, whose representation best matches the characteristics of the unknown caller. Otherwise, the caller is determined to be an unknown speaker.
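By way of a non-limiting illustration, the identification flow described above can be sketched as follows. The sketch is a simplification made for brevity and is not taken from the disclosure: each speaker is represented by a single Gaussian with a diagonal covariance (in effect a one-component GMM) rather than a full mixture, the voice sample is assumed to have already been parameterized into fixed-length feature frames, and the names `train_model`, `avg_log_likelihood`, `identify` and the `threshold` parameter are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]
# Hypothetical stored representation: (per-dimension mean, per-dimension variance).
Model = Tuple[List[float], List[float]]


def train_model(frames: Sequence[Frame], var_floor: float = 1e-3) -> Model:
    """Build a stored representation from a known speaker's feature frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Floor the variance so that a constant feature does not cause division by zero.
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dims)
    ]
    return means, variances


def avg_log_likelihood(frames: Sequence[Frame], model: Model) -> float:
    """Average per-frame log-likelihood of the sample under one stored model."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(
    frames: Sequence[Frame], collection: Dict[str, Model], threshold: float
) -> Optional[str]:
    """Test the sample against every stored model and return the best match.

    Returns None (unknown speaker) when even the best score is below the
    assumed acceptance threshold.
    """
    best_id: Optional[str] = None
    best_score = -math.inf
    for speaker_id, model in collection.items():  # one scoring pass per stored voice
        score = avg_log_likelihood(frames, model)
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id if best_score >= threshold else None
```

In this sketch the loop in `identify` scores the sample once against every stored representation, and the threshold test is what allows a caller to be reported as unknown instead of being assigned to the nearest stored voice.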
The process introduces a number of problems. First, the time required for such a process is generally proportional to the size of the voice collection, and can therefore be too long for providing effective results, especially when a large volume of calls is to be analyzed continuously, or when the analysis result is required urgently or in real time. Moreover, the identification performance degrades and its statistical significance decreases as the number of voices in the collection grows. Yet another problem is that the speaker's voice is not guaranteed to be in the collection, in which case it is preferable not to associate the voice with any speaker at all than to associate it with the wrong speaker.
There is thus a need in the art for a speaker identification method and apparatus, which will enable the identification of a speaker from a multiplicity of known speakers, in an environment of an organization such as a law enforcement institute, a security department of a call center, a financial institute, or any other organization. The method and apparatus should be efficient so as to enable the identification of a speaker in real time or near real time, in order to provide organizations or other users with the ability to react efficiently. The method and apparatus should also provide high performance, i.e. a low error rate.