1. Field of the Invention
The present invention relates to speech processing systems in general and in particular to a method for automatic speaker spotting in order to search for, locate, detect, and recognize a speech sample of a target speaker from among a collection of speech-based interactions carrying multiple speech samples of multiple interaction participants.
2. Discussion of the Related Art
Speaker spotting is an important task in speaker recognition and location applications. In a speaker spotting application, a collection of multi-speaker phone calls is searched for the speech sample of a specific target speaker. Speaker spotting is useful in a number of environments, such as, for example, a call-monitoring center where a large number of phone calls are captured and collected for each specific telephone line. Speaker spotting is further useful in a speech-based-interaction intensive environment, such as a financial institution, a government office, a software support center, and the like, where follow-up is required for reasons of dispute resolution, agent performance monitoring, compliance regulations, and the like. However, it is typical of such environments that a target speaker for each specific interaction channel, such as a phone line, participates in only a limited number of interactions, while other interactions carry the voices of other speakers. Thus, currently, in order to locate those interactions in which a specific target speaker participates, and therefore those interactions that carry the speech sample thereof, a human listener, such as a supervisor, an auditor, or security personnel tasked with locating and examining the content of the speech of a target speaker, is usually obliged to listen to the entire set of recorded interactions.
There is a need for a speaker spotting method capable of scanning a large collection of speech-based interactions, such as phone calls, and of matching the speech samples of a speaker carried by the interaction media to reference speech samples of the speaker, in order to locate the interactions carrying the speech of the target speaker and thereby to provide a human listener with the option of reducing the number of interactions he/she is obliged to listen to.