Field of the Invention
The present invention relates to the art of automatic speaker recognition and, in particular, speaker identification from incoming telephone calls.
Speaker recognition plays an important role in intelligence investigations, in which a large number of telephone calls must be analyzed with respect to the speaker's identity. For example, at least one particular target speaker is tracked based on a set of speech samples obtained for that speaker during past telephone calls. In another example, incoming telephone calls are screened in order to alert staff when a known speaker is on the line. Speaker identification may be requested for a number of different criminal offences, such as making hoax emergency calls to the police, ambulance or fire brigade, making threatening or harassing telephone calls, making blackmail or extortion demands, taking part in criminal conspiracies, etc.
Conventionally, a new speech sample of an unknown speaker from a new incoming telephone call is analyzed in order to determine whether or not it matches samples of already identified speakers. Specifically, it is determined whether the new speech sample matches one or more known samples to a predetermined degree, defined in terms of some distance measure or similarity metric.
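The threshold-based matching described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: it assumes each speaker is represented by a fixed-length feature vector, and the names `match_known_speakers` and `euclidean`, the Euclidean distance itself, and the threshold value are all hypothetical placeholders for whatever model and distance measure a real system uses.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Any function mapping two speaker representations to a non-negative distance.
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_known_speakers(
    sample: Sequence[float],
    known: Dict[str, Sequence[float]],
    distance: Distance,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (speaker_id, distance) for every known speaker whose
    representation lies closer than `threshold` to the new sample,
    sorted closest first. An empty list means no known speaker matched."""
    hits = []
    for speaker_id, model in known.items():
        d = distance(sample, model)
        if d < threshold:
            hits.append((speaker_id, d))
    hits.sort(key=lambda item: item[1])
    return hits


if __name__ == "__main__":
    known = {"speaker_A": [1.0, 0.0], "speaker_B": [0.0, 1.0]}
    # Only "speaker_A" lies within the illustrative threshold of 0.5.
    print(match_known_speakers([0.9, 0.1], known, euclidean, threshold=0.5))
```

Keeping the distance function pluggable mirrors the text: the decision rule (compare against a predetermined threshold) is independent of which distance measure or similarity metric is chosen.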
As an example of such a distance measure, Gaussian Mixture Model (GMM) metrics can be employed to determine whether a GMM derived for the new speech sample of the unknown speaker lies within a predetermined threshold distance of a GMM derived for one of the already identified speakers. In particular, the well-known Kullback-Leibler distance can be used.
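Because the Kullback-Leibler divergence between two Gaussian mixtures has no closed form, it is typically approximated. The sketch below assumes one-dimensional GMMs for brevity (real speaker models use multivariate features), estimates the divergence by Monte Carlo sampling, and symmetrizes it so that it behaves more like a distance. The class `GMM`, the functions `kl_monte_carlo` and `symmetric_kl`, the sample count, and the threshold of 1.0 are illustrative assumptions, not values taken from this document.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class GMM:
    """Hypothetical 1-D Gaussian mixture: weights sum to 1, stds > 0."""
    weights: List[float]
    means: List[float]
    stds: List[float]

    def log_pdf(self, x: float) -> float:
        # Per-component log(w_k * N(x; m_k, s_k^2)), combined by log-sum-exp.
        logs = [
            math.log(w) - 0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
            for w, m, s in zip(self.weights, self.means, self.stds)
        ]
        peak = max(logs)
        return peak + math.log(sum(math.exp(v - peak) for v in logs))

    def sample(self, rng: random.Random) -> float:
        # Pick a component by weight, then draw from that Gaussian.
        k = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return rng.gauss(self.means[k], self.stds[k])


def kl_monte_carlo(p: GMM, q: GMM, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = p.sample(rng)
        total += p.log_pdf(x) - q.log_pdf(x)
    return total / n


def symmetric_kl(p: GMM, q: GMM, n: int = 20000, seed: int = 0) -> float:
    """KL is asymmetric; summing both directions gives a symmetric score."""
    return kl_monte_carlo(p, q, n, seed) + kl_monte_carlo(q, p, n, seed)


if __name__ == "__main__":
    unknown = GMM([0.5, 0.5], [0.0, 3.0], [1.0, 1.0])
    known = {
        "speaker_A": GMM([0.5, 0.5], [0.1, 3.1], [1.0, 1.0]),  # close model
        "speaker_B": GMM([0.5, 0.5], [5.0, 8.0], [1.0, 1.0]),  # distant model
    }
    threshold = 1.0  # illustrative value only
    for name, model in known.items():
        d = symmetric_kl(unknown, model)
        print(name, round(d, 3), "match" if d < threshold else "no match")
```

The Monte Carlo estimate converges to the true divergence as `n` grows; other approximations (for example, matching components and summing closed-form Gaussian KL terms) trade accuracy for speed, and which one a given system uses is not specified here.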
However, automatic speaker identification is still a demanding task: current speaker recognition methods are not considered sufficiently reliable for telephone calls and remain error-prone, in particular in confusing unknown speakers with known ones.
Thus, it is an object of the present invention to provide a method for speaker recognition in telephone calls with improved accuracy compared to the prior art.