Speaker identification may be requested in connection with a number of different criminal offences, such as making hoax emergency calls to the police, ambulance or fire brigade, making threatening or harassing telephone calls, making blackmail or extortion demands, taking part in criminal conspiracies, etc. According to another example, incoming telephone calls are screened in order to alert the staff of a call center when a known speaker is on the line or to automatically block that known speaker.
Conventionally, a new speech sample of an unknown speaker of a new incoming telephone call is analyzed in order to determine whether or not the speech sample matches one or more stored samples of already identified speakers. A match is declared when the new speech sample agrees with a stored sample to a predetermined degree defined in terms of some distance measure or similarity metric. For example, Gaussian Mixture Model metrics can be employed to determine whether the distance between a Gaussian Mixture Model derived for the new speech sample of the unknown speaker and the Gaussian Mixture Models derived for already identified known speakers is below some predetermined threshold. In particular, the well-known Kullback-Leibler distance can be used.
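The threshold comparison described above can be illustrated by the following minimal sketch. It is not taken from any particular conventional system: the one-dimensional mixtures, the function names (`kl_divergence_mc`, `symmetric_kl`, `match_speakers`), the symmetrized form of the distance and the threshold value are all assumptions chosen for illustration. Because the Kullback-Leibler divergence between two Gaussian Mixture Models has no closed-form expression, the sketch assumes a Monte Carlo estimate; practical systems operate on multi-dimensional acoustic feature vectors and may use other approximations.

```python
import math
import random

# Illustrative assumption: a 1-D GMM is a list of (weight, mean, variance)
# tuples whose weights sum to 1. Real speaker models are multi-dimensional.

def gmm_log_pdf(x, gmm):
    """Log density of the mixture at x, using log-sum-exp for stability."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for w, mu, var in gmm
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def gmm_sample(gmm, rng):
    """Draw one sample: choose a component by weight, then sample from it."""
    r = rng.random()
    acc = 0.0
    for w, mu, var in gmm:
        acc += w
        if r <= acc:
            return rng.gauss(mu, math.sqrt(var))
    # Fallback for rounding error in the cumulative weights.
    _, mu, var = gmm[-1]
    return rng.gauss(mu, math.sqrt(var))

def kl_divergence_mc(p, q, n_samples=5000, seed=0):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(p, rng)
        total += gmm_log_pdf(x, p) - gmm_log_pdf(x, q)
    return total / n_samples

def symmetric_kl(p, q):
    """Symmetrized divergence, used here as the distance measure (assumption)."""
    return kl_divergence_mc(p, q) + kl_divergence_mc(q, p)

def match_speakers(unknown, known, threshold):
    """Names of known speakers whose model lies below the distance threshold."""
    return [name for name, model in known.items()
            if symmetric_kl(unknown, model) < threshold]

# Hypothetical models: "alice" is close to the unknown speaker, "bob" is not.
unknown = [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)]
known = {
    "alice": [(0.5, 0.1, 1.0), (0.5, 3.1, 1.0)],
    "bob": [(0.5, 10.0, 1.0), (0.5, 14.0, 1.0)],
}
matches = match_speakers(unknown, known, threshold=0.5)
```

In this sketch every stored model is compared against the new sample, so the cost grows with both the number of known speakers and the number of Monte Carlo samples drawn per comparison.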
However, automatic speaker identification is a very demanding task in terms of response time and computer resources. In some cases, speed is important because the result of the identification has to be acted upon as soon as possible. In other cases, the faster the response, the more comparisons can be carried out. Also, in some situations, only limited computer resources are available locally, i.e. at the location where a target speaker's voice is detected. For example, some call center outposts are not equipped with high-performance computers but only with resources sufficient for handling calls. According to another example, a mobile device, such as a mobile phone or PDA, may be equipped for speaker identification, for instance by means of an App program. In this case, only limited memory and CPU power are available. With only limited resources available, however, reliable speaker identification can be more difficult to achieve.
Thus, it is an object of the present invention to provide a method for the identification of a speaker whose verbal utterance is detected at a location where only limited computational resources are available, wherein the method shall allow for fast response times of the identification process.