There exist many circumstances requiring verbal communication between a speaker and one or more listeners among a plurality of participants, where the separation of the speaker and the listener(s) is such that it can be hard for the listener(s) to determine who spoke.
The difficulty can arise when the speaker and the listener(s) are located out of direct line of sight. One such example is an audio conference call held over a telecommunications network between multiple participants located at different geographical locations. This is well known as a means of conducting business communications. However, when the number of participants in a call is more than two, it can be difficult for the participants to work out which of them is speaking at any given time. This problem is a consequence of the participants not being in direct line of sight with each other and therefore having to rely solely upon an audio signal to identify who is speaking on the other end of the call. The problem is exacerbated when conducting a conference call over a conventional plain old telephone service (POTS) network, because the usable voice frequency band over a POTS network is limited to approximately 300 Hz to 3,400 Hz, i.e. a small proportion of the frequency band (around 20 Hz to 20,000 Hz) representative of the range of human hearing. Therefore, in addition to the listening participants having to rely solely upon their auditory sense to identify who is speaking, those same participants have to base the identification on an audio signal which is restricted in bandwidth. Speaker identification can be further hampered by any distortion in the speech of the speaking participant which may be introduced by transmission over a POTS network.
The same difficulty of the listener(s) identifying who is speaking can also arise in a conference or lecture having a plurality of participants located in a single room (such as a crowded lecture theatre). Where there are a large number of participants in a single room, it can be hard for those listening to determine who is speaking amongst the participants, even if the speaker is in direct line of sight with those listening.
The use of voice recognition systems which are able to identify who is speaking based upon recognising a given person's voice from their voice signature is known. However, such systems would require training to establish a voice profile sufficient to identify a given person, as well as a database containing the voice profiles of all persons on a given call. Such a system would therefore be costly in terms of both time and infrastructure.
Consequently, there is a need for an improved means of identifying who is speaking in a verbal communication scenario between a listener and a speaker where the listener cannot easily see who is speaking.