In a conference call, several users dial into a conferencing bridge system (even if this is hidden from them and they are unaware of it). The bridge merges the incoming speech data from all lines into a single downlink and pushes this back to the remaining users on the call. Conventionally, therefore, the speech data overlaps and cannot be separated at the receiving terminal. Further, the speech data within each uplink is bandwidth limited and compressed (using the standard processes applied to any GSM/3G call).
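Why the merged downlink cannot be separated can be illustrated with a minimal sketch of a bridge mixer. The sketch assumes 16-bit linear PCM frames of equal length that are mixed by plain summation with saturation. The function name `mix_uplinks` and the sample values are hypothetical and are not taken from any particular bridge implementation.

```python
# Hypothetical sketch of a conference bridge mixing uplink frames into one
# mono downlink. Assumptions (not from the source): 16-bit linear PCM,
# time-aligned equal-length frames, mixing by summation with clipping.
from typing import List, Sequence

PCM_MIN, PCM_MAX = -32768, 32767


def mix_uplinks(uplinks: Sequence[Sequence[int]]) -> List[int]:
    """Sum time-aligned samples from every uplink and clip to the 16-bit range."""
    if not uplinks:
        return []
    length = len(uplinks[0])
    if any(len(frame) != length for frame in uplinks):
        raise ValueError("all uplink frames must have the same length")
    mixed = []
    for samples in zip(*uplinks):
        total = sum(samples)
        # Saturate rather than wrap around on overflow.
        mixed.append(max(PCM_MIN, min(PCM_MAX, total)))
    return mixed


# Two different sets of talkers yield an identical downlink frame, so the
# receiving terminal has no way to recover which caller contributed what.
call_a = [[100, 200, -300], [50, -50, 0]]
call_b = [[150, 0, -300], [0, 150, 0]]
print(mix_uplinks(call_a))  # [150, 150, -300]
print(mix_uplinks(call_b))  # [150, 150, -300]
```

Because summation discards the per-line contributions, any attempt to recover individual callers at the terminal would need extra information carried alongside the mixed signal.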
Conference calling in such telecommunication systems allows multiple callers in different locations to interact in a single voice call. A problem with such conference calls is distinguishing between participants when listening through a single mono audio output, as is normal with mobile handsets. This is exacerbated by network bandwidth limitations and by signal compression, which together cause the loss of many of the subtle cues a person uses to recognise another person's voice. In a conference call, the human voice is bandwidth limited to 4 kHz, and all incoming lines are overlapped to create a single downlink played through a single speaker. It is therefore difficult to differentiate between users, to tell when a new user begins to speak, to know how many people are on the call, and to remember their names.
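For context, the 4 kHz limit follows from the sampling rate of narrowband telephony. By the Nyquist criterion, a signal of bandwidth $B$ must be sampled at $f_s \ge 2B$; narrowband GSM speech is sampled at 8 kHz and coded in 20 ms frames, which gives the figures below.

$$
f_s = 2B = 2 \times 4\,\text{kHz} = 8\,\text{kHz}, \qquad
N_{\text{frame}} = f_s \, T_{\text{frame}} = 8000\,\text{s}^{-1} \times 0.02\,\text{s} = 160 \text{ samples}
$$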
Associating spatial information with each user in a conference call could help listeners differentiate between the participants. However, because the uplinked lines are merged into one downlink, and are compressed and bandwidth limited, this spatial information cannot conventionally be transferred across the mobile network.
US2003/0044002 A1 proposes creating spatially resolved audio signals, representative of one or more callers, for a listener. In this way, audio signals appear to emanate from different spatial locations around the listener. A tag identifying a caller is transmitted in the data signal together with the speech data. The tag is separate from the speech data: it is placed either before or after the speech data in the data signal.
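The tag arrangement described above, with the caller tag carried either before or after the speech data, can be sketched as follows. This is a hypothetical framing only: the cited document does not specify the tag format, so the 16-bit big-endian caller ID and the names `build_frame` and `parse_frame` are assumptions made for illustration.

```python
# Hypothetical framing sketch: a caller tag carried alongside, but separate
# from, the speech data. Assumption (not from the source): the tag is a
# fixed-size 16-bit big-endian caller ID.
import struct
from typing import Tuple

TAG_FORMAT = ">H"
TAG_SIZE = struct.calcsize(TAG_FORMAT)  # 2 bytes


def build_frame(caller_id: int, speech: bytes, tag_first: bool = True) -> bytes:
    """Place the caller tag before (header) or after (trailer) the speech data."""
    tag = struct.pack(TAG_FORMAT, caller_id)
    return tag + speech if tag_first else speech + tag


def parse_frame(frame: bytes, tag_first: bool = True) -> Tuple[int, bytes]:
    """Split a frame back into (caller_id, speech) using the agreed tag position."""
    if len(frame) < TAG_SIZE:
        raise ValueError("frame too short to contain a caller tag")
    if tag_first:
        tag, speech = frame[:TAG_SIZE], frame[TAG_SIZE:]
    else:
        tag, speech = frame[-TAG_SIZE:], frame[:-TAG_SIZE]
    (caller_id,) = struct.unpack(TAG_FORMAT, tag)
    return caller_id, speech


header_frame = build_frame(7, b"\x01\x02\x03")                    # tag before speech
trailer_frame = build_frame(7, b"\x01\x02\x03", tag_first=False)  # tag after speech
print(parse_frame(header_frame))                    # (7, b'\x01\x02\x03')
print(parse_frame(trailer_frame, tag_first=False))  # (7, b'\x01\x02\x03')
```

A fixed-size tag is used here so that a trailer can be located without a length field; a variable-length tag would need some additional delimiting.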
GB 2416955 discloses an arrangement for reducing the bandwidth required to relay signals simultaneously from all participants in a conference call.