The present invention relates to audio conferencing and, more particularly, but not exclusively to a method and apparatus for identifying participants of an audio conference call.
Audio conferencing is a concept well known in the art. More specifically, audio call conferencing is very frequently implemented in both fixed and cellular telephony networks. Typically, audio call conferencing allows more than two parties, or communication terminals, to be involved in the same communications session. For example, when two parties are involved in a communications session, it may be desirable to invite one or more parties to the same session.
This may be achieved by dialing a special code number that activates a call conferencing service feature, followed by the identification number of the party to be invited. A telephone switch then connects that party to the ongoing voice call between the first and second parties.
With the fast evolution of telephone networks, various communications protocols have defined new and more flexible manners of handling voice and data call sessions in telecommunications networks.
Reference is now made to FIG. 1, which is a simplified block diagram illustrating a system for audio conference management in a prior art telephony network.
Audio call conferencing service is often provided using a supplementary audio conference management system 1000, which resides within a telephony network.
An exemplary supplementary audio conference management system 1000 includes two subsystems:
(a) An Audio Conference Application 108, which mainly handles the provisioning of the audio conferencing services, such as participant information and preferences, Scheduling (On Line, Pre Called), Resource Management, User permissions, Session Database, GUI, etc., as known in the art.
(b) An Audio Conference Mixer 109, which handles all telephony control aspects, such as terminating all conference lines, terminating all Call Signaling Information, and mixing the audio signals received from the Rx (Received) Channels (i.e. the data channel or phone line used by each participant of the call) back to the Tx (Transmitted) Channels (which may be the same channels used by the participants of the call), as known in the art. The Mixer may be implemented as a bridge, an MRF (Media Resource Function) or an MCU (Multipoint Control Unit), as known in the art.
That is to say, audio data packets (or audio signals) received from the participants of the conference call through data channels 105, where each participant uses a dedicated data channel, are mixed together. Then, the resultant mixed signal is sent back to all participants through the data channels 105.
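The mixing step described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the Audio Conference Mixer 109: it assumes that each Rx channel delivers one frame of 16-bit linear PCM samples of equal length, and the function name mix_frames and the channel labels are hypothetical.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(rx_frames: Dict[str, List[int]]) -> List[int]:
    """Sum equal-length 16-bit PCM frames sample by sample, clipping to int16.

    rx_frames maps a (hypothetical) channel label to that channel's samples.
    """
    if not rx_frames:
        return []
    frame_len = len(next(iter(rx_frames.values())))
    mixed = [0] * frame_len
    for samples in rx_frames.values():
        if len(samples) != frame_len:
            raise ValueError("all frames must have the same length")
        for i, sample in enumerate(samples):
            mixed[i] += sample
    # Clip so that the sum still fits a signed 16-bit sample.
    return [max(INT16_MIN, min(INT16_MAX, s)) for s in mixed]
```

Because every channel is summed into the single resultant frame that is returned to all participants, the mixed signal by itself carries no indication of which participant contributed which part of the audio.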
One of the inconveniences during audio conference calls is that conference participants sometimes cannot identify the current speaker. This usually happens because of voice distortions, bad communication lines, background noises, or just because the participants of the audio conference are not familiar with each other.
In Telephony Networks, the fact that only one current speaker is talking does not prevent the voices of remaining participants of the call conference from being mixed together and heard in the conference.
More specifically, in current Voice over Internet Protocol (VoIP) Telephony Networks, there is no clear indication of a current speaker, since VoIP is a Packet Switch (PS) based technology (unlike legacy PLMN, which is a Circuit Switch (CS) based technology that allows clear identification of the speaker's line).
According to Request for Comments (RFC) No. 3550, the audio conference mixer lists all contributing media sources, indicated by CSRCs, in the mixed conference audio RTP (Real-time Transport Protocol) stream, which makes it more difficult to pinpoint the current speaker.
A CSRC (Contributing Source) is an indicator of a contributing stream of RTP data packets. The CSRCs are indicated within the combined stream produced by an RTP mixer. Each CSRC is related to a specific one of the participants in the conference call.
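For illustration, the CSRC list can be read from the header of a mixed RTP packet as sketched below. The sketch follows the fixed header layout of RFC 3550 (a 12-byte fixed header whose first byte carries the 4-bit CSRC count, followed by one 32-bit identifier per contributing source). The function name parse_rtp_sources is hypothetical, and header extensions and the payload are ignored.

```python
import struct
from typing import List, Tuple


def parse_rtp_sources(packet: bytes) -> Tuple[int, List[int]]:
    """Return (ssrc, csrc_list) read from an RTP header laid out per RFC 3550."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first_byte & 0x0F  # CC field: number of CSRC identifiers
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    (ssrc,) = struct.unpack("!I", packet[8:12])  # SSRC of the sender (the mixer)
    csrcs = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return ssrc, csrcs
```

When the mixer combines audio from several participants, the returned list holds several identifiers at once, so the list alone does not single out one current speaker.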
Usually, there is a problem of background noise injected from all participants of the conference call. Consequently, the audio conference mixer combines all Contributing Sources (indicated by CSRCs) into the single RTP packet stream, and sends the combined stream to all participants of the conference call.
The background noise problem is further aggravated by the comfort noise injected intentionally by VoIP (Voice over Internet Protocol) Handsets, as described by RFC 3389, entitled “Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)”.
Existing solutions do not provide any means to identify the current speaker in real time. However, some current solutions may announce a participant's name when the participant joins or leaves the conference, which is independent of whether the participant is speaking. Some current solutions are based on prerecording the participants' names at the beginning of a voice call conference.
For example, the Session Initiation Protocol (SIP) is an Internet Engineering Task Force (IETF) standard protocol for initiating an interactive user session. The interactive user session may involve multimedia elements such as video, voice, chat, gaming, and virtual reality. Like the Hypertext Transfer Protocol (HTTP), or the Simple Mail Transfer Protocol (SMTP), SIP works in the Application layer of the Open Systems Interconnection (OSI) communications model.
SIP can establish multimedia sessions or Internet telephony calls, and modify or terminate them. Because SIP supports name mapping and redirection services, SIP makes it possible for users to initiate and receive communications and services from any location, and for networks to identify the users wherever they are. SIP is a request-response protocol, dealing with requests from clients and responses from servers.
Participants are usually identified by SIP Uniform Resource Locators (URLs) or Uniform Resource Identifiers (URIs), although SIP also supports E.164 telephone number addressing. Requests can be sent through any transport protocol, such as the User Datagram Protocol (UDP), the Stream Control Transmission Protocol (SCTP), or the Transmission Control Protocol (TCP).
SIP determines the end system to be used for the session, the communication media and media parameters, and the called party's desire to engage in the communication. Once these are assured, SIP establishes call parameters at either end of the communication, and handles call transfer and termination.
There are a few SIP Requests for Comments (RFCs) that mention services for identifying a current speaker during a voice conference call session:
RFC 3550 paragraph 3 [2] describes: “Contributing Source (CSRC) . . . An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer)”.
RFC 4575 paragraph 5.8.4 defines the ability to query the conference application by using the SIP event package for conference state. The SIP event package for conference state utilizes the SIP SUBSCRIBE/NOTIFY mechanism to inform members about the current speaker.
RFC 4575 also adds that: “If an RTP mixer compliant to the above is used, participants can perform an SSRC to user mapping and identify a current speaker”.
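The SSRC to user mapping mentioned in the quotation can be sketched as a simple table lookup. The sketch assumes that a participant has already built a table from source identifiers to user addresses, for example from conference state notifications; the function name names_for_sources, the identifiers and the address shown in the usage note are hypothetical.

```python
from typing import Dict, List


def names_for_sources(csrcs: List[int], ssrc_to_user: Dict[int, str]) -> List[str]:
    """Translate each contributing source identifier into a user address.

    Identifiers missing from the table are reported as unknown.
    """
    return [ssrc_to_user.get(c, "unknown (0x%08X)" % c) for c in csrcs]
```

For example, with a table containing only {0x11111111: "sip:alice@example.com"}, the identifiers 0x11111111 and 0x33333333 translate to "sip:alice@example.com" and "unknown (0x33333333)". The lookup names every source listed for a packet; it does not by itself decide which of those sources is the current speaker.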
None of the RFCs cited above defines a method of determining a current speaker ID.
There is thus a widely recognized need for, and it would be highly advantageous to have, a system devoid of the above limitations.