To reduce costs or inconvenience associated with travel, people sometimes conference “virtually” by using telephones or other electronic audio/video devices. These electronically facilitated conferences are generally referred to as teleconferences or videoconferences. Participants in teleconferences employ teleconference devices to speak or to hear other participants. Participants in videoconferences employ videoconference devices to (1) speak and be seen and (2) hear and see other participants. People may also employ computing devices, such as personal computers and handheld computers, as well as other devices, such as mobile or conventional telephones, to participate in electronically facilitated conferences. Participants may also use other supporting equipment, such as imaging devices to share documents or projection devices to project an image of participants, such as to enable multimedia conferencing. Electronically facilitated conferencing can be combined with conventional conferencing, such as to enable attendees who are physically present in a conference room to employ electronic equipment to exchange information and communicate with remote attendees who are participating virtually. An electronically facilitated conference can include participants who use different types of equipment, such as computers, mobile phones, conventional telephones, teleconferencing equipment, and videoconferencing equipment.
When an electronically facilitated conference supports different types of equipment, it may require the use of a common protocol so that participants using one device type can communicate with participants using a different device type. As an example, the Session Initiation Protocol (SIP) has become a popular protocol for use in electronically facilitated conferencing. SIP can be used to create, modify, and terminate “sessions” with one or more participants. These sessions can support teleconferencing, videoconferencing, and multimedia conferencing. Another popular protocol for electronically facilitated conferencing is H.323, which is similar to SIP in many respects.
Devices employing SIP can establish sessions with each other by employing an Internet Protocol (IP) network. This network of SIP devices can be called a SIP network. A SIP network comprises entities (e.g., devices or applications that employ SIP) that can participate in a SIP session as a client, a server, or both. SIP supports multiple types of entities, including user agents and routing agents. User agents initiate and terminate sessions by exchanging messages with other SIP entities. A user agent can be a user agent client (“UAC”), which is a device that initiates SIP requests, or a user agent server (“UAS”), which is a device that receives SIP requests and responds to such requests. As examples, IP telephones, personal digital assistants, personal computers, and any other type of computing device can be user agents. A device can be a UAC in one SIP session and a UAS in another, or may change roles during the session. A routing agent, such as a gateway, can connect entities across networks, such as an IP network and a public switched telephone network (PSTN).
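As an illustration of the user agent roles described above, the following minimal sketch models, entirely in memory, a device that acts as a UAC when it originates a request and as a UAS when it receives one. The class name UserAgent, its methods, and the example URIs are hypothetical and are not part of SIP or of any SIP library. The sketch also assumes that every request is simply answered with “200 OK,” which a real UAS would not do.

```python
from dataclasses import dataclass, field


@dataclass
class UserAgent:
    """Hypothetical in-memory stand-in for a SIP user agent (illustration only)."""
    uri: str
    log: list = field(default_factory=list)

    def send_request(self, method: str, target: "UserAgent") -> str:
        """Act as a UAC: originate a request and return the response received."""
        self.log.append(f"UAC sent {method} to {target.uri}")
        return target.handle_request(method, self)

    def handle_request(self, method: str, sender: "UserAgent") -> str:
        """Act as a UAS: receive a request and respond to it."""
        self.log.append(f"UAS received {method} from {sender.uri}")
        # Assumption: every request succeeds; a real UAS may reject or redirect.
        return "200 OK"


alice = UserAgent("sip:alice@example.com")
bob = UserAgent("sip:bob@example.com")

# Alice is the UAC and Bob is the UAS for the INVITE.
invite_response = alice.send_request("INVITE", bob)
# The roles reverse when Bob later ends the session with a BYE.
bye_response = bob.send_request("BYE", alice)
```

In this sketch the same object holds both roles, which mirrors the statement that a device may be a UAC for one request and a UAS for another.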
SIP supports multiple message types, including requests, which are sent from a UAC to a UAS, and responses, which are sent from a UAS to a UAC when responding to a request. A SIP message can comprise three parts. The first part of a SIP request is a “request line,” which includes fields to indicate a message method (e.g., INVITE), a request URI (Uniform Resource Identifier) that identifies the entity or user to which the request is being directed, and the protocol version; the entity or user sending the message is identified by a URI in a header rather than in the request line. A response begins instead with a “status line” that carries a status code. The second part of a SIP message comprises headers, each of which is represented as a name-value pair. The third part of a SIP message is the message's body, which is used to describe the session to be initiated or which contains data that relates to the session. Message bodies may appear in both requests and responses.
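The three-part message layout can be illustrated with a short sketch that splits the text of a SIP request into its request line, headers, and body. The function name parse_sip_request, the example addresses, and the header values are assumptions made for illustration. The sketch follows the SIP specification (RFC 3261) in treating the request line as a method, a request URI, and a protocol version, with the sender identified in the From header. It is deliberately simplified: repeated headers and multi-line header values are not handled.

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into request line fields, headers, and body."""
    # An empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Part 1: the request line holds the method, request URI, and version.
    method, request_uri, version = lines[0].split(" ", 2)

    # Part 2: each header is a name-value pair separated by the first colon.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()

    # Part 3: the body, e.g., a description of the session to be initiated.
    return {
        "method": method,
        "request_uri": request_uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Hypothetical INVITE used only to exercise the parser.
EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)

parsed = parse_sip_request(EXAMPLE_INVITE)
```

Here parsed["method"] holds the message method, parsed["request_uri"] holds the URI of the entity to which the request is directed, and parsed["headers"]["From"] holds the URI of the sender.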
A protocol that can be used for teleconferencing is Voice over Internet Protocol (VoIP). VoIP can function with both SIP and H.323 to enable participants to exchange VoIP messages that carry, for example, a digitization of their voices in a manner similar to conventional telephones. Applications executing on computing devices can employ VoIP, as can devices that are designed to employ VoIP. Examples of VoIP devices include VoIP telephones and VoIP teleconferencing equipment. These devices may contain hardware and software (e.g., embedded in integrated circuits) that enable the devices to connect directly to an IP network instead of, or in addition to, a PSTN that conventional telephones employ. Participants in VoIP conversations can employ various types of devices, including VoIP devices, conventional devices, and so forth, to participate in an electronically facilitated conference by calling other participants or responding to calls from other participants. In the following discussion, a participant that initiates a call is termed a “caller” whereas a participant that receives the call is termed a “callee.”
When many people participate in a conference, it can be difficult to determine who the participants are and who is presently speaking. Even if speakers identify themselves, other listeners who have not previously heard a speaker's voice may be unable to ascertain whether speakers are really who they claim to be. Moreover, when some participants' devices transmit distracting background noises, such as barking dogs, crying children, or noisy traffic, it can be difficult for other participants to hear the speaker clearly.