As a general matter, it is known to establish a real-time media conference over a packet-switched network between multiple user stations, each operated by a respective user. A communication server, such as a multipoint conference unit (MCU) for instance, can reside functionally in the network and can operate as a bridging or switching device between the participating stations, to support the conference session.
In practice, a participating station might initiate the conference session by sending to the communication server a session setup message that identifies the other desired participant(s). The server may then seek to connect each of the designated other participants, such as by forwarding the session setup message or sending a new session setup message to each other party. Ultimately, the server would thereby establish a conference leg with each participating station, including the initiating station, and the server would then bridge together the legs so that the users at the stations can confer with each other, exchanging voice, video and/or other media in real-time via the server.
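The setup-and-bridge sequence described above can be illustrated with a minimal in-memory sketch. The class name `ConferenceServer`, its methods, and the station names are hypothetical and chosen only for illustration; an actual server would carry out setup signaling and media transport over the network rather than through direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConferenceServer:
    """Hypothetical bridge: one conference leg per participating station."""

    # Maps a station identifier to the media delivered over its leg.
    legs: Dict[str, List[bytes]] = field(default_factory=dict)

    def handle_setup(self, initiator: str, targets: List[str]) -> None:
        """Establish a leg with the initiator and with each designated target."""
        for station in [initiator, *targets]:
            self.legs.setdefault(station, [])

    def relay(self, sender: str, media: bytes) -> None:
        """Bridge the legs: forward media from one station to all the others."""
        if sender not in self.legs:
            raise KeyError(f"{sender} has no conference leg")
        for station, delivered in self.legs.items():
            if station != sender:
                delivered.append(media)


server = ConferenceServer()
server.handle_setup("alice", ["bob", "carol"])
server.relay("alice", b"voice-frame-1")
```

In this sketch the initiating station receives a leg just like every other participant, and media from any station reaches all stations except its sender.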
A signaling mechanism such as the well-known Session Initiation Protocol (SIP) could be used to initialize the conference and, more particularly, to set up each conference leg. Further, digitized media could be packetized and carried to and from each participating station according to a mechanism such as the well-known Real-time Transport Protocol (RTP), for instance. The core industry standards for SIP (Internet Engineering Task Force (IETF) Request For Comments (RFC) 3261) and RTP (IETF RFC 1889) are hereby incorporated by reference.
According to RTP and its associated RTP Control Protocol (RTCP), each packet in an RTP media stream can include an RTP header having certain defined fields, including (i) a sequence number, which indicates a position of the packet in the stream, (ii) a timestamp, which indicates the instant when the data in the packet was established (sampled), (iii) a payload type, which indicates the format of the media, to enable a receiving end to play out the media, (iv) a “synchronization source (SSRC) identifier,” which is a randomly generated code that distinguishes the source from others in the session, and (v) optionally one or more “contributing source (CSRC) identifiers” indicating the SSRC of each stream that formed the basis for the RTP stream. Furthermore, according to SIP, a server may engage in session setup and control signaling with various parties by reference to their SIP identifiers or “SIP addresses”.
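As an illustration of the header fields listed above, the following is a minimal sketch of reading them from a raw packet, assuming the fixed 12-byte RTP header layout defined in the RTP specification, followed by an optional list of 32-bit CSRC identifiers. The names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions and padding are not handled.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Extract the fields discussed above from the start of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header is 12 bytes long")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC identifier.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F  # low 4 bits of the first byte
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet is truncated inside the CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
    return RtpHeader(
        version=first >> 6,           # top 2 bits of the first byte
        payload_type=second & 0x7F,   # low 7 bits; the top bit is the marker
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrcs=csrcs,
    )


# Example packet: version 2, one CSRC, payload type 0, sequence number 7.
example = struct.pack("!BBHIII", 0x81, 0x00, 7, 160, 0x11223344, 0xAABBCCDD)
header = parse_rtp_header(example)
```

A receiving end could use the parsed sequence number and timestamp to order and time the playout of media, and the SSRC and CSRC values to tell sources apart.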
Packet based media conferencing can be advantageously employed to provide an “instant connect” service, where a user of one station can readily initiate a real-time media conference with one or more designated target users at other stations. The initiating user may simply select a target user or group and then press an instant connect button on his or her station, and the user's station would responsively signal to a communication server to initiate a conference between the initiating user and the selected user or group. This sort of service is referred to as “instant connect” because it strives to provide a quick connection between two or more users, in contrast to telephone service where a user dials a telephone number of a party and waits for a circuit connection to be established with that party.
An example of an instant connect service is commonly known as “push-to-talk” (PTT). In a PTT system, some or all of the conference stations are likely to be wireless devices, such as cellular mobile stations, that are equipped to establish wireless packet-data connectivity and to engage in voice-over-packet (VoP) communication. Alternatively, some or all of the stations could be other sorts of devices, such as multimedia personal computers or Ethernet-telephones, that can establish packet-data connectivity and engage in VoP communication through landline connections. Further, each station could be equipped with a PTT button or other mechanism that a user can engage in order to initiate a PTT session or to request the floor during an ongoing session.
In addition, the same basic functionality can be applied with respect to other media types beyond voice, such as video or multimedia for instance, and may be generally characterized as push-to-x (PTX). Thus, another example of such functionality would be push-to-view (PTV), involving video conferencing.