As defined by IETF RFC 3261, Session Initiation Protocol (SIP) is an application-layer control (signalling) protocol for creating, modifying, and terminating sessions with one or more participants, in an IP network. These sessions include Internet telephone calls, multimedia distribution, and multimedia conferences. SIP invitations used to create sessions carry session descriptions that allow participants to agree on a set of compatible media types. SIP makes use of elements called proxy servers to help route requests to the user's current location, authenticate and authorise users for services, implement provider call-routing policies, and provide features to users. SIP also provides a registration function that allows users to upload their current locations for use by proxy servers. So-called “Application Servers” (ASs) can be provided in the call path, e.g. within the SIP proxy servers or elsewhere, in order to implement certain functions. SIP runs on top of several different transport protocols. SIP uses the Session Description Protocol (SDP) to specify the medium or media to be used for the session.
A SIP session is typically initiated by a SIP terminal sending a SIP INVITE message to some SIP address. Assuming that a called terminal wishes to accept the invitation, it responds to the calling terminal with a SIP 200 OK message. The calling terminal responds to receipt of the 200 OK by sending an ACK message to the called terminal. Upon receipt of the 200 OK message (containing the called terminal's SDP), the calling terminal can commence sending media to the called terminal. The called terminal can commence sending media upon receipt of the INVITE containing the caller's SDP.
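The exchange described above can be sketched as a small state model. This is an illustrative sketch only, not a SIP stack: the `Terminal` class, the `basic_call` function and the placeholder SDP strings are hypothetical names introduced here, and the model assumes the INVITE carries the caller's SDP and the 200 OK carries the called terminal's SDP, as in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Terminal:
    """Hypothetical model of a SIP terminal; not a real SIP stack."""
    name: str
    local_sdp: str
    remote_sdp: Optional[str] = None  # peer's SDP, once learned

    def can_send_media(self) -> bool:
        # A terminal can send media once it knows the peer's SDP.
        return self.remote_sdp is not None


def basic_call(caller: Terminal, callee: Terminal) -> List[str]:
    """Walk through INVITE / 200 OK / ACK and return the message trace."""
    trace = []

    # INVITE carries the caller's SDP, so the callee may now send media.
    callee.remote_sdp = caller.local_sdp
    trace.append("INVITE")

    # 200 OK carries the callee's SDP, so the caller may now send media.
    caller.remote_sdp = callee.local_sdp
    trace.append("200 OK")

    # ACK confirms receipt of the 200 OK; it carries no SDP in this sketch.
    trace.append("ACK")
    return trace
```

Calling `basic_call(Terminal("caller", "sdp-a"), Terminal("callee", "sdp-b"))` leaves each terminal holding the other's SDP, which is the condition under which each side may start sending media.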
When a calling SIP terminal, which might be referred to as User Equipment (UE) according to 3G terminology or as a User Agent Client (UAC), initiates a call, an AS receives the INVITE request within the SIP control network (this network might be an IP Multimedia Subsystem as defined by 3GPP). Before forwarding the request to its destination (n.b. the AS may also choose not to forward the request, depending on the service scenario), the AS may want to play an announcement to the UAC. In order to do this it is desirable to establish an early dialog between the UAC and AS involving the exchange of SDPs, the satisfaction of certain pre-conditions, and the establishment of a media channel. The AS signals its intent to provide early media in an 18x message (where “x” has any appropriate value), the 18x including the appropriate SDP. It is noted that the AS may send multiple 18x messages, each of which may be acknowledged by the UAC with a PRACK message, prior to the AS forwarding the 200 response message to the UAC. When the announcement has been played, the AS may forward the INVITE request to the called UE, or User Agent Server (UAS), in order to continue the session setup. The UAS will then also establish a dialog with the UAC. The signalling associated with this procedure is illustrated in FIG. 1.
The AS may choose to forward the INVITE request while still playing early media, or even before starting to play media (depending upon the service). It may choose to cease the early media when a response (provisional or final) is received from the called party, or when it detects that media is received from the called party. Specific service implementation specifications shall define when early media shall be ceased and, if needed, define which additional mechanisms are to be used to detect media.
According to this approach to handling early media, the 18x provisional response from the AS, and the final response (200 (INVITE)) from the UAS, are received by the UAC as part of the same dialog within the session. The AS must modify the To header tag parameter received in the response message from the UAS, so as to match the tag sent by the AS in the 18x provisional response. Also, since requests (incorporated into SIP messages) may be sent from the AS to the UAC, the AS may have to modify the CSeq value in requests received from the called UE, to make sure the values in the requests forwarded to the UAC are greater than the values in requests possibly sent from the AS to the UAC. The AS will also have to handle issues related to the route set etc. (i.e. parameter sets included in the SIP messages). These issues can be solved by the AS acting as a Back-to-Back User Agent (B2BUA).
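The two header manipulations described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than a B2BUA implementation: `rewrite_to_tag` and `CSeqMapper` are hypothetical helpers introduced for illustration, header values are handled as plain strings, only the numeric part of CSeq is considered, and the route set is not modelled.

```python
import re


def rewrite_to_tag(to_header: str, as_tag: str) -> str:
    """Return the To header value with its tag replaced by the AS's tag.

    If the header has no tag parameter, the AS's tag is appended.
    """
    pattern = r";tag=[^;]*"
    if re.search(pattern, to_header):
        # Replace only the first tag parameter; a function is used as the
        # replacement so the tag text is inserted literally.
        return re.sub(pattern, lambda m: ";tag=" + as_tag, to_header, count=1)
    return to_header + ";tag=" + as_tag


class CSeqMapper:
    """Tracks CSeq numbers used towards the UAC (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_sent = 0  # highest CSeq number sent towards the UAC so far

    def next_local(self) -> int:
        # CSeq number for a request originated by the AS itself.
        self.last_sent += 1
        return self.last_sent

    def forward(self, cseq_from_uas: int) -> int:
        # A forwarded request must carry a value greater than any value
        # already used towards the UAC on this dialog.
        self.last_sent = max(cseq_from_uas, self.last_sent + 1)
        return self.last_sent
```

For example, if the AS has already sent two requests of its own to the UAC, a request arriving from the called UE with CSeq number 1 would be forwarded with number 3 in this sketch.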
A further issue which must be addressed is the need to provide two remote SDP answers to the UAC: the SDP for the early media (initiated by the AS), and the SDP from the UAS. The SDP answer cannot change within the same INVITE transaction (i.e. it is not possible to send the early media SDP in an 18x message and the UAS SDP in the 200). There are two different solutions to this problem.
1. After the 200 OK is sent, a SIP UPDATE is sent by the AS to the UAC to provide the UAS SDP. The AS must send this UPDATE, since the UAS has no knowledge of the SDP previously sent by the AS to the UAC. The UPDATE 200 response may contain a changed SDP on the part of the UAC. However, if that is the case, the UPDATE 200 response cannot be forwarded directly to the UAS, since it was the AS that initiated the UPDATE transaction. Instead, the AS would have to send a separate UPDATE towards the UAS. The 200 response for that UPDATE, sent from the UAS to the AS, may also contain a change in the SDP for the UAS. Once again, the 200 response cannot be forwarded directly to the UAC, and another UPDATE to the UAC is required. The initial steps in this procedure are illustrated in FIG. 2.
The complexity of this procedure, involving as it does multiple interventions by the AS, is undesirable.
2. A second solution relies upon so-called “early media” mechanisms. As defined by the SIP recommendations, “early media” refers to media (e.g., audio and video) that is exchanged before a particular session is accepted by the called user. Within a dialog, early media may occur from the moment the initial INVITE is sent until the UAS generates a final response. Early media may be unidirectional or bi-directional, and can be generated by the caller, the callee, or both. Typical examples of early media generated by the callee are ringing tones and announcements (e.g., queuing status). Early media generated by the caller typically consists of voice commands or dual tone multi-frequency (DTMF) tones to drive interactive voice response (IVR) systems. The basic SIP specification RFC3261 supports only very simple early media mechanisms. RFC3959 extends the original proposal and overcomes a number of problems which might arise when that proposal is implemented.
Use may be made of the early session disposition mechanism described in IETF RFC3959 in order to allow an AS to play an announcement as early media. In this case two separate SDPs are used, one for the early dialog and one for the dialog with the UAS. The AS offers the early media to the UAC, which the UAC can choose to accept or reject. The initial 18x message sent from the AS to the UAC contains the SDP for the early dialog, whilst the 200 response from the UAS includes the SDP for the UAC to UAS dialog. The early dialog is terminated automatically by the UAC upon receipt of the 200 response. This procedure is illustrated in FIG. 3.
Problems may arise with this approach in the event that other nodes in the network also want to send early media. In any case, the approach has the disadvantage that it requires support for RFC3959 in the user terminals.
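The handling of the two separate SDPs under the early session disposition mechanism can be sketched from the UAC's point of view. This is a simplified sketch under stated assumptions: the `UacSessions` class and its method are hypothetical, each response is reduced to a status code, a disposition string ("early-session" or "session") and an SDP string, and the offer/answer exchange and PRACK handling are omitted.

```python
from typing import Optional


class UacSessions:
    """Hypothetical UAC-side record of the early and regular session SDPs."""

    def __init__(self) -> None:
        self.early_sdp: Optional[str] = None    # SDP for the early dialog (from the AS)
        self.session_sdp: Optional[str] = None  # SDP for the UAC to UAS dialog

    def on_response(self, status: int, disposition: str, sdp: str) -> None:
        if 180 <= status <= 189 and disposition == "early-session":
            # 18x from the AS: SDP for the early media.
            self.early_sdp = sdp
        elif status == 200 and disposition == "session":
            # 200 from the UAS: SDP for the regular session. The early
            # dialog is terminated by the UAC at this point.
            self.session_sdp = sdp
            self.early_sdp = None
```

Because the two SDPs are stored separately, the 200 response does not need to change an SDP answer already given within the INVITE transaction; it only supplies the regular session SDP and ends the early one.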