SIP (Session Initiation Protocol) is a signaling protocol that handles the initiation, modification and termination of multimedia sessions between two or more parties. It was created by the IETF (Internet Engineering Task Force) to facilitate multimedia sessions (such as voice calls) over IP networks, and it is part of the IETF multimedia conferencing architecture.
SIP is a TCP/IP application layer protocol, in the same way as HTTP (Hypertext Transfer Protocol) and SMTP (Simple Mail Transfer Protocol). Like other TCP/IP protocols, SIP is independent of the physical network structure.
SIP is used to initiate multimedia sessions, but the protocol itself does not describe the characteristics of the media used within the session. That is done by another protocol, SDP (Session Description Protocol), which SIP merely carries as payload. The IETF SIP specifications define the conventions for SDP usage within SIP. In addition, 3GPP (Third Generation Partnership Project) has specified its own conventions for SDP usage in SIP, which currently differ somewhat from those of the IETF. The main difference is that, according to the IETF, the negotiation always follows the offer-answer model, where SDP is sent once in each direction and the attributes can be unidirectional (for example, different codecs in different directions). In 3GPP it is possible that a third SDP message is sent in the direction of the first one (offer, answer, final offer), and attributes are usually bidirectional (for example, using the same codec in both directions is a mandatory requirement).
The media attributes described and negotiated by SDP include media types (audio, video, text etc.), codecs and their characteristics, and the transport addresses to which media should be sent. For example, audio and video are usually transmitted over RTP, which requires a payload type number, a UDP port number and an IP address. SIP/SDP messages usually traverse several proxy elements (IP hosts) for routing and service control purposes, while the media itself is sent directly between the two communicating endpoints (IP hosts) whenever possible.
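As an illustration of the attributes listed above, the following minimal sketch extracts the media type, UDP port, payload type numbers, codec names and IP address from an SDP body. The sample offer, the documentation-range address 192.0.2.10 and the function name parse_sdp are assumptions made for this example only; the sketch handles just the c=, m= and a=rtpmap lines and assumes numeric RTP/AVP payload types.

```python
# Hypothetical SDP offer used only for illustration.
SDP_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp(text):
    """Return one dict per m= line with the transport and codec details."""
    session_addr = None  # address from a session-level c= line
    media = []
    for line in text.splitlines():
        # Every SDP line has the form <letter>=<value>; skip anything else.
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            addr = value.split()[2]
            if media:
                media[-1]["address"] = addr  # media-level override
            else:
                session_addr = addr
        elif kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            mtype, port, proto, *fmts = value.split()
            media.append({
                "type": mtype,
                "port": int(port),
                "proto": proto,
                "payload_types": [int(f) for f in fmts],
                "codecs": {},
                "address": session_addr,
            })
        elif kind == "a" and value.startswith("rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            pt, encoding = value[len("rtpmap:"):].split(None, 1)
            media[-1]["codecs"][int(pt)] = encoding
    return media


if __name__ == "__main__":
    for stream in parse_sdp(SDP_OFFER):
        print(stream)
```

The returned dictionaries contain exactly the information an endpoint needs in order to send RTP to its peer: where to send it (address and port) and how to encode it (payload type and codec).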
FIG. 1 shows a simplified example of a network architecture, describing signaling (SIP/SDP) and media streams (RTP) among network elements. When subscriber A (8) wants to invite subscriber B (9) to create a session between them, the signaling (1) between A and B usually needs to traverse at least one SIP proxy (21). The SIP proxy provides name resolution and user location, among other things, and communicates (2) with subscriber B as well. After the necessary signaling information has been exchanged between A, B and the SIP proxy, a media (3) path can be created between A and B. The path carries one or more media streams (such as video or voice). Each stream is preferably composed of RTP packets. As the media streams are sent directly from A to B (and vice versa), A and B need to support a common codec for each medium to be able to understand each other. SDP is used to negotiate these capabilities.
However, if there is no common codec among the parties, transcoding of the streams is needed between A and B. The transcoding is performed by a transcoder (IP host) (7) somewhere in the network. A suitable codec has to be negotiated (4) between A and the transcoder, and between B and the transcoder. This also means that there is one RTP session (5) between A and the transcoder, and another RTP session (6) between B and the transcoder.
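The decision described above can be sketched as a simple comparison of codec lists. This is only an illustrative sketch: the function names, the case-insensitive name matching and the rule of picking the first common codec are assumptions of the example, not part of any specification.

```python
def common_codecs(offered, supported):
    """Codecs from `offered` that also appear in `supported`, in offer order."""
    supported_set = {c.lower() for c in supported}
    return [c for c in offered if c.lower() in supported_set]


def plan_legs(a_codecs, b_codecs, transcoder_codecs):
    """Decide whether A and B can talk directly or need a transcoder.

    Returns {"direct": codec} when a common codec exists,
    {"a_leg": codec, "b_leg": codec} when a transcoder can bridge them,
    or None when even the transcoder cannot serve both parties.
    """
    shared = common_codecs(a_codecs, b_codecs)
    if shared:
        # One RTP session directly between A and B.
        return {"direct": shared[0]}
    # No common codec: negotiate one codec per leg with the transcoder,
    # giving two separate RTP sessions (A-transcoder and B-transcoder).
    leg_a = common_codecs(a_codecs, transcoder_codecs)
    leg_b = common_codecs(b_codecs, transcoder_codecs)
    if not leg_a or not leg_b:
        return None
    return {"a_leg": leg_a[0], "b_leg": leg_b[0]}


if __name__ == "__main__":
    print(plan_legs(["PCMU", "PCMA"], ["PCMA"], []))
    print(plan_legs(["AMR"], ["PCMU"], ["AMR", "PCMU", "PCMA"]))
```

In the second call A and B share no codec, so the result names one codec for each leg, matching the two RTP sessions (5) and (6).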
FIG. 2 shows an example of a known solution for assigning a transcoder between two parties, A and B. The calling party A (8) sends an INVITE message (22) containing SDP toward the SIP proxy (21). The SDP contains the list of the codecs supported by A for each medium for this session. The SIP proxy forwards (23) the INVITE with the SDP unmodified toward the called party (B). If B does not support any of the codecs described in the SDP it receives, it generates an error response toward the SIP proxy (24). When the SIP proxy receives the error response, it knows that a transcoder is needed and starts the process. It can put party A on hold (25), and send a new INVITE with SDP (27) containing the transcoder address and codecs toward B. Naturally, A acknowledges (26) the hold. If B answers with 200 OK (28), a new INVITE (29) can be sent toward A offering the transcoder address and codecs as well. Naturally, the SIP proxy acknowledges (210) the 200 OK message, and A sends a 200 (SDP A) message (211) to the SIP proxy, which acknowledges (212) it. As an end result, both A and B have negotiated media streams to the transcoder, and both have a signaling relation to the SIP proxy.
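The message sequence of FIG. 2 can be written down as data to make the extra signaling visible. The sketch below is an assumption-laden illustration: the tuple layout and function names are invented for the example, the message labels paraphrase the description rather than quote exact SIP syntax, and the destinations of the acknowledgements (210) and (212) are inferred from the text.

```python
# (reference numeral, sender, receiver, message) for the known solution.
FLOW = [
    (22, "A", "proxy", "INVITE (SDP: A's codecs)"),
    (23, "proxy", "B", "INVITE (SDP unmodified)"),
    (24, "B", "proxy", "error response (no supported codec)"),
    (25, "proxy", "A", "hold"),
    (26, "A", "proxy", "acknowledge hold"),
    (27, "proxy", "B", "INVITE (SDP: transcoder address and codecs)"),
    (28, "B", "proxy", "200 OK"),
    (29, "proxy", "A", "INVITE (SDP: transcoder address and codecs)"),
    (210, "proxy", "B", "ACK"),   # assumed to be directed at B
    (211, "A", "proxy", "200 (SDP A)"),
    (212, "proxy", "A", "ACK"),   # assumed to be directed at A
]


def format_trace(flow):
    """Render each message as one readable line."""
    return [f"({ref}) {src:>5} -> {dst:<5} {msg}" for ref, src, dst, msg in flow]


def messages_before_acceptance(flow):
    """Count messages sent up to and including B's 200 OK."""
    for count, (_, src, _, msg) in enumerate(flow, start=1):
        if src == "B" and msg.startswith("200"):
            return count
    return None


if __name__ == "__main__":
    print("\n".join(format_trace(FLOW)))
    print("messages until B accepts:", messages_before_acceptance(FLOW))
```

Counting the entries shows that B accepts the session only at the seventh message, because the first INVITE toward B is rejected and must be followed by a second one.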
The problem with this solution is that it generates an additional signaling roundtrip (the error response (24) and a new INVITE (29)), which causes additional delay to the call setup. Especially in a wireless network environment, such delays can be unacceptable. The goal of this invention is to eliminate this problem. This is achieved in the way described in the claims.