This invention relates to the field of interactive communications over packet networks.
The Session Initiation Protocol (SIP), currently under development within the Internet Engineering Task Force (IETF), provides a standards-based mechanism for initiating, modifying, maintaining, and terminating interactive multimedia communications sessions over packet networks. SIP is typically used to establish and maintain an end-to-end session for media such as audio, video, and messaging. SIP typically operates at the beginning of a media transmission session as a preliminary setup phase—a dialog of messages is exchanged between a caller and callee. The SIP setup portion establishes policies for the media session, typically including the type of media for the session (audio, video, etc.), the relationships between media (for example, that the audio is synchronized with the video), the transport protocol for the media, the network protocol for the media, such session properties as destination, compression algorithms and parameters, and quality-of-service determinations, to trade off cost, bandwidth, quality, packet loss rates, latency, and similar characteristics. In most cases, a caller and callee must agree on these session policies during SIP setup if the media are to be successfully transmitted. After the session is established, the media are typically transmitted by another protocol. SIP is described in J. Rosenberg et al., “SIP: Session Initiation Protocol,” Internet Engineering Task Force, IETF RFC 3261.
While SIP is broad in its capabilities, in some cases, and especially in early drafts, certain functions were provided in a less-than-ideal manner or were missing altogether. For example, early implementations of SIP could not handle “early media.” “Early media” includes data exchanged between a caller and callee before a call is answered, or before call setup is completed, for example, in-band ringing, alerting, or network announcements where a caller receives audio or video before the call is set up. SIP does not always ring all phones when a user has multiple devices. SIP cannot handle negotiation of the case where a phone can support many compression algorithms for voice, but only one at a time. SIP early media is described in a number of sources, including IETF draft draft-rosenberg-sip-early-media, and a number of archived IETF emails.
SIP proxy servers (also known as “proxies”) provide call routing, authentication and authorization, mobility, and other signaling services that are independent of the session. In effect, proxies provide signaling policy enforcement. However, the SIP specification allows a proxy to modify only packet headers (SIP precisely defines which components of a packet are “header” and which components are “body”); while a proxy may read packet data, including policy information in SIP message packets (though this is frowned upon), SIP forbids a proxy from modifying packet bodies. In some cases, however, a proxy may wish to set or enforce session policy. The proxy is thus responsible for enforcing policy that it is unable to affect within the protocol.
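The header/body restriction on proxies described above can be illustrated with a minimal sketch. It assumes a SIP message held as a text string in which the header section and the body are separated by the first blank line; the function names split_message and proxy_forward and the example host names are hypothetical, chosen only for illustration. The proxy adds its own Via header and passes the body through unchanged.

```python
from typing import Tuple

CRLF = "\r\n"


def split_message(raw: str) -> Tuple[str, str]:
    """Split a SIP message into (header section, body) at the first blank line."""
    head, _, body = raw.partition(CRLF + CRLF)
    return head, body


def proxy_forward(raw: str, proxy_via: str) -> str:
    """Return the message as a proxy might forward it.

    Only the header section is changed: the proxy's own Via header is
    inserted ahead of the existing headers. The body is copied verbatim.
    """
    head, body = split_message(raw)
    lines = head.split(CRLF)
    start_line, headers = lines[0], lines[1:]
    headers.insert(0, "Via: " + proxy_via)  # header change: permitted
    return CRLF.join([start_line] + headers) + CRLF + CRLF + body  # body untouched


if __name__ == "__main__":
    # Hypothetical request; the addresses and branch values are made up.
    original = (
        "INVITE sip:bob@example.com SIP/2.0" + CRLF
        + "Via: SIP/2.0/UDP caller.example.com;branch=z9hG4bKcaller1" + CRLF
        + "Content-Type: application/sdp" + CRLF
        + CRLF
        + "v=0" + CRLF + "m=audio 49170 RTP/AVP 0" + CRLF
    )
    print(proxy_forward(original, "SIP/2.0/UDP proxy.example.com;branch=z9hG4bKproxy1"))
```

Under this restriction, a proxy that wished to change the session description, for example to remove a codec, would have no place to do so, since the description is carried in the body.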
Solutions have generally involved breaking the protocol, by allowing proxies to examine and modify packet bodies. Such protocol violations decrease reliability and flexibility.
A “caller” and “callee” are, respectively, the device that originates a call (“call” is used generically to indicate any communications session supported within a given protocol, and is not limited to plain old telephone service), and the device to which the call is placed or with which the communications session is requested. In some cases, a caller or callee may delegate call setup tasks to an associated agent device, or another device may act as a forwarding agent for the caller or callee.
A “UA” is a “user agent,” typically an endpoint node of a network, including callers and callees. A “UAC” (user agent client) is a UA that issues a request, and a “UAS” (user agent server) is a UA that receives and acts on a request. The role of UAC and UAS is determined on a request-by-request basis. During call setup, the caller issues most of the requests to the callee; thus, the caller is the UAC for these requests, and the callee is the UAS. The callee may initiate some requests, for example, requests to modify some policies originally requested by the caller, or requests to terminate the call; for these requests, the callee is the UAC and the caller is the UAS.
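The request-by-request assignment of the UAC and UAS roles can be shown with a small sketch. The Request class and the roles function are hypothetical names introduced only for this illustration; they are not part of SIP.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Request:
    method: str    # e.g. "INVITE" or "BYE"
    sender: str    # the UA that issues the request
    receiver: str  # the UA that receives and acts on the request


def roles(request: Request) -> Dict[str, str]:
    """Map each UA to its role for this one request only."""
    return {request.sender: "UAC", request.receiver: "UAS"}


if __name__ == "__main__":
    # During call setup the caller issues the INVITE, so the caller is the UAC.
    print(roles(Request("INVITE", sender="caller", receiver="callee")))
    # If the callee later terminates the call, the roles reverse for that request.
    print(roles(Request("BYE", sender="callee", receiver="caller")))
```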
The term “SDP” (Session Description Protocol) refers to a protocol representation of the total collection of policies for a given session.
The SIP protocol is generally directed to call setup, that is, the initial handshaking between caller and callee to establish a call. Call setup typically begins with an INVITE message being sent from caller to callee. The INVITE message indicates to the callee that the caller would like to initiate a session, and gives some description of the session that the caller proposes. For example, an INVITE message typically includes, among other things, the SDP that the caller proposes. The caller and callee exchange one or more messages, until both agree on the policies to be used for the call. At the conclusion of call setup, a call is established. The SIP documents refer to the data transmitted in the call (whether that data be voice, video, binary data, etc.) as “media.” The term “media” will be used in this document to refer to such data, or similar data transmitted by other protocols.
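As a rough illustration of an INVITE message carrying the caller's proposed SDP, the following sketch assembles such a message as text. The function name build_invite, the addresses, the tag and branch values, and the particular SDP lines are all assumptions made for the example, and the header set is abbreviated.

```python
from typing import List

CRLF = "\r\n"


def build_invite(caller: str, callee: str, call_id: str, sdp_lines: List[str]) -> str:
    """Assemble an illustrative INVITE whose body is the proposed SDP."""
    body = CRLF.join(sdp_lines) + CRLF
    headers = [
        "INVITE sip:%s SIP/2.0" % callee,
        "Via: SIP/2.0/UDP caller.example.com;branch=z9hG4bKexample1",  # made-up value
        "Max-Forwards: 70",
        "To: <sip:%s>" % callee,
        "From: <sip:%s>;tag=12345" % caller,  # made-up tag
        "Call-ID: %s" % call_id,
        "CSeq: 1 INVITE",
        "Contact: <sip:%s>" % caller,
        "Content-Type: application/sdp",
        "Content-Length: %d" % len(body.encode("utf-8")),
    ]
    # A blank line separates the header section from the body.
    return CRLF.join(headers) + CRLF + CRLF + body


if __name__ == "__main__":
    # Hypothetical proposal: one audio stream using RTP payload type 0 (PCMU).
    proposed_sdp = [
        "v=0",
        "o=alice 2890844526 2890844526 IN IP4 caller.example.com",
        "s=-",
        "c=IN IP4 192.0.2.10",
        "t=0 0",
        "m=audio 49170 RTP/AVP 0",
    ]
    print(build_invite("alice@example.com", "bob@example.org", "a84b4c76e66710", proposed_sdp))
```

The callee would answer with its own session description, and the exchange would continue until both sides agree on the policies for the call.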
SIP manages two kinds of state between endpoints. “Dialog state” managed by SIP includes the state needed for SIP itself to operate, including, for example, “is the call up or down,” “what is the most recent SIP sequence number I received from the other side,” “what is the call ID,” and so on. Dialog state becomes essentially irrelevant at the end of the SIP setup phase, as the media session begins. SIP is also responsible for managing and establishing “session state,” state that is maintained in the endpoints to control end-to-end media sessions established by SIP, but not used by SIP itself. Session state includes things like the codecs in use, the packetization delay to use, the network quality-of-service parameters, and the like.
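The distinction between the two kinds of state can be sketched as two separate records. The class and field names below are hypothetical, and the sequence-number check is a simplification assumed for this example (a request is accepted only if its sequence number is higher than the last one seen from the other side).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogState:
    """State that SIP itself needs in order to operate."""
    call_id: str
    is_up: bool = False
    last_remote_cseq: Optional[int] = None  # most recent sequence number received

    def accept_request(self, cseq: int) -> bool:
        """Accept a request only if its sequence number advances (simplified)."""
        if self.last_remote_cseq is not None and cseq <= self.last_remote_cseq:
            return False
        self.last_remote_cseq = cseq
        return True


@dataclass
class SessionState:
    """State established by SIP but used by the media session, not by SIP."""
    codecs: List[str] = field(default_factory=list)
    packetization_ms: int = 20  # assumed default for illustration
    qos: str = "best-effort"    # assumed default for illustration


if __name__ == "__main__":
    dialog = DialogState(call_id="a84b4c76e66710")
    print(dialog.accept_request(1))  # first request from the other side
    print(dialog.accept_request(1))  # a repeated sequence number is refused
    session = SessionState(codecs=["PCMU"])
    print(session)
```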
A “loose routing protocol” is a directive that requires a packet, or stream of packets, to flow through a given set of nodes in the network. The routing is “loose” in that it does not specify every single node to be traversed, merely some set of nodes that must be traversed, leaving the remainder of the path to be routed by other routing mechanisms.
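The idea of loose routing can be sketched on a small graph. In this illustration, the required nodes are assumed to be visited in the order given, and a breadth-first search stands in for the “other routing mechanisms” that fill in the rest of the path; the function names and the example topology are hypothetical.

```python
from collections import deque
from typing import Dict, List


def shortest_path(graph: Dict[str, List[str]], src: str, dst: str) -> List[str]:
    """Breadth-first search; stands in for whatever ordinary routing is in use."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError("no path from %s to %s" % (src, dst))


def loose_route(graph: Dict[str, List[str]], src: str, dst: str,
                waypoints: List[str]) -> List[str]:
    """Build a path that traverses each required waypoint, in order.

    Only the waypoints are dictated; the hops between them are chosen
    by shortest_path.
    """
    path = [src]
    for hop in list(waypoints) + [dst]:
        leg = shortest_path(graph, path[-1], hop)
        path.extend(leg[1:])  # skip the first node, already on the path
    return path


if __name__ == "__main__":
    # Hypothetical topology: two alternative branches from A that rejoin at D.
    network = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
    print(loose_route(network, "A", "E", waypoints=[]))     # unconstrained path
    print(loose_route(network, "A", "E", waypoints=["C"]))  # path forced through C
```

In the second call only node C is dictated; the hop through D is selected by the ordinary routing step.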