Session Initiation Protocol (SIP) is a signaling protocol defined by the Internet Engineering Task Force (IETF) and used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP). The protocol can be used for creating, modifying and terminating two-party (unicast) or multiparty (multicast) sessions consisting of one or several media streams. The modifications can involve changing addresses or ports, inviting more participants, adding or deleting media streams, etc. Other application examples include video conferencing, streaming multimedia distribution, instant messaging, presence information and online gaming.
SIP is an IP-based Application Layer protocol designed to be independent of the underlying transport layer. SIP can run on Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or Stream Control Transmission Protocol (SCTP). SIP is a text-based protocol (e.g., ASCII text encoded). SIP incorporates many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP).
SIP employs design elements similar to the HTTP request/response transaction model. Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP and provides a readable text-based format.
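By way of illustration only, the text-based, HTTP-like layout described above may be sketched as follows. The function name parse_sip_message, the example addresses and the header values are hypothetical and are not taken from any particular implementation; the sketch assumes a well-formed message and keeps only the last value of any repeated header.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into its start line, header fields, and body.

    Simplifying assumptions: the message is well formed, lines end in CRLF,
    and a repeated header name overwrites the earlier value.
    """
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as Via contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request: method, Request-URI, and protocol version on the
# first line, followed by header fields, as in HTTP.
EXAMPLE_REQUEST = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK776asdhds\r\n"
    "To: <sip:bob@example.com>\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(EXAMPLE_REQUEST)
method, request_uri, version = start_line.split()
```

In this sketch the method (INVITE) plays the same role as an HTTP verb, and a response would carry a numeric status code on its first line in place of the method.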
SIP works in concert with several other protocols and is only involved in the signaling portion of a communication session. SIP clients typically use TCP or UDP on port numbers 5060 and/or 5061 to connect to SIP servers and other SIP endpoints. SIP is primarily used for setting up and tearing down voice or video calls. The voice and video stream communications in SIP applications are carried over another application protocol such as Real-time Transport Protocol (RTP). Parameters (e.g., port numbers, protocols, codecs) for corresponding media streams are defined and negotiated using the Session Description Protocol (SDP), which is transported in the SIP message body. SIP and SDP are defined in the IETF Request For Comment (RFC) documents 3261 and 4566, respectively, each of which is incorporated by reference in its entirety herein.
A SIP user agent (UA) is a logical network end-point used to create or receive SIP messages and thereby manage a SIP session. A SIP UA can perform the role of a User Agent Client (UAC), which sends SIP requests, and a User Agent Server (UAS), which receives the requests and returns a SIP response. These roles of UAC and UAS typically only last for the duration of a SIP transaction. A SIP phone is a SIP UA that provides the traditional call functions of a telephone, such as dial, answer, reject, hold/unhold, and call transfer. SIP phones may be implemented by dedicated hardware controlled by the phone application directly or through a combination of hardware, software and firmware. SIP phones can be any phone with IP connectivity including traditional desktop phones, cell phones, smart phones or Personal Digital Assistants (PDAs), etc.
Each resource of a SIP network, such as a User Agent or a voicemail box, is identified by a Uniform Resource Identifier (URI), based on the general standard syntax also used in Web services and e-mail. A typical SIP URI is of the form: sip:username:password@host:port. The URI scheme used for SIP is sip:. If secure transmission is required, a message may be encrypted, in which case a scheme of sips: is used and the corresponding messages are transported over Transport Layer Security (TLS).
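As a non-limiting sketch, a URI of the form described above may be split into its components as shown below. The regular expression and the function name parse_sip_uri are hypothetical simplifications: they ignore URI parameters, headers, escaping and IPv6 literals, all of which the full grammar in RFC 3261 permits.

```python
import re

# Simplified pattern for sip:username:password@host:port (assumption: no
# URI parameters, no IPv6 host literals, no percent-escaping).
SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^:@]+)(?::(?P<password>[^@]*))?@)?"
    r"(?P<host>[^:;?]+)"
    r"(?::(?P<port>\d+))?"
)


def parse_sip_uri(uri: str) -> dict:
    """Return the scheme, user, password, host, and port of a SIP URI."""
    match = SIP_URI.match(uri)
    if match is None:
        raise ValueError(f"not a sip: or sips: URI: {uri!r}")
    parts = match.groupdict()
    # The port is optional; None means "use the default for the transport".
    parts["port"] = int(parts["port"]) if parts["port"] else None
    # The sips: scheme signals that TLS transport is requested.
    parts["secure"] = parts["scheme"] == "sips"
    return parts


example = parse_sip_uri("sip:alice:secret@example.com:5060")
```

The user and password components are optional in this sketch, so a URI such as sips:bob@example.com yields a password and port of None and a secure flag of True.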
SIP also defines server network elements as outlined in RFC 3261. A “proxy server” is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients. A proxy server primarily plays the role of routing, which means its job is to ensure that a request is sent to another entity “closer” to the targeted user. Proxies are also useful for enforcing policy (e.g., making sure a user is authorized to make a call). A proxy interprets and, if necessary, rewrites specific parts of a request message before forwarding the message. A registrar is a server that accepts REGISTER requests and places the information it receives in those requests into the location service for the domain it handles. RFC 3261 emphasizes that the distinction between types of SIP servers is logical, not physical. In practice, different logical capabilities of SIP can be performed by one server or split across a plurality of physical devices as required by design choices.
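The logical separation between a registrar, a location service and a proxy may be illustrated, purely as a sketch, by the following in-memory model. The class names LocationService, Registrar and Proxy, their methods and the example addresses are hypothetical; a real server would additionally handle authentication, binding expiry and multiple contacts per user.

```python
class LocationService:
    """Maps an address-of-record (a user's public SIP URI) to a contact URI."""

    def __init__(self):
        self._bindings = {}

    def bind(self, address_of_record: str, contact: str) -> None:
        self._bindings[address_of_record] = contact

    def lookup(self, address_of_record: str):
        # Returns None when no binding has been registered.
        return self._bindings.get(address_of_record)


class Registrar:
    """Accepts REGISTER requests and stores their bindings."""

    def __init__(self, location: LocationService):
        self.location = location

    def handle_register(self, address_of_record: str, contact: str) -> int:
        self.location.bind(address_of_record, contact)
        return 200  # status code of a successful response


class Proxy:
    """Routes a request toward an entity "closer" to the targeted user."""

    def __init__(self, location: LocationService):
        self.location = location

    def route(self, request_uri: str):
        # The next hop is the registered contact, or None if the user is unknown.
        return self.location.lookup(request_uri)


# Both logical roles share one location service, as they would on one server.
location = LocationService()
registrar = Registrar(location)
proxy = Proxy(location)
registrar.handle_register("sip:bob@example.com", "sip:bob@192.0.2.4:5060")
next_hop = proxy.route("sip:bob@example.com")
```

Because the two roles communicate only through the shared location service in this sketch, they could equally be hosted on one physical device or on separate devices.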
As mentioned above, SDP is a format for describing streaming media initialization parameters in an ASCII string. SDP is intended for describing multimedia communication sessions for the purposes of session announcement, session invitation, and parameter negotiation. SDP does not deliver media itself but is used for negotiation between endpoints of media type, format, and all associated properties. The set of properties and parameters is often referred to as a session profile.
A Session Description is a well-defined format for conveying sufficient information to discover and participate in a multimedia session. A session is described by a series of attribute/value pairs, one per line. The attribute names are single characters, followed by “=” and a value. Optional values are specified with “=*”. Values are either an ASCII string or a sequence of specific types separated by spaces. Attribute names are only unique within the associated syntactic construct, i.e., within the Session, Time, or Media only.
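By way of example only, the one-pair-per-line structure described above may be read as sketched below. The function name parse_sdp and the sample description are hypothetical; the sketch groups lines into a session-level section and one section per media line, and does not validate field order or value syntax.

```python
def parse_sdp(text: str):
    """Group SDP lines into session-level pairs and per-media sections.

    Each returned pair is (name, value), where name is a single character.
    """
    session = []
    media_sections = []
    current = session
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition("=")
        if sep != "=" or len(name) != 1:
            raise ValueError(f"malformed SDP line: {line!r}")
        if name == "m":
            # An "m=" line opens a new media section; later lines belong to it.
            current = []
            media_sections.append(current)
        current.append((name, value))
    return session, media_sections


# Hypothetical description of a single audio stream.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 host.example.com",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
])

session, media_sections = parse_sdp(EXAMPLE_SDP)
```

Keeping the session-level and media-level pairs in separate lists reflects the point that a name such as “a” or “c” is only unique within the construct in which it appears.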
FIG. 1A shows a typical network topology (diagram 100) for a SIP-based phone environment as may be found in the prior art. Network diagram 100 shows a pair of SIP phones (105, 106) connected to IP network 110 and configured for Voice over IP (VoIP) phone calls. SIP Proxy and SIP Registrar functions are provided by SIP Proxy/Registrar server 120. In this example, both of these logical functions have been included in a single server 120; however, these functions may also be implemented on two distinct hardware servers.
FIG. 1B shows a timeline 150 of a typical prior art process of utilizing SIP to signal from a first phone to a second phone to establish a call utilizing example pieces of network 100. Initially (time 155), each phone will register with a SIP Registrar/Proxy server via a REGISTER message. The information for this registration can be preconfigured into the device or each device can be provisioned utilizing a mechanism similar to Dynamic Host Configuration Protocol (DHCP). After the phones have established their connection to the Proxy/Registrar infrastructure, they are each capable of making/receiving phone calls. In timeline 150, phone 1 (105) calls phone 2 (106) by sending (time 160) an INVITE message to the Proxy/Registrar server 120 with the INVITE message addressed to phone 2 (106). The Proxy/Registrar server will interrogate the message and locate/forward the INVITE message toward a network destination “closer” to phone 2 (106). Upon receipt at phone 2 (106), phone 2 (106) will respond with an OK message (time 165) if it is ready and able to accept the phone call. The INVITE message and the OK response include information about the audio capabilities of each of devices 105 and 106 such that a negotiation for a particular type of transmission of data may take place. Phone 1 (105) responds with an ACK message (time 170) to phone 2 (106) indicating how to establish the data transfer communication session for a VoIP phone call as shown at time 175.
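The INVITE/OK/ACK exchange and the accompanying capability negotiation may be sketched, under stated assumptions, as follows. The function names negotiate_codecs and establish_call are hypothetical, the selection policy (first offered codec that the callee also supports) is only one possible policy, and the 488 response for the no-match case is an assumption drawn from RFC 3261 rather than from timeline 150. Registration and proxy forwarding are omitted.

```python
def negotiate_codecs(offered, supported):
    """Return the offered codecs the answerer supports, in the offerer's order."""
    supported_set = set(supported)
    return [codec for codec in offered if codec in supported_set]


def establish_call(caller_codecs, callee_codecs):
    """Trace the signaling messages of a call attempt and the chosen codec.

    Returns (trace, codec); codec is None when no common codec exists.
    """
    trace = ["INVITE"]  # caller's offer, carrying its capabilities
    common = negotiate_codecs(caller_codecs, callee_codecs)
    if not common:
        # Assumed failure response when the capabilities do not overlap.
        trace.append("488 Not Acceptable Here")
        return trace, None
    trace.append("200 OK")  # callee's answer, carrying its capabilities
    trace.append("ACK")     # caller confirms; media may then flow
    return trace, common[0]


trace, codec = establish_call(["PCMU", "G729"], ["G729", "PCMU"])
```

In this sketch the caller's ordering expresses its preference, so the first common entry of the offer is the codec used for the subsequent media session.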
Prior art networks such as 100 primarily consist of SIP endpoints configured for a particular function and having hardware components compatible with that particular function. Upgrading of endpoints to support enhanced functionality typically requires replacing a hardware component that is acting as an endpoint. Alternatively, there have been prior art devices which split the audio and video processing between devices; however, those arrangements involve two devices with required embedded information and a private means of communication and coordination between the two devices. Accordingly, it is desirable to provide a method and device capable of augmenting capabilities at an existing endpoint without being required to replace a legacy (or less capable) endpoint device and without requiring a private means of communication and coordination between devices. For example, a SIP audio-only phone (e.g., 105, 106) may be augmented to a video phone while still providing its original audio-only capability by using the methods and systems disclosed herein.