Voice over Internet Protocol (VoIP) refers generally to the delivery of voice or other media via a data network, such as the Internet or other packet-switched network. For example, Session Initiation Protocol (SIP) is an application-layer control (i.e., signaling) protocol for creating, modifying, and terminating voice or other data sessions between two or more participants. These sessions may include Internet-based telephone calls, multimedia distribution, multimedia conferences, instant messaging conferences, interactive voice response (IVR) systems, automated and manual operator services, automatic call distribution, call routing, etc.
SIP invitations (i.e., INVITE requests) may be used to create sessions and may carry session descriptions that allow participants to agree on a set of compatible media types. SIP may use proxy servers to help route requests to a user's current location, authenticate and authorize users for services, implement provider call-routing policies, and/or provide other features to users. SIP may also provide a registration function that allows users to update their current locations for use by proxy servers.
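By way of illustration only, the following sketch shows how an INVITE request carrying a session description might be assembled. It assumes the header layout of RFC 3261 and a Session Description Protocol (SDP) body offering two audio codecs. The function names (build_sdp_offer, build_invite), the addresses, and the identifier values are hypothetical and chosen purely for the example; an actual user agent would generate unique Call-ID, tag, and branch values and would typically rely on a SIP stack rather than string assembly.

```python
def build_sdp_offer(user, host_ip, audio_port):
    """Return a minimal SDP body offering PCMU and PCMA audio (hypothetical helper)."""
    lines = [
        "v=0",                                               # SDP protocol version
        f"o={user} 2890844526 2890844526 IN IP4 {host_ip}",  # origin (example session id/version)
        "s=-",                                               # session name (unused)
        f"c=IN IP4 {host_ip}",                               # address where media is expected
        "t=0 0",                                             # unbounded session time
        f"m=audio {audio_port} RTP/AVP 0 8",                 # audio stream, payload types 0 and 8
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller, callee, via_host, call_id, branch, tag, sdp):
    """Return a SIP INVITE as text, with the SDP offer as its body (hypothetical helper)."""
    body_length = len(sdp.encode("utf-8"))  # Content-Length counts bytes, not characters
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",
        f"Content-Length: {body_length}",
    ]
    # A blank line separates the headers from the message body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


# Example values below are assumptions for illustration only.
offer = build_sdp_offer("alice", "192.0.2.10", 49170)
invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.net",
    via_host="pc33.example.com",
    call_id="a84b4c76e66710@pc33.example.com",
    branch="z9hG4bK776asdhds",  # RFC 3261 branch values begin with "z9hG4bK"
    tag="1928301774",
    sdp=offer,
)
print(invite)
```

In this sketch, the "m=audio" line of the session description lists the media formats the caller is prepared to use, which is how the participants may agree on a compatible set of media types. Any proxy server that routes the request can read both the headers and the session description unless they are otherwise protected.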
Challenges exist in providing secure systems for establishing VoIP real-time communication sessions. Because sessions typically involve one or more intermediary devices (e.g., proxy servers, session border controllers, firewalls, etc.), it has proven difficult to effectively secure (e.g., encrypt) the data associated with a session while simultaneously ensuring that the secure session will be supported by the underlying network. For example, although tunneling protocols (e.g., Internet Protocol Security (IPsec)) exist for securing data between tunnel endpoints, any established tunnel may effectively mask the type of underlying communication from the operators of the network being used. Although this may be desirable in some circumstances, real-time communication sessions typically rely on various quality of service (QoS) guarantees provided by the network. When the type of communication is hidden, such QoS guarantees may not be available, resulting in an unacceptable level of performance for the session. Alternatively, hop-by-hop security protocols (e.g., transport layer security (TLS) and secure sockets layer (SSL)) may be implemented that require data to be decrypted at each hop in a network between session participants. Unfortunately, this requirement substantially impacts both the security of the underlying data, which is exposed in unencrypted form at each intermediary device, and the efficiency with which it is delivered.