An Internet Protocol (IP) session involves the connection between two devices across a network of routers, cables, and switches for the purpose of exchanging packets of information. For example, a web browser can establish an IP-based HTTPS session with a website for the purpose of retrieving information. In another example, a device can establish a Session Initiation Protocol (SIP) session with another computing device to, e.g., conduct a phone call.
Web browsers have recently begun adopting the Web Real-Time Communication (WebRTC) protocol for the purpose of establishing real-time audio and video sessions between browser clients. The WebRTC protocol as defined by the IETF relies on the ICE (RFC 5245) methods for establishing a direct communication link between the two clients. Under certain network topologies, the only means for a successful communication link is through the use of a media relay server placed out in the network.
For both SIP and WebRTC sessions, ICE defines that media relay server to be a Traversal Using Relays around NAT (TURN) server running the TURN protocol (RFC 5766). The TURN protocol, however, is susceptible to many types of attacks, such as theft of service and distributed denial of service.
The ICE protocol is designed to allow two client devices to automatically discover the best way to send voice and video media streams across an IP network. Certain network topologies such as those using Firewall/NATs can prevent the devices from directly communicating due to the way in which some Firewall/NATs create and enforce IP address and port access. The NAT function automatically maps an external IP address and port to every outbound message stream the client produces. The procedures of ICE allow the client to learn what IP address and port the NAT has assigned. When a client decides to initiate a real-time session, the client must first determine which IP address and port combinations are available for the purpose of establishing a media connection with another client. These IP and port combinations are called “candidates” in the terminology of ICE.
In the simplest model, a client begins its candidate discovery by sending a STUN (RFC 5389) binding request to a STUN server somewhere out in the network. The STUN server responds to the binding request with the IP address and port from which the STUN server saw the request originate. If the client is behind a Firewall/NAT, the STUN server sees the external IP address and port that the NAT assigned to this outbound message transmission. This address and port pair is called the Server Reflexive candidate.
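The binding exchange described above can be illustrated with a minimal sketch, assuming IPv4 and the message layout of RFC 5389 (a 20-byte header followed by type-length-value attributes, with the reflexive address carried in XOR-MAPPED-ADDRESS). The function names are hypothetical, and `build_binding_success` merely stands in for a STUN server so that the example runs without a network.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442       # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN binding request carrying no attributes."""
    tid = transaction_id or os.urandom(12)
    # Header: message type, message length, magic cookie, transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def build_binding_success(request, ip, port):
    """Hypothetical stand-in for the STUN server: echo the observed source."""
    tid = request[8:20]
    (addr,) = struct.unpack("!I", socket.inet_aton(ip))
    # Port and address are XORed with the magic cookie before transmission.
    value = struct.pack("!BBHI", 0, 0x01,
                        port ^ (MAGIC_COOKIE >> 16), addr ^ MAGIC_COOKIE)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + tid + attr


def parse_server_reflexive(response):
    """Extract the (ip, port) Server Reflexive candidate from a response."""
    msg_type, msg_len, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN binding success response")
    offset, end = 20, 20 + msg_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
            if family != 0x01:
                raise ValueError("this sketch handles IPv4 only")
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, xport ^ (MAGIC_COOKIE >> 16)
        # Attributes are padded to a 4-byte boundary.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("response carries no XOR-MAPPED-ADDRESS attribute")
```

In a real deployment the request would be sent over UDP and the response read from the same socket; the address returned by `parse_server_reflexive` is what the client would advertise as its Server Reflexive candidate.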
According to ICE, a client in need of a network media relay issues a TURN allocation request using a procedure similar to STUN. In addition to providing the Server Reflexive candidate, the TURN request asks the TURN server to allocate a media relay port for the client to use. FIG. 1 depicts a system for establishing a media connection using a TURN service. The system 100 includes a website application server 102 that is connected to a plurality of client computing devices (e.g., Client A 108, Client B 110) via Firewall/NAT devices 106a, 106b respectively. The system also includes a TURN server 104 that is also connected to the client devices 108, 110 via the Firewall/NAT devices 106a, 106b. 
The website application server 102 hands the TURN service credentials to Client A 108 when Client A initiates a call request (e.g., a SIP INVITE, a WebRTC call request) by clicking on a web page link, for example. Client A 108 then issues the necessary resource allocation messages to the TURN server 104 using the TURN protocol. The allocation response from the TURN server 104 provides Client A 108 with the ICE Relay candidate, referred to as (r1) in FIG. 1. Note that the port (t1) is the TURN service port and not the Relay candidate.
In the TURN model, the client device then creates a Session Description Protocol (SDP) description of its possible media candidates. In FIG. 1, Client A 108 creates an SDP containing the Host candidate (a1), the Server Reflexive candidate (a2), and the Relay candidate (r1). The SDP is then passed up to the website application server 102 using the mechanisms of the media protocol (e.g., WebRTC, SIP). The website application server 102 then uses the media protocol to send the SDP of Client A 108 down to Client B 110. Client B 110 then initiates its own candidate discovery using STUN. Client B 110 does not attempt a TURN reservation because Client B sees that Client A 108 has already offered a Relay candidate and only one relay candidate is needed per media stream.
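As an illustration of how the three candidates of Client A might appear in its SDP, the following sketch formats them with the `a=candidate` attribute syntax of RFC 5245. The `Candidate` class, the addresses (taken from the documentation ranges), the ports, and the foundation values are all hypothetical; the priority values are those produced by the RFC 5245 recommended formula for component 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    foundation: str
    component: int
    priority: int
    address: str
    port: int
    cand_type: str                      # "host", "srflx", or "relay"
    related_address: Optional[str] = None
    related_port: Optional[int] = None

    def to_sdp(self) -> str:
        """Render the candidate as an SDP a=candidate attribute line."""
        line = (f"a=candidate:{self.foundation} {self.component} UDP "
                f"{self.priority} {self.address} {self.port} "
                f"typ {self.cand_type}")
        if self.related_address is not None:
            line += f" raddr {self.related_address} rport {self.related_port}"
        return line


# Hypothetical values standing in for (a1), (a2), and (r1) of FIG. 1.
client_a_candidates = [
    Candidate("1", 1, 2130706431, "192.0.2.10", 50000, "host"),
    Candidate("2", 1, 1694498815, "203.0.113.7", 54321, "srflx",
              "192.0.2.10", 50000),
    Candidate("3", 1, 16777215, "198.51.100.20", 61000, "relay",
              "203.0.113.7", 54321),
]

sdp_candidate_lines = [c.to_sdp() for c in client_a_candidates]
```

A complete SDP would also carry session and media lines along with the ICE username fragment and password, which this sketch leaves out.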
After the STUN binding exchange, Client B 110 creates its SDP using the Host candidate (b1) and the Server Reflexive candidate (b2). The SDP is handed up to the website application server 102 using the media protocol and the website application server 102 delivers the SDP to Client A 108. Both Client A 108 and Client B 110 now have the other client's SDP and the STUN connectivity checks begin. ICE defines the priority of the various permutations that arise when each client systematically tries to communicate between candidates. Client A 108, for instance, attempts to send a STUN message from its Host candidate to Client B's (110) Host candidate (a1-b1). If Client A 108 does not see a response, then Client A sends a STUN connectivity message to the Server Reflexive candidate of Client B 110, so that the attempt occurs between (a2-b2), as shown in FIG. 1.
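The ordering of connectivity checks can be sketched with the priority formulas of RFC 5245: a candidate priority built from a type preference, a local preference, and the component ID, and a pair priority computed from the priorities of the controlling and controlled agents. The type preference values below are the ones that RFC recommends; the function names are hypothetical, and the sketch assumes that Client A, as the offerer, takes the controlling role. It does not model the pruning of redundant pairs that a full ICE agent performs.

```python
# Recommended type preferences from RFC 5245 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_pref << 8)
            + (256 - component_id))


def pair_priority(controlling, controlled):
    """Pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(controlling_types, controlled_types):
    """Return (local type, remote type) pairs, highest pair priority first."""
    scored = [
        (pair_priority(candidate_priority(a), candidate_priority(b)), a, b)
        for a in controlling_types
        for b in controlled_types
    ]
    scored.sort(reverse=True)
    return [(a, b) for _, a, b in scored]


# Client A offers host, srflx, and relay; Client B offers host and srflx.
check_order = ordered_pairs(["host", "srflx", "relay"], ["host", "srflx"])
```

Under these assumptions the host-to-host pair is tried first and the pairs involving the Relay candidate are tried last, which matches the fallback behavior described above.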
At the same time, Client B 110 performs STUN connectivity checks towards Client A's (108) candidates. If the Host and Reflexive candidates do not succeed, then Client B 110 sends a STUN connectivity check to the Relay candidate (r1). The TURN server 104 then encapsulates the request inside a TURN header and forwards the encapsulated request to Client A 108 using the established TURN binding (a2-t1). Because Client A 108 created that TURN binding, Client A receives the encapsulated STUN connectivity message and responds using the reverse path (a2-t1-r1-b2). Client B 110 then successfully receives the connectivity response and both clients 108, 110 decide to use that connection for the media stream. This entire process must happen for each media stream. For example, a call using audio and video will have to perform the above process twice before exchanging audio and video data.
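The encapsulation step on the (a2-t1) binding can be sketched using the ChannelData framing of RFC 5766, which is one of the two encapsulations that RFC defines (the other being Send and Data indications). A ChannelData message is a 4-byte header, consisting of a channel number in the range 0x4000 to 0x7FFF and a payload length, followed by the relayed data. Because STUN-formatted control messages begin with two zero bits and ChannelData messages begin with the bits 01, a receiver can tell the two apart from the first byte. The function names below are hypothetical.

```python
import struct


def encapsulate_channel_data(channel, payload):
    """Wrap relayed data in a ChannelData header (channel number, length)."""
    if not 0x4000 <= channel <= 0x7FFF:
        raise ValueError("channel number must be in 0x4000..0x7FFF")
    return struct.pack("!HH", channel, len(payload)) + payload


def decapsulate_channel_data(packet):
    """Return (channel, payload) from a ChannelData message."""
    channel, length = struct.unpack("!HH", packet[:4])
    return channel, packet[4:4 + length]


def classify(packet):
    """Distinguish control and media packets arriving on the same port."""
    top_bits = packet[0] >> 6
    if top_bits == 0b00:
        return "control"    # STUN-formatted TURN message
    if top_bits == 0b01:
        return "media"      # ChannelData carrying relayed data
    return "unknown"
```

For example, a connectivity check or media packet arriving at (r1) from Client B could be passed to `encapsulate_channel_data` before being forwarded to Client A, which would recover the original bytes with `decapsulate_channel_data`.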
The TURN server 104 shown in FIG. 1 is susceptible to attack due to a few of its basic design characteristics.
1) The client (e.g., Client B 110) is given the authentication credentials needed by the TURN server 104 in order to “securely” make the Relay candidate reservation. A compromised client can easily learn the credentials and can use them for other purposes such as using the TURN server for a completely different website, thus stealing service from the original website.
2) The TURN server 104 is forced to accept reservation requests from anywhere in the network and must attempt to verify the sender's legitimacy by performing a computation on the provided credentials. This can create a denial of service condition if the TURN server gets flooded with these requests.
3) The TURN server 104 must use the same port interface for its TURN reservations and the resulting media flows. The media must flow over the same binding that created the relay reservation in order to successfully traverse the Firewall/NAT (e.g., Firewall/NAT 106a, 106b). That is why the TURN protocol requires the media flows to be encapsulated inside a TURN packet so that the TURN server 104 can distinguish the difference between the control and media packets.
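The computation referred to in item 2) can be illustrated with a minimal sketch of the long-term credential mechanism of RFC 5389, in which the key is the MD5 hash of "username:realm:password" and the MESSAGE-INTEGRITY value is an HMAC-SHA1 over the message bytes. The function names are hypothetical, and the sketch omits the SASLprep processing of the password and the length-field adjustment that the RFC requires before the HMAC is taken.

```python
import hashlib
import hmac


def long_term_key(username, realm, password):
    """Derive the 16-byte key: MD5(username ":" realm ":" password)."""
    return hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).digest()


def message_integrity(key, message_bytes):
    """Compute the 20-byte HMAC-SHA1 value carried in MESSAGE-INTEGRITY."""
    return hmac.new(key, message_bytes, hashlib.sha1).digest()


def verify_request(key, message_bytes, received_mac):
    """Server-side check; this work is performed for every request received."""
    expected = message_integrity(key, message_bytes)
    return hmac.compare_digest(expected, received_mac)
```

The sketch shows both weaknesses in miniature: the key depends only on values the client already holds, so a compromised client can reproduce it anywhere, and the server must spend a hash and an HMAC computation on every incoming request before it can reject an illegitimate one.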