Typical Internet applications use a client/server model in which the client connects to a well-known port number on the server. A “client” in this regard is a program that initiates a network connection, or the host on which that program runs. A “server” in this regard is a program that processes requests from clients connecting to it, or the host on which that program runs. Typically, at least some port numbers are publicly posted as belonging to specific applications. For example, it is well known that for conventional Web traffic, the Web server application listens on port 80, and that the SIP server application listens on port 5060.
In VoIP and other RTS applications, however, it is common for the signaling session to be distinct from the bearer session. (The bearer session may also be referred to as the “media” session.) The signaling session conforms to the conventional client/server model. However, in contrast to conventional applications, the parameters for the bearer connection are established in the signaling session. Thus, in particular, the port number for the bearer session may not be known a priori.
For purposes of security and economy, many local communication networks connect to the Internet through an intermediary device such as a firewall (FW) or a Network Address Translation box (NAT). Herein, we refer collectively to such intermediary devices as “middleboxes.” A firewall is useful for, among other things, preventing attacks on the local network from outside. A NAT is useful for pooling a limited number of available IP addresses among a larger number of users, and for protecting the identities of individual users inside the local network. Often, the NAT and the firewall occupy the same physical box.
One consequence of separating the VoIP signaling session from the VoIP bearer session is that bearer traffic incoming to the local network from a remote host may be blocked. A “host” in this regard may be any computer that connects to the Internet or to a local intranet.
For example, a firewall will generally permit a local host to open a connection to an outside server, and will permit the reply on a connection so authorized. When a connection request from a local client is let out through the firewall, the firewall records the addresses and ports used, so that it can identify the reply when it comes into the local network from the outside.
In VoIP, however, the local endpoint does not initiate the bearer traffic stream that it is to receive. As a consequence, the firewall does not have the information needed for it to recognize the remote host's traffic, and will therefore block it.
More specifically, for the incoming VoIP bearer stream, the local endpoint will randomly choose a port for the remote endpoint to connect to. The local endpoint will send the identifying number of the randomly chosen port to the remote endpoint in a signaling message. However, because the incoming and outgoing bearer streams are independent, the firewall has no way to determine which of the local ports was chosen to receive the incoming bearer stream. As a consequence, the firewall will not recognize the traffic coming in to that port as legitimate traffic, and will therefore have to block it.
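The blocking behavior described above can be illustrated with a minimal sketch of connection tracking. The class name StatefulFirewall, its methods, and all addresses and port numbers are hypothetical and chosen only for illustration; an actual firewall tracks considerably more state.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    """One tracked connection (hypothetical structure, for illustration only)."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class StatefulFirewall:
    """Toy firewall that admits inbound traffic only as a reply to an outbound flow."""

    def __init__(self) -> None:
        self._allowed: Set[Flow] = set()

    def outbound(self, local_addr: str, local_port: int,
                 remote_addr: str, remote_port: int) -> None:
        # Record the addresses and ports of a connection that is let out.
        self._allowed.add(Flow(local_addr, local_port, remote_addr, remote_port))

    def inbound_permitted(self, remote_addr: str, remote_port: int,
                          local_addr: str, local_port: int) -> bool:
        # Inbound traffic passes only if it matches a recorded outbound flow.
        return Flow(local_addr, local_port, remote_addr, remote_port) in self._allowed


fw = StatefulFirewall()

# Signaling: the local client opens a connection to a SIP server on port 5060.
fw.outbound("10.0.0.5", 40000, "203.0.113.10", 5060)
signaling_reply_ok = fw.inbound_permitted("203.0.113.10", 5060, "10.0.0.5", 40000)

# Bearer: the local endpoint chose port 49170 and advertised it in a signaling
# message, but the firewall never recorded an outbound flow from that port.
bearer_ok = fw.inbound_permitted("198.51.100.7", 30000, "10.0.0.5", 49170)

print(signaling_reply_ok, bearer_ok)
```

In this sketch the signaling reply is admitted because it matches a recorded outbound flow, whereas the incoming bearer stream matches nothing the firewall has seen and is refused.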
Moreover, in a transmission that conforms to the well-known Real-time Transport Protocol (RTP), the endpoint chooses a random port number. As a consequence, the local administrator will not know, when configuring the firewall, which ports to open for the requested connections.
The VoIP protocols for establishing an RTP session also conflict with some features of NAT-based address translation. The VoIP protocols inherently assume that when the endpoint advertises an IP address and a port, the address will be routable from the remote host and the port will be left unmodified by the network. These assumptions will generally be false if NAT is implemented. In such a case, the endpoint IP address will typically be a protected, private address. Routing directly to such an address from outside the NAT will generally be forbidden. Instead, the NAT will remap the port number and the IP address.
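The mismatch between the advertised address and the address seen from outside can be sketched as follows. The class Nat, its sequential port allocation, and the addresses used are assumptions made for illustration; actual NAT devices differ in how they allocate and expire bindings.

```python
import ipaddress
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class Nat:
    """Toy NAT that maps each private endpoint to a port on one public address."""

    def __init__(self, public_addr: str, first_port: int = 20000) -> None:
        self.public_addr = public_addr
        self._next_port = first_port  # hypothetical sequential allocation
        self._bindings: Dict[Endpoint, Endpoint] = {}

    def translate_outbound(self, private: Endpoint) -> Endpoint:
        # Create a binding on first use, then reuse it for later packets.
        if private not in self._bindings:
            self._bindings[private] = (self.public_addr, self._next_port)
            self._next_port += 1
        return self._bindings[private]


nat = Nat("203.0.113.1")

# What the endpoint advertises in its signaling message.
advertised: Endpoint = ("192.168.1.20", 49170)

# What a remote host would actually see after translation.
seen_outside = nat.translate_outbound(advertised)

# The advertised address lies in a private range and is not routable from outside.
advertised_is_private = ipaddress.ip_address(advertised[0]).is_private

print(advertised, seen_outside, advertised_is_private)
```

Both the address and the port seen from outside differ from the advertised pair, so a remote host that sends bearer traffic to the advertised pair will not reach the endpoint.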
For the above reasons, problems may be encountered when VoIP and other real-time services (RTS), such as video and Instant Messaging, attempt to traverse a middlebox such as a NAT or firewall device.
Some methods have been proposed for overcoming the problems described above, and thus for traversing a firewall or NAT. One such method uses an Application Layer Gateway (ALG). The ALG inspects every signaling packet that passes through the NAT box for address or port information. For example, in SIP, this information is included in the Session Description Protocol (SDP) body. Once this information is obtained, new address and port bindings can be obtained for the media stream, and the signaling message can be rewritten with a new, public address and port. However, such methods based on ALGs suffer from certain disadvantages. For example, the ALG needs to be reprogrammed to support each new protocol that is invoked. Such methods also make intensive use of network resources. Furthermore, they do not support protocols that use cryptography, such as S/MIME encryption or message integrity protocols.
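The rewriting step performed by an ALG can be illustrated with a simplified sketch. The function rewrite_sdp and the sample SDP body are hypothetical; the sketch assumes a single IPv4 media stream and rewrites only the connection ("c=") and media ("m=") lines, whereas an actual ALG would also have to create the corresponding binding and handle further fields.

```python
def rewrite_sdp(sdp: str, public_addr: str, public_port: int) -> str:
    """Replace the private address and port in an SDP body with public ones."""
    rewritten = []
    for line in sdp.split("\r\n"):
        if line.startswith("c=IN IP4 "):
            # Connection line: carries the address that will receive the media.
            line = "c=IN IP4 " + public_addr
        elif line.startswith("m="):
            # Media line: "m=<media> <port> <proto> <fmt> ..."
            fields = line.split(" ")
            fields[1] = str(public_port)
            line = " ".join(fields)
        rewritten.append(line)
    return "\r\n".join(rewritten)


# Hypothetical SDP body as sent by an endpoint behind the NAT.
original_sdp = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.20",
    "s=-",
    "c=IN IP4 192.168.1.20",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
])

# Assumed public binding obtained for the media stream.
rewritten_sdp = rewrite_sdp(original_sdp, "203.0.113.1", 20000)
print(rewritten_sdp)
```

The sketch depends on reading and modifying the SDP body in the clear. If the body is encrypted or integrity-protected, rewriting of this kind is not possible, or it invalidates the message.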
Other methods rely on network elements that the endpoints use to probe the network and thereby determine the specifics of the NAT behavior. Such approaches are disadvantageous because, among other things, they require a relatively large amount of messaging and a relatively large amount of computational resources at the endpoints.