A. Field of the Invention
The principles of the invention relate generally to packet transmission, and more particularly, to transmission of multimedia related packets across multiple security zones.
B. Description of Related Art
With the increasing ubiquity of the Internet and Internet availability, there has been an increasing desire to leverage its robust and inexpensive architecture for voice telephony services, commonly referred to as voice over IP (internet protocol), or VoIP. Toward this end, standards for internet telephony have been promulgated by both the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) in the form of H.323 rev 5 (2003), “Packet based multimedia communications systems” and the Internet Engineering Task Force (IETF) in the form of RFC 3261 (2002), “Session Initiation Protocol (SIP)” to enable set-up and teardown of the media sessions.
Under each of these standards, a session initiation message is initially routed between a calling party and a proxy server or gatekeeper (collectively, “proxy server”). The proxy server performs call processing, number lookup, routing, and any other required processing of the session initiation message. The session initiation message also typically includes a session description portion that contains information about the media that the caller wishes to use for the session. The proxy server then forwards the session initiation message to the called party (sometimes via redirect servers or other intermediary entities). In response to the received invitation message, a response message having a similar session description portion may be returned to the calling party via the proxy server. When the calling party receives the response message, it forwards an acknowledgement message to the called party. This completes call setup and enables subsequent exchange of real-time media directly between the calling and called parties.
All of the messages exchanged are typically in the form of a packet of data having both header and payload information. Most forms of signaling information are contained in packet headers, while information relating to the media being exchanged between the parties is typically contained within the payload portion. In addition, addressing information, such as Internet Protocol (IP) addresses, Uniform Resource Locators (URLs), Uniform Resource Identifiers (URIs), or user datagram protocol (UDP) addresses, for both the calling and called parties may be contained in both the header and payload. The existence of addressing information in packet payloads has caused difficulties with respect to both firewall and network address translation (NAT) implementation.
FIG. 1 is a generalized block diagram illustrating the main components of a VoIP system 100. Generally speaking, there are two main components in most VoIP networks: network servers or gatekeepers 102 and user agents 104 and 106. Each user agent or user device 104, 106 is an end-user device or system that operates on someone's behalf to either place or receive a call. Although the caller is sometimes referred to as a user agent client (UAC) (i.e., the requesting party) and the recipient is sometimes referred to as a user agent server (UAS) (i.e., the responding party), most user agent devices incorporate both UAC and UAS functionality. There are two different types of network servers 102 as well: a proxy server, which receives a request, determines which server to send it to, and then forwards the request; and a redirect server, which receives a request but, instead of forwarding it to the next hop server, tells the client to contact the next hop directly.
Using these main components, the steps in initiating a VoIP session are generally straightforward. As shown in FIG. 1, user agent 104 initially sends an invitation request to network server 102, which in this case is a proxy server. Proxy server 102 will look in its database to determine where to send the invitation request and forward the request to the appropriate next hop, which in this case is user agent 106. It should be understood that, although FIG. 1 illustrates proxy server 102 connecting directly to user agent 106, in practice there could be any number of hops between proxy server 102 and user agent 106. Once the invitation message reaches user agent 106, user agent 106 may respond with an OK message, indicating that it has accepted the invitation to participate in the call. This OK message is then forwarded to user agent 104 via proxy server 102. When user agent 104 receives the OK message, user agent 104 responds with an acknowledgement message, which, when received, starts the session between the parties.
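By way of illustration only, the message sequence described above may be sketched in Python as follows. The `Message` type, the `setup_call` function, and the party labels are hypothetical names introduced for this sketch. The sketch further assumes a single proxy hop and an acknowledgement addressed directly to the called party; neither assumption is required by the standards discussed above.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str       # "INVITE", "OK", or "ACK"
    sender: str
    recipient: str


def setup_call(caller: str, proxy: str, callee: str) -> list:
    """Return the ordered messages exchanged during a simplified call setup."""
    return [
        Message("INVITE", caller, proxy),   # caller sends invitation to proxy
        Message("INVITE", proxy, callee),   # proxy forwards it to the next hop
        Message("OK", callee, proxy),       # callee accepts the invitation
        Message("OK", proxy, caller),       # proxy relays the acceptance
        Message("ACK", caller, callee),     # assumed direct acknowledgement
    ]


if __name__ == "__main__":
    for msg in setup_call("user agent 104", "proxy server 102", "user agent 106"):
        print(f"{msg.sender} -> {msg.recipient}: {msg.kind}")
```

Once the final acknowledgement is received, the parties may begin exchanging media.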
In most modern network environments, firewalls constitute the main protection mechanism for keeping unwanted traffic away from a private network. In general, a firewall is positioned between the private network and the public network such that all traffic passing between the two networks first passes through the firewall. The traffic may then be subjected to various filtering policies which identify the types and sources/destinations of traffic permitted to flow based upon information contained within the packet headers. One exemplary filtering policy may permit all outgoing traffic (e.g., to any destination address) from IP address 134.138.29.17 (the source address) on port 8080 (the source port). Conversely, incoming traffic to 134.138.29.17 on port 8080 may not be permitted unless initially requested by 134.138.29.17. By enabling the enforcement of these various policies, only known and identifiable types of network traffic may be allowed to enter or exit the private network, thereby providing security to the network.
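The exemplary filtering policy above may be sketched in Python as follows. This is a minimal illustration rather than a description of any particular firewall product: the `StatefulFilter` class and its method names are hypothetical, and the sketch assumes that a session is identified solely by its source and destination addresses and ports.

```python
class StatefulFilter:
    """Permit outgoing traffic from listed sources; permit replies only."""

    def __init__(self, allowed_sources):
        # (ip, port) pairs allowed to send to any destination
        self.allowed_sources = set(allowed_sources)
        # sessions opened from inside: (in_ip, in_port, out_ip, out_port)
        self.sessions = set()

    def outgoing(self, src_ip, src_port, dst_ip, dst_port):
        """Return True if the outgoing packet is permitted."""
        if (src_ip, src_port) not in self.allowed_sources:
            return False
        # Remember the request so that the reply can be recognized later.
        self.sessions.add((src_ip, src_port, dst_ip, dst_port))
        return True

    def incoming(self, src_ip, src_port, dst_ip, dst_port):
        """Return True only if the inside host requested this traffic."""
        return (dst_ip, dst_port, src_ip, src_port) in self.sessions


if __name__ == "__main__":
    fw = StatefulFilter([("134.138.29.17", 8080)])
    # Unsolicited incoming traffic is refused.
    print(fw.incoming("198.51.100.7", 5000, "134.138.29.17", 8080))
    # After an outgoing request, the matching reply is accepted.
    fw.outgoing("134.138.29.17", 8080, "198.51.100.7", 5000)
    print(fw.incoming("198.51.100.7", 5000, "134.138.29.17", 8080))
```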
Although most firewall devices support only two distinct security zones, a public or UNTRUST zone and a private or TRUST zone, several firewall providers offer three or more security zones, with a third zone sometimes referred to as a demilitarized zone or DMZ. Often, firewall DMZs will be implemented for server type devices (e.g., web servers, mail servers, etc.) which, by necessity, must be available to the public network. Additional security zones may also be established, each having a unique security profile.
Unfortunately, it is the rigorous and strict nature of most conventional firewalls themselves that typically prevents successful establishment of VoIP sessions. For example, addressing information relating to the media exchange between parties is typically contained within the session description portion of a VoIP packet's payload. In a SIP session, for instance, the addresses and related port(s) on which media is expected are included within the session description protocol (SDP) information found in the message's payload. This information is dynamically assigned upon generation of each message and cannot be adequately predicted by the firewall. Accordingly, when media from either party is received at the firewall, its passage is denied because no enabling policy is identified. The alternative to blanket denial is to leave a wide range of ports unprotected to facilitate passage of the media. Clearly, this is untenable from a security standpoint. To remedy this issue, intelligent Application Level Gateways (ALGs) may be implemented on the firewall which identify VoIP messages as they are received at the firewall. The VoIP messages are then parsed for information contained within their headers and payloads. This information may then be used to create gates or “pinholes” in the firewall interfaces which enable the media to be exchanged between the parties. A pinhole is typically defined to allow traffic based on source and destination addresses and ports.
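The parsing step performed by an ALG may be illustrated, under simplifying assumptions, by the following Python sketch. The `media_endpoints` function is a hypothetical name. The sketch assumes an SDP body in which a single session-level connection line (`c=`) precedes the media lines (`m=`), and it returns only the destination address and port that a pinhole would need to admit; a practical pinhole would typically also constrain the source.

```python
def media_endpoints(sdp: str) -> list:
    """Extract (address, port) pairs on which media is expected."""
    address = None
    endpoints = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            address = line[2:].split()[2]
        elif line.startswith("m="):
            # m=<media> <port>[/<number of ports>] <proto> <fmt> ...
            port_field = line[2:].split()[1]
            port = int(port_field.split("/")[0])
            endpoints.append((address, port))
    return endpoints


if __name__ == "__main__":
    example_sdp = (
        "v=0\r\n"
        "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
        "s=-\r\n"
        "c=IN IP4 192.0.2.10\r\n"
        "t=0 0\r\n"
        "m=audio 49170 RTP/AVP 0\r\n"
    )
    # Each returned pair identifies a pinhole that the ALG could open.
    for addr, port in media_endpoints(example_sdp):
        print(f"open pinhole to {addr}:{port}")
```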
In addition to problems posed by the restrictive nature of firewalls alone, many firewalls also implement NAT. Generally speaking, NAT is a technology for enabling multiple devices on a private local area network (LAN) having private IP addresses to share a single public IP address or a pre-defined group of public IP addresses. Because the private IP addresses maintained by the devices are not routable from outside of the LAN, the NAT must perform translation between the private and public IP addresses at the point where the LAN connects to the Internet.
In operation, when a device on the LAN wishes to initiate a connection with a device outside of the LAN, the device will send all traffic to the NAT first. The NAT examines the header of each outgoing packet and replaces the source or return address contained therein, which is the device's private address, with its own public address before passing the traffic to its destination on the Internet. In some implementations, port translation is also provided, enabling the NAT to also modify the source and return ports on the traffic. These translations are stored in a table for use in identifying recipients for received traffic. When a response is received, the NAT queries the NAT table, identifies the proper recipient and passes the response to that device.
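The table-driven translation described above may be sketched in Python as follows. The `Nat` class, its method names, and the starting port value are hypothetical and chosen only for illustration. The sketch assumes a single public address with port translation and omits entry expiry and protocol distinctions.

```python
import itertools


class Nat:
    """Minimal address-and-port translator backed by a lookup table."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self.out_map = {}  # (private_ip, private_port) -> public_port
        self.in_map = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, src_ip: str, src_port: int):
        """Return the public (ip, port) that replaces a private source."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            public_port = next(self._ports)
            self.out_map[key] = public_port
            self.in_map[public_port] = key
        return self.public_ip, self.out_map[key]

    def translate_incoming(self, dst_port: int):
        """Return the private (ip, port) for a response, or None."""
        return self.in_map.get(dst_port)


if __name__ == "__main__":
    nat = Nat("203.0.113.1")
    print(nat.translate_outgoing("10.0.0.5", 5060))
    print(nat.translate_incoming(40000))
```

Because only packet headers are rewritten in this sketch, any address carried in a payload would pass through unchanged.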
Unfortunately, as discussed above, addressing information for VoIP traffic may be contained within the payload information as well as the header of outgoing packets. Accordingly, conventional NATs fail to accurately translate all outgoing traffic, resulting in dropped or failed connections. To remedy this deficiency, the ALGs described above may be configured to translate information contained within the payloads as well as the headers of VoIP messages. Unfortunately, current ALGs fail to support scenarios involving more than two distinct security zones where call messages are routed through multiple zones between calling parties.
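Payload translation of the kind attributed to ALGs above may be illustrated by the following Python sketch, which rewrites a private address appearing in the origin (`o=`) and connection (`c=`) lines of an SDP body. The `rewrite_sdp` function is a hypothetical name, and the sketch assumes address-only translation between two zones. A practical implementation would also need to translate media ports where port translation is in use and to update the length of the enclosing message body.

```python
def rewrite_sdp(sdp: str, private_ip: str, public_ip: str) -> str:
    """Replace private_ip with public_ip in the o= and c= lines of an SDP body."""
    rewritten = []
    for line in sdp.splitlines():
        if line.startswith(("o=", "c=")):
            # Compare whole space-separated fields so that, for example,
            # 10.0.0.5 is never matched inside 10.0.0.50.
            fields = [public_ip if f == private_ip else f
                      for f in line.split(" ")]
            line = " ".join(fields)
        rewritten.append(line)
    # SDP lines are terminated by CRLF.
    return "\r\n".join(rewritten) + "\r\n"


if __name__ == "__main__":
    body = (
        "v=0\r\n"
        "o=alice 1 1 IN IP4 10.0.0.5\r\n"
        "c=IN IP4 10.0.0.5\r\n"
        "m=audio 49170 RTP/AVP 0\r\n"
    )
    print(rewrite_sdp(body, "10.0.0.5", "203.0.113.1"))
```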
Accordingly, there is a need for a VoIP routing solution that enables call setup and media exchange across multiple security zones.