This document presents an invention that allows endpoints (using a real-time protocol, for example H.323, SIP or MGCP) located in different secure and private IP data networks to communicate with each other without compromising the data privacy and data security of the individual private networks. The invention relates to a method and apparatus that has the advantage of working with existing security functions, firewalls for example, and NAPT (Network Address Port Translation) functions that may occur in firewalls, routers and proxies. The benefit of the invention is that it saves on the costs of upgrading those devices to be fully protocol (e.g. H.323) compliant or deploying additional protocol aware (e.g. H.323) devices. The invention presented in this document applies to those deployments where simple (1-to-1) NAT (Network Address Translation) mapping may be applied at the edge of the private networks and/or to deployments where NAPT is applied at the edge of the private networks. The two configurations can coexist and the apparatus can allow communications to take place between private networks following one configuration and private networks following the other configuration. Similarly within a single private network, some terminals may use one configuration (e.g. dedicated room systems) whereas other terminals may use the second configuration (e.g. desktop client PCs). Note that for the purpose of this document NAT will refer to all types of network address translation.
The invention presented in this document is illustrated with reference to the ITU H.323 standard as that is the predominant standard for real-time multimedia communications over packet networks including IP networks. However, it is equally applicable to other standards or methods that need to dynamically assign ports to carry bi-directional information (e.g. IETF Session Initiation Protocol (SIP)). It is a major benefit of this invention that the private network infrastructure (firewalls and routers) need not be aware of the protocol used for real-time communication, and that the method of tunnelling real-time traffic in and out of a private network may also be protocol agnostic. This allows enterprises to deploy apparatus without regard to the protocol. That is not to say that some implementations may not provide ‘protocol’ checking for security or other reasons.
The rapidly evolving IP (Internet Protocol) data network is creating new opportunities and challenges for multimedia and voice Communications Service Providers. Unprecedented levels of investment are being made in the data network backbone by incumbent telecommunication operators and next generation carriers and service providers. At the same time, broadband access technologies such as DSL and cable modems are bringing high speed Internet access to a wide community of users. The vision of service providers is to make use of the IP data network to deliver new voice, video and data services right to the desktop, the office and the home alongside high speed Internet access.
The H.323 standard applies to multimedia communications over Packet Based Networks that have no guaranteed quality of service. It has been designed to be independent of the underlying transport network and protocols. Today the IP data network is the default and ubiquitous packet network and the majority (if not all) of implementations of H.323 are over an IP data network. Other protocols for real-time (voice and video) communications, for example SIP and MGCP, also use the IP data network for the transport of call signalling and media. New protocols for new applications associated with the transport of real-time voice and video over IP data networks are also expected to be developed. The methods presented within this invention will also apply to them, and to other protocols that require multiple traffic flows per single session.
Standards are fundamental to widespread communications if terminals from different manufacturers are to inter-operate. In the multimedia arena, the current standard for real-time communications over packet networks (such as IP data networks) is the ITU standard H.323. H.323 is now a relatively mature standard having support from the multimedia communications industry that includes companies such as Microsoft, Cisco and Intel. For example, it is estimated that 75% of PCs have Microsoft's NetMeeting (trade mark) program installed. NetMeeting is an H.323 compliant software application used for multimedia (voice, video and data) communication. Interoperability between equipment from different manufacturers is also now being achieved. Over 120 companies world-wide attended the last interoperability event hosted by the International Multimedia Telecommunications Consortium (IMTC), an independent organisation that exists to promote the interoperability of multimedia communications equipment. The event is a regular one that allows manufacturers to test and resolve inter-working issues.
Hitherto, there had been a number of barriers to the mass uptake of multimedia (particularly video) communications. Ease of use, quality, cost and communications bandwidth had all hampered growth in the market. Technological advances in video encoding, the ubiquity of cheap IP access and the current investment in the data network, coupled with the rollout of DSL together with ISDN and cable modems, now alleviate most of these issues, making multimedia communications readily available.
As H.323 was being defined as a standard, it was assumed that there would be H.323-H.320 gateways that exist at the edge of network domains converting H.323 to H.320 for transport over the wide area between private networks. Therefore, implementations of H.323 over IP concentrated on communications within a single network.
However, IP continues to find favour as the wide area protocol. More and more organisations continue to base their entire data networks on IP. High speed Internet access, managed Intranets and Virtual Private Networks (VPNs), all based on IP, are commonplace. The IP trend is causing H.320 as a multimedia protocol to decline. The market demand is to replace H.320 completely with H.323 over IP. But perhaps the main market driver for transporting real-time communications over IP across the WAN (wide area network) is voice. With standards such as H.323 and SIP, users had begun to use the Internet for cheap voice calls using their computers. This marked the beginning of a whole new Voice over IP (VoIP) industry that is seeing the development of new VoIP products that include Ethernet telephones, IP PBXs, SoftSwitches and IP/PSTN gateways, all geared towards seamlessly delivering VoIP between enterprises and users. H.323, SIP and MGCP are expected to be the dominant standards here.
Unfortunately, unforeseen technical barriers to the real-world, wide area deployment of H.323 and SIP still exist. The technical barriers relate to the communications infrastructure at the boundaries of IP data networks.
Consequently, today, successful implementations of multimedia or voice communications over IP are confined to Intranets or private managed IP networks.
The problems arise because of two IP technologies: Network Address Translation (NAT) and Firewalls. Security is also an issue when considering solutions to these problems. Where deployments of real-time communications over the data networks traverse shared networks (for example the public Internet), enterprises must be assured that no compromise to their data security is being made. Current solutions to these problems require the outside or external IP address(es) of an enterprise to become public to anyone with whom that enterprise wishes to communicate (voice communications usually includes everyone). The invention presented herein does not suffer this shortfall as an enterprise's external IP address(es) need only be known to the ‘trusted’ service provider, which is how the public Internet has largely evolved.
NAT has been introduced to solve the ‘shortage of addresses’ problem. Any endpoint or ‘host’ in an IP network has an ‘IP address’ to identify that endpoint so that data packets can be correctly sent or routed to it and packets received from it can be identified from where they originate. At the time of defining the IP address field no-one predicted the massive growth in desktop equipment. After a number of years of global IP deployment, it was realised that the number of endpoints wanting to communicate using the IP protocol would exceed the number of unique IP addresses possible from the address field. To increase the address field and make more addresses available requires the entire IP infrastructure to be upgraded. (The industry is planning to do this with IP Version 6 at some point).
The solution of the day is now referred to as NAT. The first NAT solution, which is referred to as simple NAT in IETF RFC 1631 and uses a one-to-one mapping, came about before the World-Wide Web existed and when only a few hosts (e.g. email server, file transfer server) within an organisation needed to communicate externally to that organisation. NAT allows an enterprise to create a private IP network where each endpoint within that enterprise has an address that is unique only within the enterprise but is not globally unique. These are private IP addresses. This allows each host within an organisation to communicate with (i.e. address) any other host within the organisation. For external communication, a public or globally unique IP address is needed. At the edge of the private IP network is a device that is responsible for translating a private IP address to/from a public IP address; this is the NAT function. The enterprise will have one or more public addresses belonging exclusively to the enterprise, but in general fewer public addresses than hosts are needed, either because only a few hosts need to communicate externally or because the number of simultaneous external communications is smaller. A more sophisticated embodiment of NAT has a pool of public IP addresses that are assigned dynamically on a first come first served basis for hosts needing to communicate externally. Fixed network address rules are required in the case where external equipment needs to send unsolicited packets to specific internal equipment.
Today, most private networks use private IP addresses from the 10.x.x.x address range. External communications are usually via a service provider that offers a service via a managed or shared IP network or via the public Internet. At the boundaries between the public and private networks NAT is applied to change addresses to be unique within the IP network the packets are traversing. Simple NAT changes the complete IP address on a one-to-one mapping that may be permanent or dynamically created for the life of the communication session.
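The one-to-one mapping described above, with both static (fixed) rules and a dynamically assigned pool, may be sketched as follows. This is a minimal illustration only and not the apparatus of the invention: the class name `SimpleNat`, its method names and the pool address 192.3.4.6 are assumptions introduced for the example.

```python
class SimpleNat:
    """Illustrative one-to-one NAT table (hypothetical names throughout)."""

    def __init__(self, public_pool):
        self._free = list(public_pool)   # public addresses not yet assigned
        self._to_public = {}             # private address -> public address
        self._to_private = {}            # public address -> private address

    def _bind(self, private_ip, public_ip):
        self._to_public[private_ip] = public_ip
        self._to_private[public_ip] = private_ip

    def add_static(self, private_ip, public_ip):
        # Fixed rule for a host (e.g. a mail server) that must accept
        # unsolicited inbound packets. Raises ValueError if public_ip is
        # not an unassigned member of the pool.
        self._free.remove(public_ip)
        self._bind(private_ip, public_ip)

    def outbound(self, private_ip):
        # Assign a public address on a first come, first served basis.
        if private_ip not in self._to_public:
            if not self._free:
                raise RuntimeError("public address pool exhausted")
            self._bind(private_ip, self._free.pop(0))
        return self._to_public[private_ip]

    def inbound(self, public_ip):
        # None means no mapping exists, so the packet cannot be delivered.
        return self._to_private.get(public_ip)
```

With a pool of two public addresses, one reserved statically for a server, a single further host can obtain a mapping by sending outbound traffic; a second such host finds the pool exhausted, which is the limitation that motivates port-based translation.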
Web Servers, Mail Servers and External servers are examples of hosts that would need a static one-to-one NAT mapping to allow external communications to reach them.
A consequence of NAT is that the private IP address of a host is not visible externally. This adds a level of security.
An extension to simple NAT additionally uses ports for the translation mapping and is often referred to as NAPT (Network Address Port Translation) or PAT (Port Address Translation). A port identifies one end of a point-to-point transport connection between two hosts. With mass access to the World-Wide-Web (WWW), the shortage of public IP addresses was again reached because now many desktop machines needed to communicate outside of the private network. The solution, as specified in IETF RFC 1631, allows a many-to-one mapping of private IP addresses to public IP address(es) and instead uses a unique port assignment (theoretically there are 64 k unique ports on each IP address) on the public IP address for each connection made from a private device out into the public or shared network. Because of the growth of the Internet, PAT is the common method of address translation.
A peculiarity of PAT is that the assignments of private IP address/port to public IP address/port are made dynamically, typically each time a private device makes an outbound connection to the public network. The consequence of PAT is that data cannot travel inbound, that is from the public network to the private network, unless a previous outbound connection has caused such a PAT assignment to exist. Typically, PAT devices do not make the PAT assignments permanent. After a specified ‘silence’ period has expired, that is when no more inbound data has been received for that outbound initiated connection, the PAT assignment for that connection is unassigned and the port is free to be assigned to a new connection.
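The dynamic assignment and ‘silence’ expiry described above may be sketched as follows. This is a minimal illustration under stated assumptions: the class name `PatTable`, the 30 second timeout and the starting port 1024 are hypothetical, and the sketch refreshes the timer on traffic in either direction, whereas real devices differ in what they treat as activity.

```python
class PatTable:
    """Illustrative many-to-one PAT table with idle expiry (hypothetical)."""

    def __init__(self, public_ip, idle_timeout=30.0,
                 first_port=1024, last_port=65535):
        self.public_ip = public_ip
        self.idle_timeout = idle_timeout
        self._ports = range(first_port, last_port + 1)
        self._by_private = {}  # (private_ip, private_port) -> public_port
        self._by_public = {}   # public_port -> ((private_ip, private_port), last_seen)

    def _expire(self, now):
        # Release every assignment that has been silent for too long.
        stale = [port for port, (_, seen) in self._by_public.items()
                 if now - seen > self.idle_timeout]
        for port in stale:
            key, _ = self._by_public.pop(port)
            del self._by_private[key]

    def outbound(self, private_ip, private_port, now):
        # An outbound packet creates the assignment if none exists.
        self._expire(now)
        key = (private_ip, private_port)
        port = self._by_private.get(key)
        if port is None:
            port = next((p for p in self._ports
                         if p not in self._by_public), None)
            if port is None:
                raise RuntimeError("no free public ports")
            self._by_private[key] = port
        self._by_public[port] = (key, now)
        return (self.public_ip, port)

    def inbound(self, public_port, now):
        # Inbound data is deliverable only while an assignment exists.
        self._expire(now)
        entry = self._by_public.get(public_port)
        if entry is None:
            return None
        key, _ = entry
        self._by_public[public_port] = (key, now)
        return key
```

Calling `inbound` for a port before any `outbound` call returns `None`, which mirrors the statement that data cannot travel inbound unless a previous outbound connection created the assignment. The `now` argument is passed explicitly so that the expiry behaviour can be exercised without waiting on a real clock.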
While computers and networks connected via a common IP protocol made communications easier, the common protocol also made breaches in privacy and security much easier too. With relatively little computing skill it became possible to access private or confidential data and files and also to corrupt that business information maliciously. The industry's solution to such attacks is to deploy ‘firewalls’ at the boundaries of private networks.
Firewalls are designed to restrict or ‘filter’ the type of IP traffic that may pass between the private and public IP networks. Firewalls can apply restrictions through rules at several levels. Restrictions may be applied at the IP address, the Port, the IP transport protocol (TCP or UDP for example) or the application. Restrictions are not symmetrical. Typically a firewall will be programmed to allow more communications from the private network (inside the firewall) to the public network (outside the firewall) than in the other direction.
It is difficult to apply firewall rules just to IP addresses. Any inside host (i.e. your PC) may want to connect to any outside host (a web server) dotted around the globe. To allow further control, the concept of a ‘well known port’ is applied to the problem. A port identifies one end of a point-to-point transport connection between two hosts. A ‘well known port’ is a port that carries one ‘known’ type of traffic. IANA, the Internet Assigned Numbers Authority, specifies the well known ports and the type of traffic carried over them. For example, port 80 has been assigned for web surfing (HTTP protocol) traffic and port 25 for Simple Mail Transfer Protocol (SMTP) traffic.
An example of a firewall filtering rule for Web Surfing would be:
Any inside IP address/any port number may connect to any outside IP address/Port 80 using TCP (Transmission Control Protocol) and HTTP (the application protocol for Web Surfing).
The connection is bi-directional so traffic may flow back from the Web Server on the same path. The point is that the connection has to be initiated from the inside.
An example of a firewall filtering rule for email may be:
Any outside IP address/any port number may connect to IP address 192.3.4.5/port 25 using TCP and SMTP.
(Incidentally, the NAT function may change the destination IP address 192.3.4.5 to 10.6.7.8, which is the inside address of the mail server.)
A filtering rule such as “any inside IP address/any port number may connect to any outside IP address/any port number for TCP or UDP and vice versa” is tantamount to removing the firewall and using a direct connection, as it is too broad a filter. Such rules are frowned upon by IT managers.
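The two example rules above may be expressed as data and checked against a connection attempt, as in the following minimal sketch. The names `Rule`, `RULES` and `allows` are assumptions introduced for illustration. The sketch denies anything that matches no rule, and it omits both application protocol checking (HTTP, SMTP) and the tracking of return traffic on an established connection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    direction: str            # "outbound" (inside to outside) or "inbound"
    protocol: str             # "TCP" or "UDP"
    dst_ip: Optional[str]     # None means any destination address
    dst_port: Optional[int]   # None means any destination port

    def matches(self, direction, protocol, dst_ip, dst_port):
        return (self.direction == direction
                and self.protocol == protocol
                and self.dst_ip in (None, dst_ip)
                and self.dst_port in (None, dst_port))


RULES = [
    # Web surfing: any inside host may connect to any outside host on port 80.
    Rule("outbound", "TCP", None, 80),
    # Email: any outside host may connect to 192.3.4.5 on port 25.
    Rule("inbound", "TCP", "192.3.4.5", 25),
]


def allows(rules, direction, protocol, dst_ip, dst_port):
    """Return True if any rule permits the connection; otherwise deny."""
    return any(rule.matches(direction, protocol, dst_ip, dst_port)
               for rule in rules)
```

Under these rules, an inbound TCP connection to 192.3.4.5 on port 25 is permitted, whereas a UDP packet in either direction on an arbitrary port matches no rule and is dropped. Adding `Rule("outbound", "UDP", None, None)` and its inbound counterpart would admit such traffic, but that is exactly the overly broad kind of rule described above.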
H.323 has been designed to be independent of the underlying network and transport protocols. Nevertheless, implementation of H.323 in an IP network is possible with the following mapping of the main concepts:
H.323 concept            IP implementation
H.323 address            IP address
H.323 logical channel    TCP/UDP port connection
In the implementation of H.323 over IP, H.323 protocol messages are sent as the payload in IP packets using either TCP or UDP transport protocols. Many of the H.323 messages contain the H.323 address of the originating endpoint or the destination endpoint or both endpoints. Other signalling protocols such as SIP also embed IP addresses within the signalling protocol payload.
However, a problem arises in that NAT functions will change the apparent IP addresses (and ports) of the source and destination hosts without changing the H.323 addresses in the H.323 payload. As the hosts use the H.323 addresses and ports exchanged in the H.323 payload to associate the various received data packets with the call, this causes the H.323 protocol to break and requires intermediary intelligence to manipulate H.323 payload addresses.
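The mismatch described above may be illustrated with the following minimal sketch. It does not model real H.323 message encoding: the `SignallingPacket` type, its field names and the port numbers used with it are assumptions introduced purely to show that a protocol-unaware NAT rewrites the packet header while leaving the address carried in the payload untouched.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SignallingPacket:
    src_ip: str        # source address in the IP header
    src_port: int      # source port in the TCP/UDP header
    payload_ip: str    # address the endpoint advertises inside the payload
    payload_port: int  # port the endpoint advertises inside the payload


def nat_rewrite_header(packet, public_ip, public_port):
    """Rewrite only the header fields, as a protocol-unaware NAT does."""
    return replace(packet, src_ip=public_ip, src_port=public_port)


def payload_matches_header(packet):
    """True if the advertised address is the one the far end actually sees."""
    return packet.payload_ip == packet.src_ip


def payload_is_private(packet):
    """True if the advertised address is not routable from outside."""
    return ipaddress.ip_address(packet.payload_ip).is_private
```

For a packet sent from 10.6.7.8 that advertises 10.6.7.8 in its payload, `nat_rewrite_header(packet, "192.3.4.5", 40000)` yields a packet whose header shows 192.3.4.5 while its payload still advertises the private address. The far end then holds an address it cannot reach, which is why some intermediary must manipulate the payload addresses.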
Because of the complexity of multimedia communications, H.323 requires several logical channels to be opened between the endpoints. Logical channels are needed for call control, capabilities exchange, audio, video and data. In a simple point-to-point H.323 multimedia session involving just audio and video, at least 6 logical channels are needed. In the IP implementation of H.323, logical channels are mapped to TCP or UDP port connections, many of which are assigned dynamically.
As the firewall functions filter out traffic on ports that they have no rules for, either the firewall is opened, which defeats the purpose of the firewall, or much of the H.323 traffic will not pass through.
Therefore, both NAT and firewall functions between endpoints prevent H.323 (and other real-time protocols, SIP and MGCP for example) communications working. This will typically be the case when the endpoints are in different private networks, when one endpoint is in a private network and the other endpoint is in the Internet or when the endpoints are in different managed IP networks.
H.323 (and SIP, MGCP etc.) communication is therefore anathema to firewalls. Either a firewall must become H.323 aware or some intermediary intelligence must manipulate the port assignments in a secure manner.
One possible solution to this problem would be a complete IP H.323 infrastructure upgrade. This requires:
- H.323 upgrade to the NAT function at each IP network boundary. The NAT function must scan all H.323 payloads and consistently change IP addresses.
- H.323 upgrade to the firewall function at each IP network boundary. The firewall must understand and watch all H.323 communication so that it can open up the ports that are dynamically assigned and must filter all non-H.323 traffic on those ports.
- Deployment of H.323 intelligence at the boundary or in the shared IP network to resolve and arbitrate addresses. IP addresses are rarely used directly by users. In practice, IP address aliases are used. Intelligence is needed to resolve aliases to an IP address. This H.323 function is contained within H.323 entities called Gatekeepers.
The disadvantages of this possible solution are:
- Each organisation/private network must have the same level of upgrade for H.323 communication to exist.
- The upgrade is costly. New functionality or new equipment must be purchased, planned and deployed. IT managers must learn about H.323.
- The scale of such a deployment will likely not be readily adaptable to the demands placed on it as the technology is progressively adopted, requiring a larger and more costly initial deployment than initial (perhaps experimental) demand requires.
- The continual parsing of H.323 packets to resolve the simple NAT and firewall function places a latency burden on the signal at each network boundary. The latency tolerance for audio and video is very small.
- Because there is a multitude of standards for real-time communication and the signalling protocols of those standards all differ, an enterprise would need multiple upgrades, one for each protocol it wishes to use.
- The media is expected to travel directly between enterprises or between an enterprise and a device in the public network. The consequence of this is that the IP addresses of an enterprise become public knowledge. This is regarded as a security compromise, as any potential attacker must first discover the enterprise's IP address as the first step to launching an attack.
As a result of these problems, the H.323 protocol is not being used for multimedia communications when there is a firewall and/or network address translation (NAT). One approach has been for companies to place H.323 systems on the public side of the firewall and NAT functions. This allows them to use H.323 while also allowing them to protect the remainder of their network. The disadvantages of this are:
1. The most ubiquitous device for video communications is the desktop PC. It is nonsensical to place all desktop computers on the public side!
2. The H.323 systems are not protected from attackers on the public side of the firewall.
3. The companies are not able to take advantage of the potentially ubiquitous nature of H.323, since only the special systems will be allowed to conduct H.323 communications.
4. The companies will not be able to take full advantage of the data-sharing facilities in H.323 because the firewall will prevent the H.323 systems from accessing the data. Opening the firewall to allow data-transfer functions from the H.323 system is not an option because it would allow an attacker to use the H.323 system as a relay.
5. In the emerging Voice over IP (VoIP) market there is a market for telephony devices that connect directly to the data network, for example Ethernet telephones and IP PBXes. By virtue of their desktop nature they are typically deployed on the private network behind firewalls and NAT. Without solutions to the problems described above, telephony using these devices is confined to the enterprise's private network or Intranet or must pass through IP-PSTN gateways to reach the outside world.
The advantages of using the broadband connection to the enterprise for voice and video as well as data require secure solutions to these issues.