The internet may be loosely described as a public network, or a collection of public networks, consisting generally of distributed IP devices, e.g., PCs, printers, routers, servers, etc., each of which has a distinct IP address, e.g., 1.1.1.1, 1.1.1.2, . . . 1.1.1.10. See, e.g., FIG. 1. There also exist many private networks, which themselves comprise collections of IP devices, each having a discrete IP address which theoretically may be identical to the IP address of a completely different device located on a different private network or on a public network. Private networks are often connected to the public networks through a gateway (router or proxy). A device on a private network can communicate with, or “address,” a device on a public network through the gateway, which acts as a proxy to the public network. If a device on a private network knows the IP address of a device in the same private network or in a public network, the first device can directly address the second device using the above principle.
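As a minimal sketch of the distinction between public and private addresses, the snippet below uses Python's standard `ipaddress` module. The address 192.168.0.10 and the variable names are assumptions chosen for illustration; only 1.1.1.1 comes from the text above.

```python
import ipaddress

# Hypothetical devices, for illustration only.
public_device = ipaddress.ip_address("1.1.1.1")
device_on_network_a = ipaddress.ip_address("192.168.0.10")  # device on private network A
device_on_network_b = ipaddress.ip_address("192.168.0.10")  # a different device on private network B

def describe(addr):
    """Return 'private' for addresses in reserved private ranges, else 'public'."""
    return "private" if addr.is_private else "public"

# Two unrelated devices may carry the same private address on different networks.
same_address_reused = device_on_network_a == device_on_network_b
```

Because the two private devices sit behind different gateways, the reuse of one address causes no conflict.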
Accordingly, in order to address a device on the internet, it is necessary to know its IP address. The websites that make up much of what is considered by the public to be “the internet” reside on web servers which have addresses on a public network (public IP addresses). By contrast, most users access the internet through PCs residing on a private network which provides a gateway to the public networks (these gateways are commonly referred to as Internet gateways and carry two IP addresses, one on the private network and the other on a public network). While the devices on private networks are described above as having IP addresses, these IP addresses are usually private (a device is not addressable by users outside of the private network on which it resides) and dynamic (the IP address for a particular device on a private network is assigned automatically by a DHCP (dynamic host configuration protocol) server in order to reduce the network management overhead created by conflicting IP addresses).
Both the browser in a user's PC and a website residing on a web server have port numbers, which direct incoming communications to the correct application running on the device. However, in contrast to the private and dynamic port numbers assigned to browsers and other client applications in a user's PC, website port numbers are public and well known. The port number for websites served over HTTP is 80; the port number for websites served over HTTPS is 443; and the port number for FTP servers is 21. Accordingly, when a user's PC browser sends a request to a website, it uses the website's IP address together with the appropriate port to address the website, and accompanies the message with the PC's temporary dynamic IP address as well as the transient port number that corresponds to the browser which generated the request. The website, in responding to the request, directs the response to the dynamic IP address and port number that accompanied the request. See, e.g., FIG. 2.
The way that PCs with dynamic IP addresses on a private network can receive information from a website web server on a public network may be described as follows: the user types into a browser application running on his PC the public IP address for a website web server. The user's PC then directs a request to the website server at the specified address and well-known port number. The request generated by the user's PC includes its return address (its private dynamic address) and a “port number,” which together with the return address forms what is known as a “TCP socket,” and which will be open for a short specified time in order to receive the reply. The port number identifies the particular application on the user's computer to which the reply should be directed; any one user may have several browsers open on his PC, using each browser to communicate with different websites.
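The request and reply exchange described above can be sketched on a single machine with Python's standard `socket` module. This is an illustration under stated assumptions, not a model of a real website: both ends run on the loopback address 127.0.0.1, the server port is chosen by the operating system rather than being port 80, and the name `run_server` is hypothetical.

```python
import socket
import threading

def run_server(listener, seen):
    """Accept one connection, record the client's return address, and reply to it."""
    conn, client_addr = listener.accept()
    with conn:
        seen.append(client_addr)   # (IP address, port) that accompanied the request
        conn.recv(1024)
        conn.sendall(b"reply")     # sent back to that same address and port

# The server listens on a port the client knows in advance. A real website would
# use a well-known port such as 80; port 0 asks the OS for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_addr = listener.getsockname()

seen = []
thread = threading.Thread(target=run_server, args=(listener, seen))
thread.start()

# The client does not pick its own port; the OS assigns a transient one on connect.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_addr)
client_return_addr = client.getsockname()
client.sendall(b"request")
reply = client.recv(1024)
client.close()

thread.join()
listener.close()
```

The address the server records in `seen` matches `client_return_addr`, which is how the reply finds its way back to the application that sent the request.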
Although a device in the private network is able to address a device in the public network through a gateway (which serves as a proxy for the public network), a website cannot address the PC in the private network because a private IP address is not a ‘global address’ and is not addressable from the public network. To enable the website to send the reply back to the private PC, the gateway manages a temporary network address translation (NAT) table, which is built dynamically as communication takes place. When a request from the private network travels to the public network via the gateway, the gateway forwards the request to the website with its own public IP address and a randomly picked port number as the return socket. The gateway also adds an entry to the NAT table to map that socket to the return IP address and port number of the browser or other client application that generated the request. When the reply is received from the website, the gateway performs a lookup in the NAT table and forwards the reply to the client application that generated the request. Once the communication is completed (the TCP connection is closed), the related entry in the NAT table is removed; therefore, no more data packets can travel into the private network. If someone attempts to initiate a TCP communication into the private network, the gateway does not allow it because there is no address mapping in the NAT table (unless the gateway has been specifically set up to act as a reverse proxy as described below).
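The translation table described above can be sketched as a small dictionary keyed by the gateway's public port. This is a toy model under stated assumptions: the class name `NatGateway`, its methods, and all addresses are hypothetical, and ports are handed out sequentially for predictability, whereas the text describes a randomly picked port.

```python
import itertools

class NatGateway:
    """Toy model of a gateway's address translation table (hypothetical names)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # assumption: sequential, not random
        self.table = {}                            # public port -> (private IP, private port)

    def outbound(self, private_ip, private_port):
        """Record a mapping and return the return socket presented to the website."""
        public_port = next(self._ports)
        self.table[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port)

    def inbound(self, public_port):
        """Return the private destination for a reply, or None if no mapping exists."""
        return self.table.get(public_port)

    def close(self, public_port):
        """Remove the mapping once the TCP connection is closed."""
        self.table.pop(public_port, None)

gateway = NatGateway("203.0.113.5")
return_socket = gateway.outbound("192.168.0.10", 51000)   # browser sends a request
reply_target = gateway.inbound(return_socket[1])          # website's reply is looked up
gateway.close(return_socket[1])                           # connection ends
late_packet_target = gateway.inbound(return_socket[1])    # no mapping remains
unsolicited_target = gateway.inbound(45000)               # never mapped
```

A packet arriving on a port with no entry resolves to `None`, which corresponds to the gateway refusing traffic that was not initiated from inside the private network.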
In this fashion, a public website can always be addressed by a client application on a private network, while a private user's PC cannot be addressed from the Internet.
As discussed above, websites traditionally reside on “web servers” which have a static public IP address on a public network. Alternatively, the web server may be represented on the public network by a router or “reverse proxy” which directs inquiries to the web server, which may itself be placed on a private network. See, e.g., FIG. 3. In this case, the proxy or router will map its public IP address to the private address of the web server, and redirect inquiries coming into the public IP address to the web server address. This method is also known as “reverse proxy,” “IP forwarding,” or “protocol tunneling,” with slight variations in implementation. In either case, the principle is to forward the request coming from a client application (or an Internet gateway of another private network) on the Internet to the web server.
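In contrast to the temporary mappings a gateway builds for outgoing requests, the forwarding described above relies on a mapping configured in advance. A minimal sketch, assuming a single static rule; the rule table, the function name `forward`, and all addresses are hypothetical.

```python
# Static forwarding rules: public (IP, port) -> private web server (IP, port).
# All addresses are hypothetical and chosen for illustration.
FORWARDING_RULES = {
    ("203.0.113.5", 80): ("10.0.0.20", 8080),
}

def forward(dest_ip, dest_port):
    """Return the private address an inbound request is redirected to, or None."""
    return FORWARDING_RULES.get((dest_ip, dest_port))

web_request_target = forward("203.0.113.5", 80)   # redirected to the private web server
other_port_target = forward("203.0.113.5", 23)    # no rule, so nothing is forwarded
```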
Websites, in contrast to private users' PCs, are subject to unauthorized access by virtue of the fact that they have public addresses. As mentioned above, websites typically reside on a web server and comprise two primary functional units: a listening unit and a responder unit. The listening unit maintains an open line of communication with the public network and receives requests for information from other devices (users) located on the public network or on private networks with access to the public network. The responder unit contains the ASP pages, CGI applications, etc., in effect defining the information which is to be published or made available for publication. When the responder unit receives a request for information, it typically accesses a memory, for example via a database application, containing public and sensitive information. Many websites also have access to private “source data” which may be used to generate the public or sensitive information for publication to authorized users. The responder unit serves as the gateway for determining which requesting devices are entitled to sensitive and/or public information in the website. The responder unit is typically designed so as not to give away, or “publish,” the private source data. Rather, it only uses the private source data to generate the public and/or sensitive information which is then published to authorized users via the website. In order to prevent unauthorized access to sensitive information and private source data that are available to a website responder unit, network engineers design “firewalls” which attempt to identify instances of unauthorized access to the website's data sources. This is primarily done by blocking outside users or devices from initiating TCP/IP connections into the protected network through specific ports (known as “blocking incoming ports”).
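The division of labor between the listening unit and the responder unit can be sketched as two functions. This is an illustrative assumption rather than a description of any particular web server: the function names, the request format, and the sample data are all hypothetical.

```python
from statistics import mean

# Private source data: used to generate published information, never published itself.
PRIVATE_SOURCE_DATA = {"order_totals": [120.0, 80.0, 100.0]}

def responder(resource, authorized):
    """Decide what a requester is entitled to receive (hypothetical rules)."""
    if resource == "public":
        return "store hours: 9-5"
    if resource == "sensitive" and authorized:
        # Sensitive information is derived from the private source data.
        average = mean(PRIVATE_SOURCE_DATA["order_totals"])
        return f"average order: {average:.2f}"
    return None  # unauthorized or unknown requests receive nothing

def listener(request):
    """Stand-in for the listening unit: receive a request and pass it to the responder."""
    return responder(request["resource"], request.get("authorized", False))

public_reply = listener({"resource": "public"})
denied_reply = listener({"resource": "sensitive"})
granted_reply = listener({"resource": "sensitive", "authorized": True})
```

Note that the raw `order_totals` list is never returned by any branch; only information generated from it is.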
However, firewalls cannot fully close all incoming ports into the website because certain ports must remain open for the web server to function. Further, a firewall only blocks the initialization of a TCP connection (at the beginning of the TCP conversation) by inspecting traffic that targets a specific port. After the initialization, the traffic has to pass through the firewall in both directions with randomly assigned ports, and the firewall has to allow it to happen. Therefore, unlike a proxy, a firewall is unable to isolate the network from unauthorized incoming traffic. Each incoming call comes from a ‘visitor’ on the Internet, who could potentially be a hacker.
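The limitation described above can be sketched as a toy filtering rule that follows the text's description: only packets that start a connection are checked against the list of open ports. The function name `allow_packet` and the port set are assumptions, and real firewalls apply considerably more elaborate rules.

```python
# Ports that must stay open for the web server to function (assumed for illustration).
OPEN_INCOMING_PORTS = {80, 443}

def allow_packet(dest_port, is_connection_start):
    """Toy rule: only connection-initiating packets are checked against the port list."""
    if is_connection_start:
        return dest_port in OPEN_INCOMING_PORTS
    return True  # traffic after initialization passes in both directions

blocked_start = allow_packet(23, True)        # initiation on a closed port is refused
allowed_start = allow_packet(80, True)        # any visitor may initiate on an open port
allowed_followup = allow_packet(50123, False) # later traffic is not checked by port
```

The rule cannot tell a legitimate visitor from an attacker on port 80, which is the gap the paragraph above points out.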