The Internet is a worldwide IP network that links many different organizations. It is not a centralized organization but a collection of networks from various sources: governmental, educational and commercial. Internet routing is done by many Internet service providers (ISPs), government departments and private service companies, which establish connections among themselves and form the base of the network. Organizations and individuals connected to the Internet are usually bound to one provider and can communicate with any other connected organization or individual across the inter-provider routes, which are made of expensive communications lines often referred to as ‘peer lines’. To cope with the explosion of the Internet over the past years, ISPs have had to rapidly expand the bandwidth and other resources they deploy. To contain their operational costs, ISPs have adopted proxy caching, which can significantly reduce bandwidth costs by locally retaining frequently requested information rather than fetching it from a remote server, through an expensive link, each time it is requested by an end-user (the ISP's customers and users). The caching proxy function also benefits the end-user, who thus enjoys a better response time. The function is carried out by a proxy server, a Web server that takes over the responsibility of retrieving Internet data for multiple browser clients. Client requests are sent to the servers through the proxy. In other words, the client has to be configured to send its request to the proxy first, and the proxy then forwards the request to the server, acting on behalf of the originating client. The remote Web server does not see the IP address of the client in the packet headers, only the IP address of the proxy server. Once the proxy receives the information from the server, it forwards it to the requesting client.
In this way the proxy function can be used to provide address security and optionally, through specific proxy features, to support additional functions, such as request filtering or modification that the service provider may want to implement.
Thus, a traditional proxy server receives requests for URLs (Uniform Resource Locators) from clients and forwards them to the destination Web server. Those retrieved Web documents that are considered cacheable according to the Hypertext Transfer Protocol (HTTP) are saved. The proxy server can then serve subsequent requests for cached documents from its local cache. Clients get the information faster and network bandwidth utilization is reduced.
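The caching behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an actual proxy implementation: the class `CachingProxy`, the stand-in origin `demo_origin`, and the rule that a URL containing `?` is not cacheable are all hypothetical, and the real cacheability decision follows the HTTP caching rules rather than this shortcut.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type for illustration: a fetcher takes a URL and returns
# (body, cacheable), where `cacheable` stands in for the HTTP caching rules.
Fetcher = Callable[[str], Tuple[bytes, bool]]


class CachingProxy:
    """Toy caching proxy: serves repeat requests from a local cache."""

    def __init__(self, fetch_from_origin: Fetcher) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_fetches = 0  # requests that crossed the expensive link

    def get(self, url: str) -> bytes:
        # Serve from the local cache when the document was saved earlier.
        if url in self._cache:
            return self._cache[url]
        # Otherwise forward the request to the origin on the client's behalf.
        body, cacheable = self._fetch(url)
        self.origin_fetches += 1
        if cacheable:
            self._cache[url] = body
        return body


def demo_origin(url: str) -> Tuple[bytes, bool]:
    """Stand-in origin server (assumption): URLs with '?' are not cacheable."""
    return (f"content of {url}".encode(), "?" not in url)


proxy = CachingProxy(demo_origin)
proxy.get("http://example.com/index.html")
proxy.get("http://example.com/index.html")  # served from the local cache
proxy.get("http://example.com/search?q=a")
proxy.get("http://example.com/search?q=a")  # not cacheable, fetched again
```

In this sketch the counter `origin_fetches` tracks how many requests actually reach the origin server, which is the quantity a provider seeks to reduce on its peer lines.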
Although the proxying technique is advantageous both for the Internet provider, which can thereby limit its bandwidth requirement on peer lines (while the number of Internet users is growing exponentially), and for the clients, who get a better response time, it has created problems of two kinds. Firstly, as mentioned above, the origin IP address of the client is lost in the packet headers received by the servers, since the proxy acts as a relay between them. Thus, the traceability of the exchanges is impaired. This may become a serious problem if a wrongdoer, a hacker, attempts to attack a site or tries to disseminate a virus. In such a case, the Web site or the end-addressee of a mail that has been subject to an attack can only be aware of the proxy address as the origin of the malicious IP packets. This may not be of much help if the ISP from which the packets originated is hosting thousands, and sometimes tens or hundreds of thousands, of clients. Secondly, having a proxy assumes that the client browsers are personalized for that proxy, i.e., that the users become proxy-aware. This poses serious scalability problems when a successful provider wants to grow; typical growth is in the range of ten percent (10%) a month. Configuring and re-configuring the end-user browsers can become a cumbersome and costly task that may have severe adverse commercial impacts and, in any case, contributes significantly to the administrative cost of managing a network.
As a consequence, transparent proxying has been introduced. This technique implicitly assumes that there is a single gateway (or at least a limited number of them) through which all the clients connected to an ISP network, or all the users on an intranet, are bound to pass to access the Internet. In practice this assumption holds. For instance, the proxy caches discussed above need to be placed at gateways to be efficient, and other considerations such as security tend to limit the access of a sub-network to a single point, so that it is convenient to watch the traffic flow in both directions. Transparent proxying then manages to redirect all client sessions passing the gateway to local proxy servers in a fully transparent way. Clients (both users and software, i.e., client browsers) do not know that their session is handed over to a proxy process: they still think they have a direct connection with the target they specified. To achieve this, transparent proxying relies on port numbers; hence, it only works for TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), which are used by higher-layer protocols of the IP suite such as HTTP, i.e., the World Wide Web (or simply the Web), and the Domain Name System (DNS) protocol. Conceptually, TCP and UDP carry, on top of the IP destination and source addresses of a datagram, a protocol port number, allowing the sender to distinguish among multiple application programs on the remote machine. Because there are “well-known port numbers”, a list of which can be found in RFC 1700 (i.e., a Request for Comments of the Internet Engineering Task Force or IETF), and “privileged ports” (i.e., port numbers below 1024), a router acting as the gateway of a sub-network connected to the Internet can be programmed to intercept, e.g., all HTTP requests on port 80, the port number for the applications using this protocol.
Then, all HTTP requests may indeed be forwarded transparently to a proxy server, as required, without having to personalize client browsers. A discussion of this and more can be found in a publication by the International Technical Support Organization of IBM Corporation, P.O. Box 12195, Research Triangle Park, N.C. 27709 U.S.A. under the title ‘Web Caching and Filtering with IBM WebSphere Performance Pack’, dated March 1999.
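The port-based interception rule described above may be illustrated by the following minimal sketch. It models only the forwarding decision taken at the gateway, not actual packet rewriting; the `Packet` type, the function `next_hop` and the proxy address `PROXY_ADDR` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

HTTP_PORT = 80  # well-known port number for HTTP

# Hypothetical address of the local proxy server (assumption for the example).
PROXY_ADDR = ("10.0.0.2", 3128)


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


def next_hop(pkt: Packet) -> Tuple[str, int]:
    """Return the (address, port) to which the gateway delivers the packet.

    The decision looks only at the destination port: TCP traffic to port 80
    is assumed to be HTTP and is handed to the local proxy; everything else
    continues to the destination the client specified.
    """
    if pkt.protocol == "tcp" and pkt.dst_port == HTTP_PORT:
        return PROXY_ADDR
    return (pkt.dst_ip, pkt.dst_port)


# The client addresses a remote Web server directly and is not proxy-aware.
web_request = Packet("192.0.2.10", 40000, "198.51.100.5", 80, "tcp")
dns_query = Packet("192.0.2.10", 40001, "198.51.100.53", 53, "udp")
```

Here `next_hop(web_request)` yields the proxy address while `next_hop(dns_query)` yields the original destination, so the client needs no configuration; the decision, however, rests entirely on the port number.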
Although the above scheme works and is widely used, it can become the source of many problems. That a service normally uses a well-known port does not mean that it cannot use another port. This must be considered because it might be used, either by an outsider or an insider, to circumvent the gateway's restrictions if, as is often the case, the gateway implements logging, filtering and security functions on top of being just a caching proxy. Often, weaknesses are not directly created by outsiders, but by insiders who consider the gateway to be unnecessarily restrictive. An insider who wants to provide an outside access that is not permitted may use a nonstandard port to do so. For example, if the gateway prevents users from providing HTTP servers but allows connections from outside to non-privileged ports (i.e., equal to or greater than 1024), a user can provide HTTP access using a port other than 80, thus escaping the transparent proxy server and its logging, filtering and security functions. An outside privileged port might also be used by an outsider to circumvent the gateway. If, for example, access from TCP port 20 (a port usually used by a File Transfer Protocol or FTP server for data transfer) is allowed from outside, an outsider may use this port to run another service, for example a Telnet client. Because Telnet is the protocol used to emulate terminal sessions, giving the outsider the same access as an insider within the network, this may have devastating consequences. Transparent proxying is further illustrated as prior art in FIG. 1.
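The weakness just described, namely that a port number only suggests which service is in use, can be illustrated by the following sketch. The table `WELL_KNOWN`, the function `assumed_service` and the sample flows are assumptions made for the example (port 8080 is simply an arbitrary nonstandard port); the sketch does not model any particular gateway.

```python
from typing import Dict, List, Tuple

# A few well-known port assignments (20 FTP data, 21 FTP control,
# 23 Telnet, 80 HTTP).
WELL_KNOWN: Dict[int, str] = {20: "ftp-data", 21: "ftp", 23: "telnet", 80: "http"}


def assumed_service(port: int) -> str:
    """Service a port-based gateway would assume from the port number alone."""
    return WELL_KNOWN.get(port, "unknown")


# Hypothetical flows: (port inspected by the gateway, service actually run).
flows: List[Tuple[int, str]] = [
    (80, "http"),    # regular Web traffic, correctly identified
    (8080, "http"),  # insider serving HTTP on a nonstandard port
    (20, "telnet"),  # outsider running Telnet over the FTP data port
]

# Flows for which the port-based assumption does not match reality.
misclassified = [
    (port, actual) for port, actual in flows if assumed_service(port) != actual
]
```

In this example the last two flows are misclassified: the first escapes the transparent proxy altogether, and the second is treated as an FTP data transfer although it carries a terminal session.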
Another popular approach to implementing network gateways uses a proxy server running a networking proxy protocol referred to as SOCKS. This technique enables hosts on one side of the proxy server (e.g., clients) to gain full access to hosts (e.g., servers) on the other side of the proxy server without requiring direct IP reachability. However, SOCKS not only requires that the protocol be run in the proxy server itself, it also assumes that each client is personalized, i.e., ‘socksified’, so as to be able to interact with the proxy server. SOCKS, from which the present invention is derived, is further discussed as prior art in FIG. 2. Thus, it is an object of the invention to overcome the shortcomings of the prior art noted above while retaining all the advantages of a transparent proxy function, which does not require that the end-user or client be personalized in any way. It is a further object of the invention not to bind the transparency of a proxy function to the examination of the TCP port from which a service is usually carried out. Further advantages of the present invention will become apparent to those skilled in the art upon examination of the drawings and detailed description. It is intended that any additional advantages be incorporated herein.