The present invention relates to the Internet and more particularly applies to gateways and proxies used by Internet Service Providers (ISPs) and enterprise network administrators at the boundary of their networks.
The Internet is actually a worldwide IP network that links many different organizations. The Internet is not a centralized organization but a collection of different networks from various sources, governmental, educational and commercial. Internet routing is done by many Internet providers, government departments and private service companies who establish connections among themselves and build the base of the network. Organizations and individuals connected to the Internet are usually bound to one provider and so may communicate with any other connected organization and individual across the inter-provider routes that are made of expensive communications lines often referred to as ‘peer lines’.
To cope with the explosion of the Internet over the past years, a rapid expansion in bandwidth and other resources deployed by the ISPs was required. To contain their operational costs, ISPs and administrators of enterprise networks have therefore made extensive use of proxy caching, which can significantly reduce bandwidth costs by retaining highly used information locally rather than accessing it from a remote content-server, through an expensive link (e.g., long-distance and sometimes transatlantic lines), each time it is requested by an end-user (the ISP's customers and users). The caching proxy function is also beneficial to the end-user, who may thus enjoy good response time. The function is carried out by a proxy-server, which is a Web server that takes over the responsibility of retrieving Internet data for multiple browser clients. Client requests are sent to the content-servers through the proxy-server. Typically, European ISPs have their networks built around this scheme. They have installed cache farms in Europe to serve Web pages from the local cache farms rather than retrieving the pages from US content-servers, since it is observed that a very high percentage of the Web pages requested in Europe (up to 9 out of 10) are hosted on servers located in the USA.
However, the use of proxy-servers poses its own set of problems. A first example arises when each user interface (i.e., client browser) needs to be explicitly configured to recognize the proxy at the gateway of a network. This rapidly becomes an administration nightmare when proxy-servers are moved, or when proxy-server farms need to grow, since all user interfaces need to be reconfigured. It also introduces a flaw in the main objective of such a deployment. Some users, sometimes many, who are well aware of the advantages and disadvantages of having a proxy-server on their path to the content-server, purposely disable the default proxy-server configuration set up on their system by the network administrator. As a consequence, the proxy-server becomes less efficient since it handles only part of the traffic, and the statistical benefit expected from the use of a proxy-server may be highly impaired by the numerous users attempting to bypass it.
However, solutions to this first type of problem have been brought by implementing “transparent proxying” techniques, which do not require that each user be explicitly configured to recognize a proxy. A transparent proxy catches all outbound traffic irrespective of end-user attempts to bypass it. A discussion of this and more can be found in a publication by the International Technical Support Organization of IBM Corporation, P.O. Box 12195, Research Triangle Park, N.C. 27709 U.S.A., under the title ‘Web Caching and Filtering with IBM Websphere Performance Pack’, dated March 1999.
A second type of problem encountered when using proxy-servers occurs on the path from the proxy-server to the content-server, when the proxy-server is not able to provide the requested service, e.g., because it does not have the requested Web page yet. In such a case, a proxy-server normally issues a request to the content-server with its own identification, utilizing its own IP address. However, prior to the installation of a proxy-server by an ISP or an enterprise network manager, some specialized hardware and software may already have been in place, performing filtering and shaping functions between clients and content-servers, most likely in front of the most expensive lines. These functions attempt to regulate the traffic and prevent some users or groups of users from utilizing network resources (i.e., bandwidth) beyond what has been negotiated. Thus, the insertion of proxy-servers between clients and content-servers, which hides the identification of the actual users, does not permit those shaping and filtering functions to operate properly, since their algorithms are essentially based on the real addresses of the users having issued the requests.
This second type of problem is solved in new proxy-servers that are capable of issuing requests to content-servers on behalf of the end users by borrowing (spoofing) their identification, i.e., the proxy-server uses the users' IP addresses instead of its own, hence ensuring that all downstream functions that were previously put in place, such as shaping and filtering, still operate as expected. Unfortunately, doing so introduces a further problem, especially when these new proxy-servers are in fact implemented as clusters of servers fed, on the client side, from load balancing functions. This way of organizing servers has become very popular because of its advantages in terms of maintainability, availability and scalability. Much more on load balancing over a cluster of servers can be found, e.g., in a ‘Redbook’ by IBM published by the Austin, Texas center of the International Technical Support Organization (ITSO), entitled “Load-Balancing Internet Servers”, under the reference SG24-4993, dated December 1997.
Therefore, in this case, i.e., when the proxy-server is actually a cluster of servers (and in other similar situations where the proxy is not a single entity through which responses to all inbound traffic must return), spoofing the end-user address in requests destined to remote content-servers cannot guarantee that responses will return to the particular proxy server, within the cluster of proxy servers, that originated the request. This is because it is the end-user client address that has been used in lieu of the server address (for the reasons mentioned hereinabove).
This unpredictable return path to the originating individual proxy server in effect foils the use of spoofing when the proxy-server is a cluster of servers. It would be highly desirable for both techniques (i.e., spoofing of the user address and proxy-servers implemented as a cluster of servers) to be usable simultaneously, so as to implement a very effective solution for proxy-servers, a key component of all ISP and enterprise networks.
FIG. 1 illustrates prior art and discusses the problem solved by the invention when a proxy-server [100] to an ISP or enterprise network [110] is made of a cluster of individual servers, comprising three servers [101], [102] and [103] in this particular example. Individual servers within the cluster are fed from a load balancer [120] aimed at dispatching the workload resulting from requests issued by the users (e.g., [111]) connected to the ISP/enterprise network [110]. Because this way of organizing a server is very effective and very flexible, it is widely used. Two of the numerous advantages offered by a cluster of servers fed through a load balancer are key to explaining its popularity. On one hand, server performance can be upgraded by adding extra individual servers at any time to cope with increasing traffic. On the other hand, redundancy is provided by the multiplicity of servers and by the load balancing function, which is always free to dispatch workload only over those individual servers that are up and running at a given instant, so as to allow continuous availability to the end users. However, when the server is a proxy-server like [100], installed by the administrator of an ISP/enterprise network such as [110] in order to improve the response time to frequently accessed remote resources while drastically controlling operational expenses (of which a significant contributor is the cost of the communications lines used to access these remote resources in a remote content-server [130] over the Internet [140]), an individual server [103] (which is given the task of handling a request from a user [111] by the load balancer [120]) may have to access [150] a remote content-server [130] to be able to carry out the user's request. A simple example of this is when a Web page is requested for the first time: it needs to be fetched once from the source remote content-server to become available in the proxy.
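The fetch-once behavior described above can be pictured with a minimal sketch. The class name `CachingProxy`, the `fake_origin` function and the example URL are hypothetical and serve only as an illustration; no cache expiry or validation is modeled.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy caching proxy: serves from a local cache, fetches once on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        # Hypothetical stand-in for the access over the expensive link.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}
        # Counts how many times the remote content-server was contacted.
        self.origin_fetches = 0

    def get(self, url: str) -> str:
        if url not in self._cache:
            # First request for this page: go to the remote content-server.
            self.origin_fetches += 1
            self._cache[url] = self._fetch(url)
        # Subsequent requests are served locally.
        return self._cache[url]


def fake_origin(url: str) -> str:
    """Illustrative remote content-server returning a page body."""
    return f"<html>content of {url}</html>"


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/index.html")
second = proxy.get("http://example.com/index.html")
```

In this sketch the second request is answered from the local cache, so the remote content-server is contacted only once for the page.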
This does not pose any problem as long as the individual server, i.e., [103] in this example, uses its own IP address {IP_server_103} to forward the request to the content-server, which will respond to it directly with the requested information through the Internet. However, there is often the requirement that the proxy-server be transparent, i.e., act on behalf of the user [111] as if it were the user. In that case the proxy is configured to forward the requests through the Internet using the user IP address {IP_user_111} in lieu of its own address {IP_server_103}. This technique, in which a device mimics the IP address of another party, is referred to as ‘spoofing’ in the jargon of the Internet and IP networking. Spoofing, which is often highly desirable, helps to preserve transparency, to improve the behavior of the network or just to keep it functioning at its optimal level of performance. As already mentioned, on their way to the content-server through the Internet, requests issued from a user or a group of users may have to be filtered and shaped based on their origin, i.e., their IP address. If origin addresses are masked by a proxy-server, these functions no longer work properly. However, spoofing does not fit at all with proxy-servers organized as a cluster of servers, since the obvious consequence of spoofing in this case is that the content-server [130] no longer knows to which individual server the response to a request must be returned [160]. Although proxy [100], as the compulsory gateway of network [110], is on the path towards the end user [111] to whom the response is eventually due, the response may end up in another individual server (i.e., [101] or [102] in this example) which, not being the origin of the request, will discard it. Thus, although the two techniques, i.e., a cluster of servers for scalability and availability and spoofing for transparency, should be used in combination, they are incompatible.
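The failure described above can be reproduced in a small simulation. This is only a sketch under stated assumptions: the `Packet` and `IndividualServer` types, the IP addresses (taken from documentation ranges) and the choice of server [101] as the unlucky recipient are all hypothetical and are not taken from FIG. 1.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str


@dataclass
class IndividualServer:
    name: str
    # (user_ip, content_ip) pairs for which this server awaits a response.
    pending: Set[Tuple[str, str]] = field(default_factory=set)
    accepted: List[Packet] = field(default_factory=list)
    discarded: List[Packet] = field(default_factory=list)

    def forward_spoofed(self, user_ip: str, content_ip: str, payload: str) -> Packet:
        """Forward a request using the user's address as source (spoofing)."""
        self.pending.add((user_ip, content_ip))
        return Packet(src=user_ip, dst=content_ip, payload=payload)

    def receive(self, response: Packet) -> None:
        """Accept a response only if this server originated the request."""
        key = (response.dst, response.src)
        if key in self.pending:
            self.pending.discard(key)
            self.accepted.append(response)
        else:
            self.discarded.append(response)


def content_server_reply(request: Packet) -> Packet:
    """The content-server answers to the source address it sees."""
    return Packet(src=request.dst, dst=request.src, payload="requested page")


USER_111 = "192.0.2.111"       # illustrative address for user [111]
CONTENT_130 = "203.0.113.130"  # illustrative address for content-server [130]

cluster = [IndividualServer("101"), IndividualServer("102"), IndividualServer("103")]

# Server [103] forwards the request with the user's address as source.
request = cluster[2].forward_spoofed(USER_111, CONTENT_130, "GET /page")
response = content_server_reply(request)

# Assumed outcome: nothing in the response identifies server [103],
# so the return path happens to deliver it to server [101].
cluster[0].receive(response)
```

After this run, server [101] has discarded the response while server [103] still has the request pending, which is the incompatibility between spoofing and clustering noted above.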
A method and a system for granting invisibility to the compulsory gateway of an IP network comprising a proxy-server aimed at serving users on the IP network are disclosed. It is assumed that the proxy-server includes a plurality of individual servers and an inverse load balancer. Then, when individual servers have to access resources available on remote content-servers, i.e., whenever the users cannot be served straight from the proxy-server, the following is performed:
Firstly, upon issuing requests towards the remote content-servers from the individual servers through the inverse load balancer, in order to access the resources therein transparently on behalf of the users, a cross-reference of the individual servers versus the users is established in the inverse load balancer.
Secondly, upon obtaining responses for the users from the remote content-servers in the inverse load balancer, the references of the individual servers corresponding to the users are retrieved. Hence, the responses from the remote content-servers are steered to the referenced individual servers, which can serve the users on the IP network transparently, on behalf of said content-servers, thus ensuring both-way transparency, that is to say “invisibility”, of the proxy-server.
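The two steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class `InverseLoadBalancer`, its method names, the server label and the IP addresses are hypothetical, and keying the cross-reference on the user IP address alone is an assumption made for brevity.

```python
from typing import Dict, NamedTuple, Optional


class Packet(NamedTuple):
    src: str
    dst: str
    payload: str


class InverseLoadBalancer:
    """Toy inverse load balancer holding a user-to-server cross-reference."""

    def __init__(self) -> None:
        # Cross-reference: user IP address -> originating individual server.
        # Assumption: one key per user address is enough for this sketch.
        self._xref: Dict[str, str] = {}

    def outbound(self, server_id: str, user_ip: str,
                 content_ip: str, payload: str) -> Packet:
        """Step 1: record which server acts for the user, then forward
        the request with the user's address as source."""
        self._xref[user_ip] = server_id
        return Packet(src=user_ip, dst=content_ip, payload=payload)

    def inbound(self, response: Packet) -> Optional[str]:
        """Step 2: retrieve the server referenced for the destination user.
        Returns None when no cross-reference exists."""
        return self._xref.get(response.dst)


ilb = InverseLoadBalancer()

# An individual server issues a request on behalf of a user.
request = ilb.outbound("server_103", "192.0.2.111", "203.0.113.130", "GET /page")

# The content-server replies to the user address it saw as source.
response = Packet(src=request.dst, dst=request.src, payload="requested page")

# The response is steered back to the server that originated the request.
target = ilb.inbound(response)
unknown = ilb.inbound(Packet(src="203.0.113.130", dst="192.0.2.200", payload="x"))
```

Here `target` identifies the originating server, while a response addressed to a user with no recorded cross-reference yields `None`, leaving the handling of such traffic to the caller.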
The advantage of the method and system of the invention for the network administrator is twofold. The user applications need not be customized in any way while using a compulsory clustered proxy-server, a popular solution which provides for maintainability, availability and scalability. Simultaneously, it provides transparency towards remote content-servers, which can continue receiving requests on behalf of the users as if no proxy-server were in between. This guarantees that all software and hardware put in place on the path to the content-servers continue to operate in the same way, at the same level of performance.
Thus, it is a broad object of the invention to grant invisibility to a proxy-server organized as a cluster of individual servers, so that it is not only transparent to the outbound traffic originated by the users but also gains transparency in the other direction, i.e., for the inbound traffic, vis-à-vis the remote content-servers that are solicited whenever a user request cannot be honored from the proxy-server itself. Hence, being transparent from both ends, it becomes invisible in the network. Further advantages of the present invention will become apparent to those skilled in the art upon examination of the drawings and detailed description. It is intended that any additional advantages be incorporated herein.