1. Technical Field
The present invention relates to computer networks, and more particularly to a method and system, in an Internet Protocol (IP) network, of enforcing the dispatching of Internet Protocol (IP) datagrams on a plurality of servers according to a defined policy.
2. Description of the Related Art
Internet
The Internet is a global network of computers and computer networks (the 'Net'). The Internet connects computers that use a variety of different operating systems or languages, including UNIX, DOS, Windows, Macintosh, and others. To facilitate and allow the communication among these various systems and languages, the Internet uses a language referred to as TCP/IP ('Transmission Control Protocol/Internet Protocol'). The TCP/IP protocol supports three basic applications on the Internet:
transmitting and receiving electronic mail,
logging into remote computers ('Telnet'), and
transferring files and programs from one computer to another ('FTP' or 'File Transfer Protocol').
TCP/IP
The TCP/IP protocol suite is named for two of the most important protocols, a Transmission Control Protocol (TCP), and an Internet Protocol (IP).
Another name for it is the Internet Protocol Suite. The more common term TCP/IP is used to refer to the entire protocol suite. The first design goal of TCP/IP is to build an interconnection of networks that provide universal communication services: an internetwork, or Internet. Each physical network has its own technology dependent communication interface, in the form of a programming interface that provides basic communication functions running between the physical network and the user applications. The architecture of the physical networks is hidden from the user. The second goal of TCP/IP is to interconnect different physical networks to form what appears to the user to be one large network.
TCP is a transport layer protocol providing end to end data transfer. It is responsible for providing a reliable exchange of information between two computer systems. Multiple applications can be supported simultaneously over one TCP connection between two computer systems.
IP is an internetwork layer protocol hiding the physical network architecture below it. Part of communicating messages between computers is a routing function that ensures that messages will be correctly directed within the network to be delivered to their destination. IP provides this routing function. An IP message is called an IP Datagram.
Application-level protocols are used on top of TCP/IP to transfer user and application data from one origin computer system to one destination computer system. Such application level protocols are, for instance, File Transfer Protocol (FTP), Telnet, Gopher, and Hyper Text Transfer Protocol (HTTP).
IP Router
A 'Router' is a computer that interconnects two networks and forwards messages from one network to the other. Routers are able to select the best transmission path between networks. The basic routing function is implemented in the IP layer of the TCP/IP protocol stack, so any host (or computer) or workstation running TCP/IP over more than one interface could, in theory, forward messages between networks. Because IP implements the basic routing functions, the term 'IP Router' is often used. However, dedicated network hardware devices called 'Routers' can provide more sophisticated routing functions than the minimum functions implemented in IP.
World Wide Web
With the increasing size and complexity of the Internet, tools have been developed to help find information on the network, often called navigators or navigation systems. Navigation systems that have been developed include standards such as Archie, Gopher and WAIS. The World Wide Web ('WWW' or 'the Web') is a recent and superior navigation system. The Web is
an Internet-based navigation system,
an information distribution and management system for the Internet, and
a dynamic format for communicating on the Web.
The Web integrates multiple information formats seamlessly for the user, including still images, text, audio and video. A user on the Web using a graphical user interface ('GUI', pronounced 'gooey') may transparently communicate with different host computers on the system, different system applications (including FTP and Telnet), and different information formats for files and documents including, for example, text, sound, and graphics.
Hypermedia
The Web uses hypertext and hypermedia. Hypertext is a subset of hypermedia and refers to computer-based 'documents' in which readers move from one place to another in a document, or to another document, in a non-linear manner. To do this, the Web uses a client-server architecture. The Web servers enable the user to access hypertext and hypermedia information through the Web and the user's computer. (The user's computer is referred to as a client computer of the Web Server computers.) The clients send requests to the Web servers, which react, search, and respond. The Web allows client application software to request and receive hypermedia documents (including formatted text, audio, video, and graphics) with hypertext link capabilities to other hypermedia documents, from a Web file server.
The Web, then, can be viewed as a collection of document files residing on Web host computers that are interconnected by hyperlinks using networking protocols, forming a virtual 'Web' that spans the Internet.
Uniform Resource Locators
A resource of the Internet is unambiguously identified by a Uniform Resource Locator (URL), which is a pointer to a particular resource at a particular location. A URL specifies the protocol used to access a server (e.g. HTTP, FTP, . . . ), the name of the server, and the location of a file on that server.
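By way of illustration only, the three parts of a URL named above can be separated with the Python standard library. The URL below is hypothetical and is built from the example host name used elsewhere in this description; the variable names are assumptions of this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.entreprise.com/docs/index.html"

parts = urlsplit(url)
protocol = parts.scheme    # protocol used to access the server
server = parts.hostname    # name of the server
location = parts.path      # location of the file on that server

print(protocol, server, location)
```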
Hyper Text Transfer Protocol
Each Web page that appears on client monitors of the Web may appear as a complex document that integrates, for example, text, images, sounds, and animation. Each such page may also contain hyperlinks to other Web documents so that a user at a client computer using a mouse may click on icons and may activate hyperlink jumps to a new page (which is a graphical representation of another document file) on the same or a different Web server.
A Web server is a software program on a Web host computer that answers requests from Web clients, typically over the Internet. All Web servers use a language or protocol to communicate with Web clients which is called Hyper Text Transfer Protocol ('HTTP'). All types of data can be exchanged among Web servers and clients using this protocol, including Hyper Text Markup Language ('HTML'), graphics, sound, and video. HTML describes the layout, contents, and hyperlinks of the documents and pages. Web clients when browsing:
convert user specified commands into HTTP GET requests,
connect to the appropriate Web server to get information, and
wait for a response. The response from the server can be the requested document or an error message.
After the document or an error message is returned, the connection between the Web client and the Web server is closed.
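The request and response exchange described above may be sketched as follows, under stated assumptions. The sketch builds the bytes of a minimal HTTP/1.0 GET request and reads the status line of a response; it does not open a network connection. The function names `build_get_request` and `parse_status_line` are hypothetical, as are the host name and the sample response.

```python
def build_get_request(host, path):
    """Return the bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),  # optional in HTTP/1.0, commonly sent anyway
        "",                       # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Split the first line of a response into (version, code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("www.entreprise.com", "/index.html")

# Hypothetical server reply: an error message instead of the document.
status = parse_status_line(b"HTTP/1.0 404 Not Found\r\n\r\n")
```

The status code tells the Web client whether the body holds the requested document or an error message.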
The first version of HTTP is a stateless protocol. That is, with HTTP there is no continuous connection between each client and each server. The Web client using HTTP receives a response as HTML data or other data. This description applies to version 1.0 of HTTP protocol, while the new version 1.1 breaks this barrier of stateless protocol by keeping the connection between the server and client alive under certain conditions.
Browser
After receipt, the Web client formats and presents the data or activates an ancillary application such as a sound player to present the data. To do this, the server or the client determines the various types of data received. The Web Client is also referred to as the Web Browser, since it in fact browses documents retrieved from the Web Server.
Domain Names
Host or computer names (like www.entreprise.com) are translated into numeric Internet addresses (like 194.56.78.3), and vice versa, by using a method called DNS ('Domain Name System'). DNS is supported by network-resident servers, also known as domain name servers or DNS servers.
Intranet
Some companies use the same mechanism as the Web to communicate inside their own corporation. In this case, this mechanism is called an 'Intranet'. These companies use the same networking/transport protocols and locally based Web servers to provide access to vast amounts of corporate information in a cohesive fashion. As this data may be private to the corporation, and because the members of the company still need to have access to public Web information, they protect the access to their network by using special equipment called a Firewall. A Firewall is used to prevent people not belonging to the company from accessing the private Intranet from the public Internet.
Firewall
A Firewall protects one or more computers with Internet connections from access by external computers connected to the Internet. A Firewall is a network configuration, usually created by hardware and software, that forms a boundary between networked computers within the Firewall and those outside the Firewall. The computers within the Firewall form a secure sub-network with internal access capabilities and shared resources not available from outside computers.
Often, access to both internal and external computers is controlled by a single machine, said machine comprising the Firewall. Since the computer, on which the Firewall resides, directly interacts with the Internet, strict security measures against unwanted access from external computers are required.
A Firewall is commonly used to protect information such as electronic mail and data files within a physical building or organization site. A Firewall reduces the risk of intrusion by unauthorized people from the Internet. The same security measures can limit or require special software for people inside the Firewall who wish to access information on the outside. A Firewall can be configured using 'Proxies' or 'Socks' to control the access to information from each side of the Firewall.
Proxy Server
An HTTP Proxy is a special server that allows access to the Internet. It typically runs in conjunction with Firewall software. The Proxy Server:
waits for a request (for example an HTTP request) from inside the Firewall,
forwards the request to the remote server outside the Firewall,
reads the response, and
sends the response back to the client.
A single computer can run multiple servers, each server connection identified with a port number. A Proxy Server, like an HTTP Server or an FTP Server, occupies a port. Typically, a connection uses standardized port numbers for each protocol (for example, HTTP=80 and FTP=21). That is why an end user has to select a specific port number for each defined Proxy Server. Web Browsers usually let the end user set the host name and port number of the Proxy Servers in a customizable panel. Protocols such as HTTP, FTP, Gopher, WAIS, and Security can have and usually have designated Proxies. Proxies are generally preferred over Socks for their ability to perform caching, high-level logging, and access control, because they provide a specific connection for each network service protocol.
HTTP Caching
HTTP is an application level protocol used by the TCP connections between Web Browsers and HTTP Proxy Servers. Consequently, IP Datagrams exchanged between the Web Browsers and HTTP Proxy Servers comprise HTTP data. Since HTTP Proxy Servers terminate and manage the HTTP connections, they see and handle the HTTP data comprised in the IP Datagrams and they can store a local copy of HTTP data in an internal cache.
When an HTTP Proxy Server receives from a source system (a Web Browser) a request to retrieve HTTP data (a Web page) located on a destination system (a Web server), two situations can occur depending on whether the requested HTTP data is already stored in the local cache, or not. If the requested HTTP data is already located in the local cache, the HTTP Proxy Server immediately sends a response to the source system with the data stored in the cache. If the requested HTTP data is not located in the local cache, the HTTP Proxy Server forwards the request to the destination Web system (the Web server). When the HTTP Proxy Server receives from this destination Web system (the Web Server) the response comprising the HTTP data (the Web page), it caches said HTTP data (the Web page) in its local cache, and forwards the response to the source system (the Web Browser).
When HTTP data are already located within the cache, the request does not need to be forwarded by the HTTP Proxy Server to the destination Web system. A response is immediately returned by the HTTP Proxy server.
The HTTP Caching provides several advantages:
The response time of the HTTP service is improved. The HTTP Proxy Server immediately answers the request to retrieve HTTP data when said HTTP data is already stored in the cache of the HTTP Proxy Server.
The utilization of network resources is optimized. No traffic is required between the HTTP Proxy server and the destination Web system for requested HTTP data already stored in the cache.
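The cache lookup described in this section may be sketched as follows, purely as an illustration. The class name `CachingProxy`, the `origin_requests` counter, and the stand-in `fake_origin` function are assumptions of this sketch; a real HTTP Proxy Server would also honor expiry and validation rules, which are omitted here.

```python
from typing import Callable, Dict


class CachingProxy:
    """Answers from a local cache when possible, else forwards the request."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # forwards to the destination Web system
        self._cache: Dict[str, bytes] = {}   # local copies of HTTP data, keyed by URL
        self.origin_requests = 0             # requests actually forwarded

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Already cached: respond immediately, no traffic to the Web server.
            return self._cache[url]
        # Not cached: forward the request, store the response, then return it.
        self.origin_requests += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for the destination Web server (hypothetical)."""
    return "<html>{}</html>".format(url).encode("ascii")


proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.entreprise.com/index.html")
second = proxy.get("http://www.entreprise.com/index.html")
```

In this sketch the second request is served from the cache, so only one request reaches the destination Web system.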
Socks and Socks Server
Socks is a protocol which does some form of encapsulation of application level protocols (for instance FTP, Telnet, Gopher, HTTP). Using Socks, the application level traffic between a system running a Socks Client software and a system running a Socks Server software is encapsulated in a virtual Socks tunnel between both systems. Socks is mainly used by systems within an Intranet in order to gain a secure access to systems located outside the Intranet.
A Socks Server acts as a relay between the systems within the Intranet and the systems outside the Intranet, thus hiding the internal systems from the external Internet. It is considered as one form of Firewall. A Socks Server (also called Socks Gateway) is software that allows computers inside a Firewall to gain access to the Internet. A Socks Server is usually installed on a server positioned either inside or on the Firewall. Computers within the Firewall access the Socks Server as Socks Clients to reach the Internet. Web Browsers usually let the end user set the host name and port number of the Socks Servers in a customizable panel. On some Operating Systems, the Socks Server is specified in a separate file (e.g. socks.conf file). As the Socks Server acts as a layer underneath the protocols (HTTP, FTP, . . . ), it cannot cache data (as a Proxy does), because it doesn't decode the protocol to know what kind of data it transfers.
Options
The Web Browser often lets the end user select between the different options 'No Proxies', 'Manual Proxy Configuration', or 'Automatic Proxy Configuration' to designate the connection between his computer and the Internet.
Users with a direct connection to the Internet should use the default option, which is 'No Proxies'.
If the Intranet is protected by one or several Firewalls, the end user may:
select one of these Firewalls as the elected Proxy, by entering its host name into the 'Manual Proxy Configuration', or
automatically refer to the enterprise policy in terms of Proxies attribution between locations, by pointing to a common configuration file in a remote server. This is done by choosing the 'Automatic Proxy Configuration' and by providing the Web Browser with the unique address of the common configuration file ('Uniform Resource Locator' or 'URL') located in the remote server.
Today, most of the Web Browsers are configured to forward all requests, even requests for internal hosts, through the Socks Firewall. So when an end user wants to access an internal Web-based application, his request travels to the Firewall, and is then reflected back into the internal network. This mechanism generates internal traffic over a long path, puts extra load on the Firewall and on the network, and worst of all, slows down the response time the end user sees from the applications and Web pages he tries to access. This is called 'non-flexible' Socks access (when everything goes via the Socks Server).
Manual Proxy Configuration
The Manual Proxy configuration in the Web Browser is simple to process. However, the main drawback is that the Firewall (or Proxy) selection is static. There is no dynamic criterion for selecting the Firewall, such as selecting the Firewall according to the response time. Firewall failures require a manual reconfiguration of the navigation software to point to another active Firewall, since the manual configuration usually only allows the definition of one single Firewall per protocol with no possibility to pre-configure a backup Firewall. In addition to the manual proxy configuration in the Web Browser, external procedures can be used to provide some kind of robustness in the Firewall selection. They rely, for instance, on the use of multiple Firewalls having the same name defined as aliases in the Domain Name Server (DNS). But this technique based on alias definition still has drawbacks since, for instance, the DNS is not always contacted for name resolution (association between name and IP address) by Web Clients when said Web Clients locally cache the name resolution. Other techniques using external hardware equipment such as load and request dispatcher provide more robustness and load balancing, but still have drawbacks such as the need for additional and costly hardware.
Automatic Proxy Configuration
Automatic Proxy Configuration (also referred to as 'autoproxy') can set the location of the HTTP, FTP, and Gopher Proxy every time the Web Browser is started. An autoproxy retrieves a file of address ranges and instructs the Web Browser to either directly access internal IBM hosts or to go to the Socks Server to access hosts on the Internet.
Automatic Proxy Configuration is more desirable than simple Proxy Server Configuration in the Web Browser, because much more sophisticated rules can be implemented about the way Web pages are retrieved (directly or indirectly). Automatic Proxy Configuration is useful to users, because the Web Browser knows how to retrieve pages directly if the Proxy Server fails. Also Proxy requests can be directed to another or multiple Proxy Servers at the discretion of the system administrator, without the end user having to make any additional changes to his Web Browser configuration. These Proxy configuration files (also called 'autoproxy code') are usually written in the JavaScript language. Autoproxy facility can also contain a file of address ranges for instructing the Web Browser to either directly access internal hosts or to go to the Socks Server to access hosts on the Internet. The Socks Server protects the internal network from unwanted public access while permitting access of network members to the Internet. One of the drawbacks of this 'autoproxy' mechanism is that there is no proactive Firewall failure detection nor response time consideration.
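The address-range decision made by an autoproxy can be illustrated with the following minimal sketch. Actual autoproxy code is usually written in JavaScript, as noted above; Python is used here only to show the logic. The internal address ranges, the Socks Server name and port, and the function name `find_route` are all hypothetical.

```python
import ipaddress

# Hypothetical internal address ranges (assumption of this sketch).
INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical Socks Server host name and port (assumption of this sketch).
SOCKS_SERVER = "socks.example.com:1080"


def find_route(host_address: str) -> str:
    """Return 'DIRECT' for internal hosts, else the Socks Server to use."""
    address = ipaddress.ip_address(host_address)
    if any(address in network for network in INTERNAL_RANGES):
        return "DIRECT"
    return "SOCKS " + SOCKS_SERVER


internal_route = find_route("10.1.2.3")
external_route = find_route("194.56.78.3")
```

With such a rule, requests for internal hosts are not sent to the Socks Server and reflected back into the internal network.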
More explanations about the technical field presented in the above sections can be found in the following publications, incorporated herewith by reference:
TCP/IP Tutorial and Technical Overview by Martin W. Murhammer, Orcun Atakan, Stefan Bretz, Larry R. Pugh, Kazunari Suzuki, David H. Wood, International Technical Support Organization, October 1998, GG24-3376-05.
Java Network Programming by Elliotte Rusty Harold, published by O'Reilly, February 1997.
Internet in a Nutshell by Valerie Quercia, published by O'Reilly, October 1997.
Building Internet Firewalls by Brent Chapman and Elizabeth Zwicky, published by O'Reilly, September 1995.
Problem
The problem to solve is to police the Web traffic within the Intranet. When multiple Proxy Servers are used by source devices within the Intranet (for instance workstations running Web Browser software) to get access to Web systems located within the Internet, access rules are usually defined by the Network Administrator. The purpose of said access rules is to define the Proxy Server that should be used by each source device (workstation) or each group of source devices (group of workstations) within the Intranet, to get access to Web systems located within the Intranet. For instance, source devices located in France should use a Proxy Server located in France, while source devices located in Germany should use a Proxy Server located in Germany.
Said access rules may be different according to the application level protocol (ALP). ALP traffic refers to IP Datagrams comprising data using said ALP (for instance, HTTP traffic refers to all IP Datagrams comprising HTTP data). For instance, the access rules may stipulate that source devices located in Belgium should access one specific Proxy Server located in France for HTTP traffic, and should access another specific Proxy Server located in Belgium for FTP traffic. These access rules define a policy for accessing the Web from the Intranet, and are therefore called 'Web access policy' or 'Web traffic policy'. The main goals of said Web traffic policy are to:
Optimize the network resources within the Intranet. For instance, the specifications and therefore the cost of a Proxy Server depend on the number of source devices which will have access to it. A Proxy Server which will be accessed by 500 source devices will be smaller and therefore cheaper than a Proxy Server which will be accessed by 10000 source devices.
Improve the performance of the Web access service (access from source devices to Web systems located within the Intranet). For instance, a Proxy Server set-up in France is configured to provide a Web access service to a specified number of source devices in France. When more source devices (for instance source devices located in Belgium) than expected are accessing said Proxy Server, the performance of said Proxy server may be degraded and may have an impact on the Web access service.
Optimize the utilisation of network resources, in particular, minimize the bandwidth required within the Intranet for accessing Web systems. For instance, when a source device located in France wants to access a Web system through a Proxy Server, said source device should use a Proxy Server located in France instead of a Proxy server located in Japan, in order to minimize the path (and consequently to minimize the network resources utilization and the bandwidth between France and Japan) within the Intranet to reach the Proxy server.
Take advantage of Web traffic caching, since Proxy Servers usually provide HTTP and FTP caching:
The utilisation of the network resources located between the Proxy Server and the Web system is optimized. No traffic is required between the Proxy Server and the destination Web system, when HTTP data requested by a source device is already located within the cache of said Proxy Server.
The response time of the HTTP service is improved. The requests to retrieve HTTP data already located within the cache on the HTTP Proxy Server, are immediately satisfied by the HTTP Proxy Server.
It is generally admitted that an efficient Web Caching must be done as close as possible to source devices. Thus, it is important for said source devices to have access to a Proxy server located close to them.
The problem is to apply the Web access policy within the whole Intranet. For instance, when the Web access policy defines that source devices located in France should use one specific Proxy Server located in France, the problem is to make sure that said source devices actually use said specific Proxy Server and do not use instead another Proxy Server (for instance located in Japan).
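A Web traffic policy of the kind described above may be represented, for illustration only, as a table keyed by the location of the source device and the application level protocol. The Proxy Server names and the function name `proxy_for` are hypothetical; the rules mirror the France, Germany, and Belgium examples given in this section.

```python
from typing import Dict, Optional, Tuple

# (location of source device, application level protocol) -> Proxy Server.
# All Proxy Server names are hypothetical.
WEB_TRAFFIC_POLICY: Dict[Tuple[str, str], str] = {
    ("France", "HTTP"): "proxy-fr.example.com",
    ("France", "FTP"): "proxy-fr.example.com",
    ("Germany", "HTTP"): "proxy-de.example.com",
    ("Germany", "FTP"): "proxy-de.example.com",
    ("Belgium", "HTTP"): "proxy-fr.example.com",  # HTTP traffic goes to France
    ("Belgium", "FTP"): "proxy-be.example.com",   # FTP traffic stays in Belgium
}


def proxy_for(location: str, alp: str) -> Optional[str]:
    """Return the Proxy Server the policy assigns, or None if no rule exists."""
    return WEB_TRAFFIC_POLICY.get((location, alp.upper()))


belgium_http = proxy_for("Belgium", "http")
belgium_ftp = proxy_for("Belgium", "ftp")
```

Defining such a table is the easy part; the problem stated above is making sure that each source device actually follows it.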
The current solutions address this problem partially:
The Web Application software (for instance a Web Browser) running on the workstation can be manually configured with the target Proxy Servers. The main drawback of this solution is the following:
Proxy Server names must be known and manually configured by end users. Wrong Proxy Server names may then be entered by end users, and the Web traffic policy is then not applied. For instance, an end user located in Toulouse (France) may manually configure his Web Browser to use a Proxy Server located in Paris instead of a Proxy Server located in Toulouse.
Web Browsers can be configured with their autoproxy feature. In this case, a static list of target Proxy Servers (a Web traffic policy) is downloaded to the Web Browser from a dedicated autoproxy URL (Uniform Resource Locator) system. The main drawbacks of this solution are the following:
The end user must configure his Web Browser to use the autoproxy feature. If the end user does not configure his Web Browser correctly, the Web traffic policy is then not applied.
The autoproxy feature has to be implemented within the Intranet. For instance, an autoproxy code must be implemented on the autoproxy URL system.
An object of the present invention is to enforce the dispatching of Internet Protocol (IP) datagrams on a plurality of servers according to a defined policy.
It is another object of the present invention to optimize the performance of the Web access service, by forcing the source devices to access the Internet through specific Proxy Servers according to a particular Web traffic policy.
It is yet another object of the present invention to optimize the utilisation of the Intranet network resources, by reducing the Web traffic within the Intranet network.
It is a further object of the present invention to simplify the configuration of the source devices of the Web traffic within the Intranet.
The present invention relates to a method and a system, in a network device, of enforcing the dispatching of Internet Protocol (IP) datagrams on a plurality of servers according to a defined policy, each IP datagram being sent from a source port on a source device to a destination port on a destination device in an Intranet network comprising a plurality of servers and at least one client. The method comprises the steps of:
determining whether the source device of an incoming IP datagram is a client or a server;
If the source device of the IP datagram is a client:
identifying client address, client port, destination address, and destination port of the IP Datagram;
searching for a server address and a server port in a first table, this first table comprising a server address and a server port for each connection identified by a client address, a client port, a destination address and a destination port;
If a server address and a server port are identified in said first table, and if said server address and the destination address are different or if said server port and the destination port are different:
replacing the destination address and the destination port in the IP datagram respectively by the server address and the server port;
sending the IP datagram over the IP network.
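The client-side steps listed above may be sketched as follows, under stated assumptions. The sketch covers only the case where the source device is a client; how a client is told apart from a server is not shown. The `Datagram` type, the table layout, and the function name `dispatch_client_datagram` are hypothetical. When no entry is found in the first table, the steps above do not say what happens, so this sketch simply leaves the datagram unchanged.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Datagram:
    """Simplified view of an IP datagram (hypothetical type for this sketch)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    payload: bytes = b""


# First table: (client addr, client port, dest addr, dest port) -> (server addr, server port)
ConnectionTable = Dict[Tuple[str, int, str, int], Tuple[str, int]]


def dispatch_client_datagram(dgram: Datagram, table: ConnectionTable) -> Datagram:
    """Return the datagram to send, with its destination rewritten if needed."""
    # Identify client address, client port, destination address, destination port.
    key = (dgram.src_addr, dgram.src_port, dgram.dst_addr, dgram.dst_port)
    server = table.get(key)
    if server is None:
        # Assumption: no entry for this connection, forward unchanged.
        return dgram
    server_addr, server_port = server
    if server_addr != dgram.dst_addr or server_port != dgram.dst_port:
        # Replace the destination by the server selected for this connection.
        return replace(dgram, dst_addr=server_addr, dst_port=server_port)
    return dgram


# Hypothetical addresses: a client aimed at one proxy is redirected to another.
table = {("10.0.0.5", 40000, "10.9.9.9", 8080): ("10.1.1.1", 8080)}
incoming = Datagram("10.0.0.5", 40000, "10.9.9.9", 8080, b"GET / HTTP/1.0\r\n\r\n")
outgoing = dispatch_client_datagram(incoming, table)
```

Only the destination fields change; the source fields and the payload are carried over as they were.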