The present invention generally relates to computer programs. The invention relates more specifically to network devices that receive and respond to requests for electronic documents, and relates particularly to a network interface driver that intercepts, receives, examines, processes, and passes network traffic to various other network-attached devices.
1. Client-Server Network System With Proxy Server
FIG. 1 is a simplified block diagram of a client-server network system in which an embodiment may be used. Client 100 is a personal computer, workstation, smart-phone, personal digital assistant, interactive television, or other network agent or node that may have the structure illustrated in FIG. 3, which is described below. Client 100 executes Web applications 102, such as Netscape Communicator© or Microsoft Internet Explorer©. In the system of FIG. 1, there may be any number of clients 100; one client is shown only by way of example.
Client 100 is coupled by a network communication path 104 to an internetwork 106. In the preferred embodiment, the internetwork 106 is the global, packet-switched IP data network, composed of interconnected IP-aware and/or TCP/IP-aware network devices, now commonly known as the Internet. Portions of the internetwork may be owned and operated by different organizations, which cooperate to provide global data connectivity.
Within the substructure of the internetwork 106 may reside an intercepting routing device 110, such as a router or bridge, which supports Internet packet addressing, and through which packets of information ("network traffic") pass on their way through communication path 108 to one or more origin servers 124 within the internetwork. The term "origin server" is used herein to identify a server as an originating point of delivery for one or more electronic documents that may be of interest to client 100.
The intercepting routing device 110 is directly or indirectly attached to proxy server 114 through communication path 107. The routing device may intercept certain classes of IP and TCP/IP traffic intended for one or many origin servers, and retarget that traffic to the proxy processing engine 116. The proxy processing engine 116 may then deliver responses to the client requests, or otherwise participate in the representation or transport of the client-to-origin-server transaction. The proxy processing engine 116 may receive electronic documents directly from origin servers (e.g., origin server 124), or indirectly through a local cache store or dynamic content generation engine.
Proxy server 114 interacts with one or more network interface cards (NICs) 122. Each network interface 122 provides a lowest-level interface of proxy server 114 to network signals arriving from network 106 via routing device 110. For example, each network interface card 122 may be an Ethernet interface card.
Each network interface card 122 is associated with a driver 120, a network protocol stack 118 (for example, TCP/IP), and a proxy processing engine 116. In FIG. 1, these elements are depicted in a logical hierarchy in which network interface card 122 is at the lowest logical level and proxy processing engine 116 is at the highest logical level.
The driver 120 is a software element executed on or in close association with a network interface card 122. The driver 120 is responsible for, among other things, examining each packet of information that arrives from internetwork 106 to determine its source, destination and the type of request or other message that it contains.
Using conventional techniques, driver 120 is typically responsible for receiving intercepted traffic, making it ready for local processing, and dispatching the traffic to a local proxy processing engine 116.
For example, assume that client 100 has an IP address of "100", proxy server 114 has an IP address of "114", and origin server 124 has an IP address of "124". Assume further that the client and the servers work with electronic documents that are requested and delivered using Hypertext Transfer Protocol (HTTP). To enable the proxy server 114 to intercept requests and deliver results from a cache, the intercepting router 110 is pre-configured to intercept requests for electronic documents, and forward these requests to proxy server 114, regardless of the actual location of an original copy of the electronic document.
Now assume that the client 100 requests a particular document. The client's request message contains the following information, encoded according to IP, TCP, and HTTP: "Source=100," "Destination=114," "Destination Port=80." The proxy server 114 knows that the requested electronic document is really located on origin server 124. In past approaches, the driver 120 would translate the destination address from "114" to "124", translate the destination port value from "80" to "8080", and pass the packet logically upward for processing by the proxy processing engine 116. If the proxy processing engine 116 needs to obtain a copy of the electronic document from the origin server 124, the proxy server sends an appropriate request, but that request identifies the IP address of the proxy server 114, not that of the client 100.
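The address and port translation performed by driver 120 in past approaches can be sketched as follows. This is a minimal Python illustration under stated assumptions, not actual driver code: the `Packet` type, the `translate_for_proxy` and `origin_request` functions, and the mapping tables are hypothetical, and the toy addresses "100", "114" and "124" simply mirror the example above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src: str    # source IP address (toy value from the example)
    dst: str    # destination IP address
    dport: int  # destination TCP port

# Hypothetical translation tables mirroring the example in the text.
ADDR_MAP = {"114": "124"}
PORT_MAP = {80: 8080}
PROXY_ADDR = "114"

def translate_for_proxy(pkt: Packet) -> Packet:
    """Rewrite destination address and port before passing the packet upward."""
    return replace(
        pkt,
        dst=ADDR_MAP.get(pkt.dst, pkt.dst),
        dport=PORT_MAP.get(pkt.dport, pkt.dport),
    )

def origin_request(translated: Packet) -> Packet:
    """Build the proxy's own request to the origin server.

    The proxy opens a new connection, so the source is the proxy's
    address rather than the original client's.
    """
    return Packet(src=PROXY_ADDR, dst=translated.dst, dport=80)

incoming = Packet(src="100", dst="114", dport=80)
print(translate_for_proxy(incoming))  # Packet(src='100', dst='124', dport=8080)
print(origin_request(translate_for_proxy(incoming)))  # Packet(src='114', dst='124', dport=80)
```

Because `origin_request` builds a new packet whose source is the proxy's own address, the origin server never sees "100"; this is the root of the IP identity problem discussed among the deficiencies of past approaches.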
The IP stack 118 is responsible for extracting and processing Internet Protocol information embedded in packets of information that arrive from network 106.
In the preferred embodiment, proxy processing engine 116 is an instance of the TrafficServer™ brand proxy server, release 1.1.6 or later, commercially available from Inktomi Corporation of San Mateo, Calif.
2. Deficiencies of Past Approaches
Simple proxy interception approaches have several drawbacks, including problems related to:
transparent handling of out-of-spec traffic
transparent handling of unknown protocols
semantic changes caused by the presence of transparent proxies
overload handling
fault tolerance
malicious attacks
client or server logic errors
client and server preferences not to be intercepted
unnecessary resource consumption for tunneled traffic
Each of the problems identified above shall now be described in greater detail.
With respect to handling "out-of-spec" traffic, it has been discovered that traffic interception schemes may intercept incorrectly implemented traffic flows, or traffic flows that use an older or newer revision of the protocol, which may be acceptable to some origin servers but not to the target of the interception, such as the proxy server. For example, a network device configured to intercept TCP port 80 HTTP traffic and send it to a proxy server may end up sending the proxy unofficial extensions to HTTP, or incorrect HTTP. While those HTTP extensions may be supported by a special version of an origin server, they may not be supported by a general proxy server. The intercepting proxy may then yield results that differ from, or are more erroneous than, those of the origin server because of the out-of-spec traffic.
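One way to recognize out-of-spec traffic before dispatching it is to validate the request line and bypass anything that does not parse. The Python sketch below assumes, purely for illustration, that the proxy accepts only the standard HTTP/1.0 and HTTP/1.1 methods and version strings; the `is_in_spec` and `route` names are hypothetical.

```python
import re

# Assumption: only these methods and HTTP/1.0 or HTTP/1.1 are treated as
# in-spec; anything else is bypassed to the origin server.
REQUEST_LINE = re.compile(
    rb"(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE|CONNECT) (\S+) HTTP/1\.[01]\r\n"
)

def is_in_spec(payload: bytes) -> bool:
    """Return True if the payload starts with a well-formed request line."""
    # re.match anchors at the start of the payload.
    return REQUEST_LINE.match(payload) is not None

def route(payload: bytes) -> str:
    """Dispatch in-spec requests to the proxy; bypass everything else."""
    return "proxy" if is_in_spec(payload) else "bypass"

print(route(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # proxy
print(route(b"GET /index.html HTTP/2.7\r\n"))                           # bypass
```

A strict check like this may occasionally bypass traffic the proxy could have handled, but it avoids handing the proxy requests it is likely to mishandle.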
With respect to transparent handling of unknown protocols, interception network devices commonly use heuristics to identify the composition of traffic flows. For example, TCP port 80 has historically been reserved for, and generally used exclusively by, HTTP network traffic. Relying on this convention, an interception network device intercepts all port 80 traffic and redirects it to a local server, such as a proxy server. However, nothing universally enforces that all traffic carried on port 80 is HTTP. For example, because some networks have firewalls that permit only port 80 data to be exchanged, some users have been induced to carry non-HTTP traffic (such as networked computer game traffic) over port 80 to circumvent filtering policies. As a result, non-HTTP traffic on port 80 arrives at an interception network device and is automatically redirected to the proxy processing engine 116. Because the proxy server is likely expecting HTTP traffic, it responds with an error condition and closes the current connection. From the client's point of view, the end server appears to stop working correctly.
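A simple protocol-identification heuristic can inspect the first bytes of a port 80 flow and bypass the flow when those bytes do not begin with a known HTTP method. The sketch below is an illustration under assumptions (the method list and the `classify` and `dispatch` names are hypothetical); it returns "need_more" while the bytes seen so far could still be the start of a method, since deciding on a partial token would be premature.

```python
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"TRACE ", b"CONNECT ")

def classify(first_bytes: bytes) -> str:
    """Classify the start of a port 80 flow as 'http', 'unknown', or 'need_more'."""
    for method in HTTP_METHODS:
        if first_bytes.startswith(method):
            return "http"
    # The bytes seen so far are still a prefix of some method token.
    if any(method.startswith(first_bytes) for method in HTTP_METHODS):
        return "need_more"
    return "unknown"

def dispatch(first_bytes: bytes) -> str:
    """Send recognized HTTP to the proxy, bypass unknown protocols, else wait."""
    verdict = classify(first_bytes)
    if verdict == "http":
        return "proxy"
    if verdict == "unknown":
        return "bypass"
    return "wait"

print(dispatch(b"POST /form HTTP/1.1\r\n"))  # proxy
print(dispatch(b"\x00\x01game-data"))        # bypass
print(dispatch(b"PO"))                       # wait
```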
Another problem involves the obscuring of client IP identity by proxies. As a result of traffic interception and the presence of the proxy server 114, when the client's forwarded request reaches the origin server 124, the origin server typically sees the IP address of the proxy server, not the IP address of the client 100. If the origin server uses the client IP address for access control to the documents, the request may be refused, or the response content may be tailored to the wrong IP address, because the proxy is obscuring the true address of the client. Furthermore, the proxy cannot in general masquerade as the client's IP address, because the return path of IP traffic must be directed back through the proxy.
Still another problem is caused by the redirection target, such as proxy processing engine 116, being unexpectedly overloaded. Because intercepting proxies are central intermediaries, it is important that they not degrade the quality of service.
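Overload can be detected by comparing the redirection target's load against thresholds. The sketch below uses a hypothetical `OverloadGuard` with two watermarks (hysteresis) so that the decision does not flap when load hovers near a single limit; measuring load as an active connection count is an assumption, and other load metrics could be substituted.

```python
class OverloadGuard:
    """Bypass new connections while the proxy is overloaded.

    Hypothetical thresholds: start bypassing when active connections reach
    `high`; resume dispatching once they fall to `low` or below.
    """

    def __init__(self, high: int, low: int) -> None:
        if low >= high:
            raise ValueError("low watermark must be below high watermark")
        self.high = high
        self.low = low
        self.bypassing = False

    def decide(self, active_connections: int) -> str:
        if self.bypassing and active_connections <= self.low:
            self.bypassing = False
        elif not self.bypassing and active_connections >= self.high:
            self.bypassing = True
        return "bypass" if self.bypassing else "proxy"

guard = OverloadGuard(high=100, low=80)
print([guard.decide(n) for n in (50, 100, 90, 80, 90)])
# ['proxy', 'bypass', 'bypass', 'proxy', 'proxy']
```

Note that a load of 90 yields different decisions depending on whether the guard is currently bypassing, which is the intended effect of the gap between the two watermarks.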
With respect to client or server logic errors, some clients and servers contain logic errors that prevent them from interoperating well with intercepting proxies.
Additionally, some clients and servers may explicitly prefer that their traffic not be processed by intercepting devices.
Additionally, some types of intercepted transactions may not be relevant for processing by intercepting applications. While the redirection to an intercepting server may yield a correct result, additional resources may be expended without providing value.
A system, method and mechanism are provided that address the problems enumerated above. In particular, a system, mechanism and method are provided for dynamically determining whether to dispatch traffic to a local proxy server, or to bypass the proxy server to send the traffic to a remote server or to the original target origin server. Various embodiments are provided that can:
recognize packets that carry malformed or out-of-spec protocol traffic, and bypass them to the origin server without transfer to the proxy processing engine;
recognize packets that are presented in a foreign or unprocessable protocol, and bypass them to the origin server without transferring them to the proxy processing engine;
recognize network traffic that causes semantic changes or errors related to IP identification and proxy-based IP address changes, and bypass this traffic directly to the origin server, preserving the client IP address;
detect overloaded redirection targets, and bypass some or all of the traffic directly toward origin servers, and away from interception target applications, to prevent overload;
detect known problematic clients or servers, bypass traffic directly toward origin servers, and away from interception target applications;
efficiently maintain distributed lists of clients and servers that wish not to be processed by intercepting applications, bypass this traffic directly toward origin servers, and away from interception target applications; and
identify classes of transactions that will not gain value from redirection to intercepting servers, and efficiently bypass this traffic directly toward origin servers, and away from interception target applications.
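The per-client, per-server, and per-transaction-class checks in the list above can be combined into a single bypass test. In this Python sketch the `BypassPolicy` class and its fields are hypothetical; treating HTTP `CONNECT` requests as a class that gains nothing from proxying is an illustrative assumption, motivated by the tunneled-traffic drawback noted earlier, not a rule taken from this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class BypassPolicy:
    # Hypothetical sets, e.g. populated from distributed bypass lists.
    clients: set[str] = field(default_factory=set)
    servers: set[str] = field(default_factory=set)
    # Assumed transaction classes (by HTTP method) that gain no value from proxying.
    no_value_methods: set[str] = field(default_factory=lambda: {"CONNECT"})

    def decide(self, client: str, server: str, method: str) -> str:
        """Return 'bypass' for listed endpoints or no-value transactions."""
        if client in self.clients or server in self.servers:
            return "bypass"
        if method in self.no_value_methods:
            return "bypass"
        return "proxy"

policy = BypassPolicy(clients={"100"})
print(policy.decide("100", "124", "GET"))      # bypass (client opted out)
print(policy.decide("101", "124", "GET"))      # proxy
print(policy.decide("101", "124", "CONNECT"))  # bypass (tunneled transaction)
```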
Further, because intercepting proxies are central intermediaries, and because redirection target applications can fail, a system, method and mechanism are provided that can detect non-functional redirection targets, and bypass traffic directly toward origin servers, and away from interception target applications.
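A common way to detect a non-functional redirection target is to probe it periodically and mark it down after several consecutive failures. The sketch below assumes that policy; the `TargetHealth` class, the `max_failures` parameter, and the "one success restores service" rule are hypothetical choices, not details from this disclosure.

```python
class TargetHealth:
    """Track probe results for a redirection target.

    Hypothetical policy: `max_failures` consecutive failed probes mark the
    target down; a single successful probe marks it up again.
    """

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record_probe(self, ok: bool) -> None:
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def is_up(self) -> bool:
        return self.consecutive_failures < self.max_failures

    def decide(self) -> str:
        """Dispatch to the proxy while it is healthy; otherwise bypass."""
        return "proxy" if self.is_up else "bypass"

health = TargetHealth(max_failures=3)
for ok in (False, False, False):
    health.record_probe(ok)
print(health.decide())  # bypass
health.record_probe(True)
print(health.decide())  # proxy
```

Requiring several consecutive failures keeps a single lost probe from diverting all traffic away from an otherwise healthy proxy.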
Similarly, because intercepting proxies are central intermediaries, and because malicious clients may be able to construct schemes to interfere with the correct operation of these intermediaries, denying service to all users, a system, method and mechanism are provided that can detect malicious attacks, and bypass traffic directly toward origin servers, and away from interception target applications, to minimize the risk of denial of service attacks.
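As an illustration of one possible attack heuristic, not the specific detection method of this disclosure, the sketch below flags a client that opens more than `limit` connections within a sliding window of `window` seconds and bypasses that client's traffic. The `ConnectionRateMonitor` class and both parameters are hypothetical; timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque

class ConnectionRateMonitor:
    """Flag clients that open too many connections within a time window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.history: defaultdict[str, deque] = defaultdict(deque)

    def decide(self, client: str, now: float) -> str:
        times = self.history[client]
        times.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        return "bypass" if len(times) > self.limit else "proxy"

monitor = ConnectionRateMonitor(limit=3, window=1.0)
print([monitor.decide("100", t) for t in (0.0, 0.1, 0.2, 0.3)])
# ['proxy', 'proxy', 'proxy', 'bypass']
print(monitor.decide("100", 2.0))  # proxy (earlier connections aged out)
```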
The foregoing needs, and other needs that will become apparent from the following description, are addressed by the systems, methods and mechanisms that are described in this disclosure.