Corporate and institutional network administrators, Internet Service Providers, and others who are directly responsible for making networks run properly require tools and techniques for issuing and enforcing access policy. Appropriate policies must be embodied in servers and/or other computer processes in a computer network when that network provides access to the Internet or is otherwise concerned with security, authentication, filtering, caching, or similar issues. The software mechanisms which implement policy in the decentralized environment of a network are generally known as “middleware”. In particular, access policy enforcement may call for some mechanism to uniquely identify users who access information through a proxy server.
Many Internet protocols are concerned with providing a service for brokering transport and application transactions between a client and a server. Only recently have Internet Engineering Task Force (“IETF”) initiatives begun to address issues that arise in connection with brokering the relationships between middleware and a client, middleware and other middleware, and middleware and a server. For example, IETF Request for Comments 2109 (“RFC 2109”) defines a mechanism for maintaining state between the endpoints of a transaction, but it says little about maintaining state when a proxy intermediates communication between those endpoints. Other sources have also shed little light on the problem of maintaining state information and performing middleware tasks such as authentication when a proxy is involved in the communication.
Some systems attempt to identify users by noting the IP address or other network address being used. But this does not distinguish between individual users who share a machine or an address. For instance, on a multi-user UNIX operating system, all of the users share the IP address of the machine. Another example is a NAT (Network Address Translation) configuration, in which all of the machines behind the NAT use the same IP address when accessing information on the Internet.
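The limitation of address-based identification can be illustrated with a minimal sketch. The addresses and user names below are hypothetical, chosen only for illustration; the sketch simply groups observed requests by source address and reports which addresses map to more than one actual user.

```python
from collections import defaultdict


def identify_by_address(requests):
    """Group requests by source IP address, as an address-based scheme would."""
    users_by_address = defaultdict(set)
    for source_ip, actual_user in requests:
        users_by_address[source_ip].add(actual_user)
    return dict(users_by_address)


# Hypothetical traffic: two people share one NAT address, one person is alone.
observed = [
    ("203.0.113.7", "alice"),
    ("203.0.113.7", "bob"),
    ("198.51.100.4", "carol"),
]

grouped = identify_by_address(observed)

# Addresses behind which more than one user is hidden cannot identify a user.
ambiguous = {ip for ip, users in grouped.items() if len(users) > 1}
```

In this sketch the address shared by two users lands in `ambiguous`, which is the situation that arises on a multi-user machine or behind a NAT.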
An HTTP Working Group memo authored by David M. Kristol entitled “HTTP Proxy State Management Mechanism” recognizes that it can be useful for an HTTP proxy server and its client (which could be a user agent or another proxy server) to share state. The memo describes two headers, Pcookie and Set-Pcookie, which carry state information between participating HTTP proxy servers and their clients. However, the Pcookie approach apparently fails with transparent proxy servers, that is, proxies that have not been directly identified as such to the client.
To better understand the benefits and limitations of current approaches and how they might be modified or replaced to better support middleware, we begin with a look at Hypertext Transfer Protocol (“HTTP”). HTTP is a stateless protocol, but it has been adapted to keep state information in data structures known as “cookies”. Cookies are a mechanism which HTTP servers can use to both store and retrieve information on the client side of the connection. Cookies may be implemented with small data files that are kept in volatile or persistent memory on a client machine; for example, they may be written to a client's disk drive by a web server. These data files contain information the web server can use to track such things as login identifiers, lists of identifiers of pages downloaded by a client, or the date when a particular page was last sent to the client by the server.
FIG. 1 illustrates conventional tools and techniques for managing cookies. A user agent 100, such as a web browser, uses HTTP to seek access to resources such as web pages or database content through an origin server 102. The origin server is sometimes called a “target server” or “host” or “resource manager”; the state information is sometimes called the “context”. Cookies and HTTP generally are well known in the art. In particular, RFC 2109 discusses cookies and their management.
During a step 104 the user agent 100 makes a request directed to the origin server 102. For instance, the request may be an HTTP request for a web page. Assume that the origin server 102 is configured to attempt to maintain state information about the particular user agent 100, and that the server 102 tries to maintain that state information at least in part on the user agent 100 instead of storing it all on the server 102. During a step 106 the server 102 sends the user agent 100 a response with some or all of the state information the user agent 100 should store. In HTTP, this is done using cookies; a cookie is introduced to the client by including a Set-Cookie header as part of an HTTP response.
Assume that the user agent 100 is configured to accept the cookie, and stores it. During a step 108 the user agent 100 again sends the origin server 102 its request, but this time the cookie is included in the HTTP header, indicating the user agent's acceptance of the cookie. During a step 110 further interaction may then take place between the user agent 100 and the origin server 102.
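The exchange of steps 106 and 108 can be sketched with the `http.cookies` module of the Python standard library. This is only an illustration under stated assumptions: the cookie name `session`, the value `abc123`, and the three helper functions are hypothetical and are not taken from RFC 2109 or from any particular server or browser.

```python
from http.cookies import SimpleCookie


def server_response_headers(session_id):
    """Step 106: the origin server introduces a cookie with a Set-Cookie header."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    # header="" makes output() return only the header value.
    return {"Set-Cookie": cookie.output(header="").strip()}


def client_store(jar, response_headers):
    """The user agent accepts the cookie and stores it in its jar."""
    jar.load(response_headers["Set-Cookie"])


def client_request_headers(jar):
    """Step 108: the user agent returns the stored cookie in a Cookie header."""
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    return {"Cookie": "; ".join(pairs)}


response = server_response_headers("abc123")
jar = SimpleCookie()
client_store(jar, response)
request = client_request_headers(jar)
```

The point of the sketch is that the state travels in headers: the server supplies it once in a response, and the user agent repeats it in later requests to that server.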
This further interaction may include numerous requests and responses. In particular, as shown in RFC 2109 and otherwise well known, step 110 may include additional responses with cookies from the origin server 102. (The accompanying diagrams note the cookies in the responses with the words “set cookie”.) These cookies result in corresponding requests from the user agent 100, with the corresponding cookies in the headers of those requests. The step 110 may also include transmittal of web pages, applets, or other data from the server 102, and text or button clicks or other POST operations by the user agent 100.
FIG. 2 illustrates conventional tools and techniques for using a known proxy server 200 to filter out cookies. The proxy server 200 is known to the user agent 100 in the sense that the IP address, domain name, or other identifying information about the proxy server 200 has been stored on the user agent 100 to configure the user agent 100 so that the user agent 100 expressly directs a request 202 to the proxy server 200 for origin server 102 resources. For instance, commonly used web browsers allow one to specify a proxy server by entering the server's IP address through the application's configuration user interface.
In response to request 202, the proxy server 200 may forward a corresponding request 204 to the origin server 102. On the other hand, if the access policy enforced by the proxy server 200 denies access to the origin server 102 for the user agent 100, the request 204 will not be sent. The proxy server 200 may also be used for caching, which will reduce the response time when one or more user agents 100 request the same data previously retrieved from the origin server 102.
In addition to filtering (a.k.a. “blocking” or “screening”) outgoing requests from the user agent 100, the proxy server 200 may be configured to filter incoming cookies. This may be done to protect the privacy of the person running the user agent 100. That is, the proxy server 200 may be an “anonymizer”. Instead of forwarding the cookies to the user agent 100, the proxy server 200 saves the cookies sent in 206, associates them with the user agent that sent request 202, and generates and provides one or more requests 208 to the origin server 102 without contacting the user agent 100 again. The request 208 identifies (through IP addresses, for instance) the proxy server 200 as its source, instead of identifying the user agent 100. The request 208 may or may not include cookies received in previous responses from the origin server 102. It may instead include a dummy cookie in its header. The dummy cookie meets the minimal syntactic and semantic requirements for a regular cookie, but it carries no information that the origin server 102 can use to relate the request 208 to the data transmitted in 202. This technique preserves the anonymity of the user and user agent 100.
During steps 210 and 212, there may be further interaction between the origin server 102 and the proxy server 200, and between the proxy server 200 and the user agent 100. This further interaction may come soon after the previous steps, or it may occur over a short or long period of time afterward, with or without substantial intervening periods of inactivity. To the extent that this further interaction involves cookies, the proxy server 200 may block the cookies to prevent them from reaching the user agent 100, as just described.
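The cookie filtering performed by an anonymizing proxy such as the proxy server 200 might be sketched as follows. This is a simplified illustration, not an implementation of any particular product: headers are modeled as lists of name and value pairs, and the names `DUMMY_COOKIE`, `filter_response`, and `anonymize_request`, as well as the agent identifier, are hypothetical.

```python
# Hypothetical placeholder that is syntactically valid but carries no user data.
DUMMY_COOKIE = "id=0"


def filter_response(headers, saved_cookies, agent_id):
    """Keep Set-Cookie headers at the proxy instead of forwarding them."""
    forwarded = []
    for name, value in headers:
        if name.lower() == "set-cookie":
            # Save the cookie and associate it with the requesting user agent.
            saved_cookies.setdefault(agent_id, []).append(value)
        else:
            forwarded.append((name, value))
    return forwarded


def anonymize_request(headers):
    """Drop any Cookie header from the user agent and substitute the dummy."""
    kept = [(n, v) for n, v in headers if n.lower() != "cookie"]
    kept.append(("Cookie", DUMMY_COOKIE))
    return kept


saved = {}
origin_response = [("Content-Type", "text/html"), ("Set-Cookie", "session=abc123")]
to_user_agent = filter_response(origin_response, saved, agent_id="agent-100")

agent_request = [("Host", "origin.example"), ("Cookie", "tracking=xyz")]
to_origin = anonymize_request(agent_request)
```

In this sketch the user agent never sees the cookie from the origin server, and the origin server never sees a cookie that originated at the user agent.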
FIG. 3 illustrates conventional tools and techniques for using a known proxy server 300 which supports cookies. The initial steps are those shown in FIG. 2; a request 302 sent to the proxy server 300 results in a corresponding request 304 to the origin server 102.
But instead of blocking incoming cookies 306, the proxy server 300 forwards 308 them to the user agent 100. The user agent 100 accepts the cookies, and later will provide a response in the form of a request 310 containing the cookie in a header. The proxy server 300 forwards 312 this request—with the cookie—to the origin server 102. Although the request 312 identifies the proxy server 300 as the source IP address, the cookie in the request 312 header identifies the user agent 100 and may contain private user information, either explicitly represented or encoded in a way known only to the origin server 102.
During steps 314 and 316, there may be further interaction between the origin server 102 and the proxy server 300, and between the proxy server 300 and the user agent 100. To the extent that this further interaction involves further cookies from the origin server 102, the proxy server 300 forwards those cookies to the user agent 100. That is, the proxy server 300 does not block cookies. Likewise, the proxy server 300 forwards cookies from the user agent 100 to the origin server 102; the proxy server 300 does not substitute dummy cookies. Of course, the proxy server 300 can still perform caching, filtering according to IP addresses, and other conventional proxy server functions.
FIG. 4 illustrates conventional use of a transparent proxy 400. Unlike the known proxy servers 200 and 300, the transparent proxy server 400 is unknown to the user agent 100. That is, the user agent 100 itself has not been configured to communicate with the transparent proxy 400. Instead, the transparent proxy 400 is inserted in the communication path by means of capturing network traffic at a router or gateway with access to all traffic transmitted between the user agent 100 and the origin server 102. This capture is effected without any modification to the user agent 100. The transparent proxy 400 may be inserted to perform caching or to enforce access control policy on user agent requests.
Like the known proxy server 200 shown in FIG. 2, the transparent proxy 400 can filter out cookies in responses from origin servers 102. The transparent proxy can also modify or replace cookies sent in HTTP messages from the user agent 100.
In neither of the cases illustrated in FIG. 3 and FIG. 4 can the proxy 300 or 400 ensure that every request or other HTTP message sent from the user agent 100 contains a cookie with information known to the proxy 300 or 400. Thus, if the proxy 300 or 400 needs to associate persistent information with the user agent 100 (information such as identity or other attributes), it cannot do so using cookies with the current state of the art. This puts proxies at a disadvantage with respect to origin servers, because the origin servers can use cookies for storing persistent user agent information and for having that information forwarded by user agents as part of requests as in 302. User agents include cookies only in requests directed at origin servers that have previously included cookies in replies. In contrast, because proxies are normally not the target of user agent requests, user agents will not attach cookies that are specific to proxies to the requests. Thus, conventional proxies are unable to adequately provide information service enhancements based on user identities or other persistent attributes.
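The reason a proxy does not receive cookies of its own can be seen in a simplified model of how a user agent scopes its cookies. The sketch below assumes exact host matching, which is deliberately simpler than the domain and path matching rules of RFC 2109; the class `UserAgentCookieJar` and the host names are hypothetical.

```python
class UserAgentCookieJar:
    """Simplified user agent cookie store, keyed by the host that set each cookie."""

    def __init__(self):
        self._cookies = {}  # host -> {cookie name: cookie value}

    def store(self, host, name, value):
        """Record a cookie received in a response from the given host."""
        self._cookies.setdefault(host, {})[name] = value

    def cookie_header_for(self, host):
        """Return the Cookie header value for a request to host, or None."""
        cookies = self._cookies.get(host, {})
        if not cookies:
            return None
        return "; ".join(f"{n}={v}" for n, v in cookies.items())


jar = UserAgentCookieJar()
# The origin server has previously set a cookie in one of its replies.
jar.store("origin.example", "session", "abc123")

# Requests are addressed to the origin server, so only its cookie is attached.
header_for_origin = jar.cookie_header_for("origin.example")
# The proxy is never the target of a request, so nothing is scoped to it.
header_for_proxy = jar.cookie_header_for("proxy.example")
```

Because every request names the origin server as its target, the user agent in this model has no occasion to store or return a cookie that belongs to an intermediate proxy.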
Thus, it would be an advance in the art to provide better tools and techniques for brokering contextual relationships between and among clients, middleware, and origin servers, so that relevant information is propagated among and between platforms in an appropriate manner.
It would be a further advancement to provide such improvements without greatly intruding on the transaction endpoints (clients and origin servers), that is, without substantially reducing performance, without requiring changes to HTTP, and without requiring additional software on clients or origin servers.
Such improvements are disclosed and claimed herein.