On the Internet, browsers or clients locate the objects that they want to access, and the servers where the objects reside, through a variety of means. Most often they do so by either using directories and search engines, or by using links embedded in other objects that the clients already possess. Regardless of how they determine which objects to access and where to access them from, when the clients access these objects their accesses can be routed through a network of one or more intermediaries (called “proxies”) between the requesting clients and the providing servers.
Usually, these proxies fulfill two broad goals that improve the overall quality of accesses over the Internet: (a) when it is possible to do so, the proxies cache frequently accessed objects so that the data can be supplied to the requesters quickly and with frugal use of physical data paths, and (b) some of these proxies can perform certain automated functions such as redirecting accesses to occur from proximal sources and enforcing certain access control policies at boundaries between organizations. Both of these functions are beneficial precisely because they can be applied to accesses that are mediated by the proxies; a non-mediating machine, or a machine that acts as a mere physical router of data generally cannot provide these benefits.
Forward proxies are used today to provide mediated points of access, where web traffic can aggregate, so that access controls can be applied on the proxy in addition to caching frequently accessed objects. Commercial offerings of forward proxy products from major vendors tend to be competitive in offering sophisticated types of filtering of these objects, anonymity at the “IP” address level (i.e., stripping out identity information such as the IP addresses), and features such as logging, auditing, and metering of accesses.
Unfortunately, because many of the semantic aspects of client-server interaction are transparent to proxies and because many proxies are themselves expected to provide semantically transparent behavior, the use of proxies as brokers in sophisticated ways to identify users and web sites to each other tends to be minimal. Such behaviors are conventionally provided either at browsers or at web sites. Browsers are extended by agents that control cookies, help with automated logins into web sites, and so on. Servers are extended by agents (e.g., NetEgrity software) that intervene to authenticate the user and then provide access controls based on user identity.
In particular, an “end-to-end state management problem” makes it difficult to perform identity-related management tasks at a forward proxy. It will be apparent after studying the present invention that value can be added to Internet accesses if a system is able to handle an end-to-end interaction between client and server as a composite of three semantically separable interactions: between the client and the proxy (client-proxy session), between the proxy and the server (proxy-server session), and between the client-proxy and proxy-server sessions, and proxy-based services. This separation is generally defeated in conventional systems because the identity component of requests sent from a client to a server is not fully transparent to the proxy. When a browser sends a request to a web server, it sends along with it some data that is specific to the web server (or specific to the domain to which the web server belongs). This data is placed at the browser by the web server, may be altered by the web server in the course of its interactions with the browser, and may persist across sessions.
The web server may use such “cookie” data for a variety of purposes. Cookies are perhaps most commonly used to: know who the user is, so that the server can personalize the service offered in accordance with the user's preferences; maintain and update state at the server on behalf of the user, so that the user's interactions with the server are effectively bookmarked for session-to-session continuity; and/or maintain state within a session with the user across multiple interactions.
Cookies pose a problem for identity substitution. Conventionally, proxies dutifully forward the cookies that they receive from the user on to the server (some conventional proxies block cookies, creating problems by limiting server offerings). The use of cookies can reveal the user's true identity to the server. For example, consider this interaction:
(a) The client contacts a web site to receive stock quotes for some equities.
(b) The client's browser inserts a cookie into the request, which is meant for interpretation at the server.
(c) The server examines the cookie and, based upon the value of the cookie, retrieves a profile about the client that is maintained at the server.
(d) The profile tells the server which equities and what types of information should be supplied to the client.
(e) The server retrieves the requested information and sends it to the client.
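Steps (a) through (e) can be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular browser or server: the names `PROFILES`, `QUOTES`, `build_request`, and `handle_request`, the cookie name `profile`, the domain `quotes.example`, and all data values are hypothetical.

```python
# Hypothetical server-side data; nothing here comes from a real site.
PROFILES = {
    "cookie-1234": {"symbols": ["AAA", "BBB"], "fields": ["last"]},
}
QUOTES = {
    "AAA": {"last": 10.5, "volume": 1200},
    "BBB": {"last": 42.0, "volume": 300},
    "CCC": {"last": 7.25, "volume": 50},
}


def build_request(path, cookie_jar, domain):
    """Steps (a)-(b): the browser attaches the cookie it holds for the domain."""
    headers = {}
    if domain in cookie_jar:
        headers["Cookie"] = f"profile={cookie_jar[domain]}"
    return {"path": path, "headers": headers}


def handle_request(request):
    """Steps (c)-(e): the server maps the cookie value to a stored profile."""
    cookie = request["headers"].get("Cookie", "")
    _, _, value = cookie.partition("=")      # (c) examine the cookie value
    profile = PROFILES.get(value)            # (c) retrieve the profile
    if profile is None:
        return {}                            # unknown client: nothing tailored
    # (d) the profile selects the equities and the kinds of information;
    # (e) the server assembles the response from them.
    return {
        symbol: {name: QUOTES[symbol][name] for name in profile["fields"]}
        for symbol in profile["symbols"]
    }
```

Only the server can interpret the value `cookie-1234`; an intermediary that sees the request sees an uninterpreted string.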
In the above interaction, the cookie that is transferred in step (b) is opaque to a proxy. In this case, while one can mechanically split the end-to-end communication into the three component interactions noted above, the communication cannot be semantically separated because the proxy cannot derive or add meaning to the original interactions. Thus, cookies that are intelligible only at the server are a significant obstacle to creating intelligent identity mediation at proxies. Even when they do not reflect personally identifiable information, cookies have privacy implications. As users visit sites, it becomes possible to track them because of the cookies that they use, and to know what they are doing at those sites. Companies may use such information for commercial purposes such as tailoring insertion of advertisements. Cookies also enable tracking a user across different web sites by third-party arrangements among web sites (e.g., using DoubleClick cookies). A conventional forward proxy can defeat simple types of identity inferences at the server (such as those based on the client's Internet address) by submitting to the server a request whose source is the proxy itself and not the client, but such a proxy cannot simply replace cookies that may have information other than identity incorporated into them without risking a reduction or loss of the functionality provided by the site.
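The limitation of a conventional forward proxy can be illustrated with a short sketch. It assumes a request is modeled as a dictionary with a `source` address and a `headers` mapping; `PROXY_ADDR`, `forward`, and `server_view` are hypothetical names, and the addresses are reserved documentation addresses.

```python
PROXY_ADDR = "203.0.113.10"  # hypothetical address of the forward proxy


def forward(request, strip_cookies=False):
    """Re-issue the client's request with the proxy as its source."""
    headers = dict(request["headers"])   # copy, so the original is untouched
    if strip_cookies:
        headers.pop("Cookie", None)      # blocking proxy: drop the cookie
    return {"source": PROXY_ADDR, "path": request["path"], "headers": headers}


def server_view(request):
    """What the server can use to infer who is asking."""
    return {
        "source": request["source"],
        "cookie": request["headers"].get("Cookie"),
    }


client_request = {
    "source": "198.51.100.7",
    "path": "/quotes",
    "headers": {"Cookie": "profile=cookie-1234"},
}

# Address-level anonymity holds, yet the cookie still identifies the client.
masked = server_view(forward(client_request))

# Stripping the cookie hides the client but leaves the server nothing to
# personalize with, which is the loss of functionality noted above.
blocked = server_view(forward(client_request, strip_cookies=True))
```

In this sketch the proxy has only two choices, pass the opaque value through or discard it; it has no basis for substituting a meaningful replacement.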
Also, cookies are not the only problem. Many cookie-less interactions also involve end-to-end state maintenance that is set in motion as follows: when the client contacts a server, the server first asks the client to sign on with a valid user identifier (and, almost always, a password). Once the user signs on, the server associates server-side state with the signed-on client for the duration of a session. Interactions that occur in this way are just as opaque to the proxy as the cookies were, since the proxy cannot alter the client's request without control over how the server-side state affects the processing of the request.
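The sign-on pattern described above might look like the following sketch. It is an assumption-laden illustration: the class `Server`, its methods, and the idea of returning a session token that the client repeats on later requests (for instance, in a URL) are hypothetical, and the plain-text password comparison is for brevity only.

```python
import secrets


class Server:
    """Hypothetical server that keeps per-session state after sign-on."""

    def __init__(self, credentials):
        self._credentials = credentials   # user identifier -> password
        self._sessions = {}               # session token -> server-side state

    def sign_on(self, user, password):
        """Validate the user and create state for the new session."""
        if self._credentials.get(user) != password:
            return None
        token = secrets.token_hex(8)
        self._sessions[token] = {"user": user, "requests": 0}
        return token

    def handle(self, token, path):
        """Process a request using state that only the server holds."""
        state = self._sessions.get(token)
        if state is None:
            return "sign-on required"
        state["requests"] += 1
        return f"{path} for {state['user']} (request {state['requests']})"
```

The response depends on state held in `_sessions`, which an intermediary never sees; the proxy observes only the token and the path, so it cannot predict how a modified request would be processed.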
Accordingly, it would be an advancement to provide new tools and techniques for coordinating user identity management with cookies. Such tools and techniques are described and claimed herein.