Of the many uses of the Internet, one of the more common ones is to access content on a remote server, such as a World Wide Web server. Typically, a person operates a client device to access content on a remote origin server over the Internet. The client may be, for example, a personal computer (PC) or a handheld device such as a personal digital assistant (PDA) or cellular telephone. The client often includes a software application known as a browser, which can provide this functionality. A person using the client typically operates the browser to locate and select content stored on the origin server, such as a web page or a multimedia file. In response to this user input, the browser sends a request for the content over the Internet to the origin server on which the content resides. In response, the origin server returns a response containing the requested content to the client, which outputs the content in the appropriate manner (e.g., it displays the web page or plays the audio file). The request and response may be communicated using well-known protocols, such as transmission control protocol/Internet protocol (TCP/IP) and hypertext transfer protocol (HTTP).
For a variety of reasons, it may be desirable to place a device known as a proxy logically between the client and the origin server. For example, organizations often use a proxy to provide a barrier between clients on their local area networks (LANs) and external sites on the Internet by presenting only a single network address to the external sites for all clients. A proxy normally forwards requests it receives from clients to the applicable origin server and forwards responses it receives from origin servers to the appropriate client. A proxy may provide authentication, authorization and/or accounting (AAA) operations to allow the organization to control and monitor clients' access to content. A proxy may also act as (or facilitate the use of) a firewall to prevent unauthorized access to clients by parties outside the LAN. Proxies are often used in this manner by corporations when, for example, a corporation wishes to control and restrict access by its employees to content on the Internet and to restrict access by outsiders to its internal corporate network.
It is also common for a proxy to operate as a cache of content that resides on origin servers; such a device may be referred to as a “proxy cache”. An example of such a device is the NetCache product designed and manufactured by Network Appliance, Inc. of Sunnyvale, Calif. The main purpose of caching content is to reduce the latency associated with servicing content requests. By caching certain content locally, the proxy cache avoids the necessity of having to forward every content request over the network to the corresponding origin server and having to wait for a response. Instead, if the proxy cache receives a request for content which it has cached, it simply provides the requested content to the requesting client (subject to any required authentication and/or authorization) without involving the origin server.
Proxy caches may also be used to facilitate transformations of the requested content prior to returning the requested content to the requesting client. Examples of such transformations include translation of web pages retrieved from the origin server to different formats depending on the client device type (e.g., a PDA, a cellular telephone, etc.), translation of web pages to different human languages, insertion of advertisements into web pages, checking web pages for viruses, etc. The transformations may be performed by designated applications residing on a proxy cache or independent servers. The requested content may undergo multiple transformations (e.g., virus check, language translation, and advertisement insertion), resulting in multiple versions of the requested content. Storing multiple versions complicates caching algorithms and introduces disk as well as memory overheads. However, if a web page retrieved from the origin server is cached, but all of its transformed versions are discarded, the web page has to be routed through the different services each time a similar request for content is received, consuming additional bandwidth and increasing the latency associated with servicing content requests.