World Wide Web (Web) documents are commonly written in HTML (Hypertext Markup Language). HTML documents typically reside on Web servers and are requested by Web clients. Delays are often introduced during Web browsing by, for example, heavy communications traffic on the Internet or the slow response of a remote site. Providing one or more servers for mirroring Web sites located on remote servers is one means of reducing the delays involved with browsing the Web. These mirroring servers, typically referred to collectively as a "proxy" or individually as "proxy servers," store frequently accessed Web sites in a local cache, thereby eliminating recurrent retrievals of commonly accessed documents. Thus, when a request for a particular Web page is received from a client, the proxy server associated with that client looks first to its local cache to service the request, rather than to the remote site upon which the Web page resides. If the requested document is found locally, the request can be serviced by the proxy server and a request to the remote server for the document can be avoided. Only when a valid copy of the requested document is not in the proxy's local cache does the remote server need to be accessed. In this manner, exposure to heavy communications traffic on the Internet and to the slow response of remote servers can be reduced.
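The cache-first lookup described above can be sketched as follows. This is a minimal illustration rather than any particular proxy's implementation: the class name `ProxyCache`, the `fetch_from_origin` callback, and the use of a fixed time-to-live to decide whether a cached copy is still "valid" are all assumptions introduced here.

```python
import time
from typing import Callable, Dict, Tuple


class ProxyCache:
    """Hypothetical proxy server that answers from its local cache when it can."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin      # assumed callback that contacts the remote server
        self._ttl = ttl_seconds              # assumed validity window for a cached copy
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (document, time cached)
        self.origin_requests = 0             # how many requests reached the remote server

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        # Look to the local cache first; serve from it only if the copy is still valid.
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]
        # No valid local copy: the remote server must be accessed.
        document = self._fetch(url)
        self.origin_requests += 1
        self._store[url] = (document, now)
        return document
```

With this sketch, repeated calls to `get` for the same URL within the validity window reach the remote server only once; every later call in that window is serviced locally.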
While this mirroring approach is beneficial to end-users, it makes hit tracking difficult for remote site administrators. A hit is a request for a Web page, typically initiated by a user selecting a hypertext link for the Web page. The mirroring approach disrupts a remote server's ability to track the total number of requests for a given Web page because, as discussed above, some of the requests are intercepted and serviced by proxy servers. It is desirable to have an accurate count of requests for a given Web page or group of pages, for example, to track the relative popularity of a page or to provide feedback to advertisers whose advertisements appear on the page. Therefore, what is needed is a mechanism for tracking user hits at the proxy and a mechanism for notifying mirrored sites of those hits, thereby allowing remote site administrators to accurately track total hits (i.e., both the requests serviced from a proxy's local cache and the requests serviced by the remote server).
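The passage above states a need without specifying a mechanism. One possible shape for such a mechanism is sketched below, purely as an illustration: the proxy counts the hits it services from its cache and periodically hands the counts to the mirrored site, which adds them to the hits it serviced itself. The class names `HitTrackingProxy` and `OriginHitLog`, and the batch-report approach, are assumptions made here.

```python
from collections import Counter
from typing import Dict


class HitTrackingProxy:
    """Hypothetical proxy-side counter for requests serviced from the local cache."""

    def __init__(self) -> None:
        self._cached_hits: Counter = Counter()

    def record_cache_hit(self, url: str) -> None:
        self._cached_hits[url] += 1

    def flush_report(self) -> Dict[str, int]:
        # Hand over the accumulated counts and start a fresh reporting period,
        # so that no hit is reported twice.
        report = dict(self._cached_hits)
        self._cached_hits.clear()
        return report


class OriginHitLog:
    """Hypothetical remote-site tally combining direct hits and proxy reports."""

    def __init__(self) -> None:
        self._direct: Counter = Counter()
        self._reported: Counter = Counter()

    def record_direct_hit(self, url: str) -> None:
        self._direct[url] += 1

    def accept_report(self, report: Dict[str, int]) -> None:
        self._reported.update(report)  # Counter.update adds counts

    def total_hits(self, url: str) -> int:
        # Total hits = requests serviced by the remote server
        #            + requests serviced from proxy caches.
        return self._direct[url] + self._reported[url]
```

In this sketch, how the report travels from proxy to remote site (and how often) is left open; only the bookkeeping on each side is shown.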
Another problem with the current mirroring approach is the inefficient allocation of the proxy's cache space. Currently, each client is assigned to one or more proxy servers. Therefore, the documents most recently requested by each active client will reside in the corresponding proxy server's cache. If clients assigned to different proxy servers have recently requested the same document, that document might be cached in several of the proxy servers, thereby reducing the cache storage space available for other frequently requested documents. Further, an extremely popular document might be cached in every proxy server. While redundancy of information is useful for fault tolerance, organized redundancy would be preferable. Given the foregoing, what is needed is a means of more efficiently allocating cache space within a proxy. Specifically, it would be desirable to allocate mutually exclusive portions of the Web's content to particular proxy servers.
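One way to picture mutually exclusive allocation is to map each URL deterministically to exactly one proxy server, so that no document is cached by two servers. The sketch below uses a hash of the URL for this purpose; the hashing scheme, the function name `assign_proxy`, and the server names are assumptions introduced for illustration and are not taken from the text above.

```python
import hashlib
from typing import Sequence


def assign_proxy(url: str, servers: Sequence[str]) -> str:
    """Return the single proxy server responsible for caching `url` (hypothetical scheme)."""
    if not servers:
        raise ValueError("at least one proxy server is required")
    # A stable digest is used so that every component computes the same owner
    # for a given URL, regardless of process or machine.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Usage: the same URL always resolves to the same server in the list.
servers = ["proxy-a", "proxy-b", "proxy-c"]  # hypothetical server names
owner = assign_proxy("http://example.com/index.html", servers)
```

Because each URL has exactly one owner, the servers' caches hold disjoint portions of the content. A simple modulo scheme like this reassigns many URLs whenever the number of servers changes, so it should be read as an illustration of the partitioning idea only.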