Over the last several years, the use of the Internet by individuals and business entities has increased significantly as the Internet has become established as a mechanism to disseminate information. The Internet presents information to a user using a web browser that is located on the user's computer. The web browser retrieves and displays web pages from various web servers connected to the Internet. As use of the Internet has increased, web servers have had to provide the same page to large numbers of users, because those users request access to the same web pages or at least to pages having portions that are the same.
Initially, the web browsers, web servers, and intermediate proxy servers used data caching techniques to assist in shortening the response time when web pages are requested. These caching techniques stored a static version of a web page into data cache memory blocks at any number of locations between the end user's computer and the web server that generated a response to a web page request. Web browsers typically have a local cache to hold temporary files retrieved while the browser is used. When a new web page is requested, the browser may check to see if the requested page exists in the local cache. If the page is in the cache, the web browser may retrieve the requested page from the cache and thus eliminate the need to request information from the web server. Because the cache is typically located locally upon a hard drive of a user's computer, no Internet communication is needed. This response is typically much quicker than sending a request over the Internet. In addition, the use of a cached page eliminates a subsequent web server hit, thereby reducing the processing requirements of the web servers.
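The local-cache lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any browser's actual implementation: the names `BrowserCache` and `fake_server`, the in-memory dictionary standing in for files on a hard drive, and the example URL are all hypothetical.

```python
from typing import Callable, Dict


class BrowserCache:
    """Hypothetical local cache mapping a URL to a stored static page."""

    def __init__(self, fetch_from_server: Callable[[str], str]) -> None:
        self._fetch = fetch_from_server    # stand-in for a request sent over the Internet
        self._pages: Dict[str, str] = {}   # stand-in for temporary files on the hard drive
        self.server_hits = 0               # number of requests that reached the web server

    def get(self, url: str) -> str:
        # If the page is already in the local cache, no Internet communication is needed.
        if url in self._pages:
            return self._pages[url]
        # Otherwise request the page from the web server and keep a copy for later.
        self.server_hits += 1
        page = self._fetch(url)
        self._pages[url] = page
        return page


def fake_server(url: str) -> str:
    """Hypothetical web server that returns a static page for any URL."""
    return f"<html>static content of {url}</html>"


cache = BrowserCache(fake_server)
first = cache.get("http://example.com/index.html")   # goes to the server
second = cache.get("http://example.com/index.html")  # served from the local cache
print(cache.server_hits)
```

In this sketch the second request returns the stored copy, so the server is contacted only once for the two requests.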
Proxy servers are servers located between two portions of a networked computing system. Typically, all of the users are attached to a network located on one side of a proxy server and the web servers providing the requested pages are located upon the other side of the proxy server. In this architecture, the user sends a request for a web page to the proxy server, which in turn sends a request for the web page to the appropriate web server. The web server responds with a web page to the proxy server. The proxy server then forwards the web page to the requesting user.
The proxy servers occasionally possess local cache memory to hold web pages that have been previously requested by users. As with a web browser, the proxy server may check whether a requested page is stored within its cache before sending a request to the web server. Again, if the page is found, the cached version may be used to eliminate the web server request.
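The proxy arrangement described above, in which many users sit on one side of the proxy and the web server on the other, might be modeled as in the sketch below. The class names `WebServer` and `ProxyServer`, the `use_cache` flag, and the user names are hypothetical and serve only to illustrate the request flow.

```python
from typing import Dict, Optional


class WebServer:
    """Hypothetical origin web server that counts the requests it handles."""

    def __init__(self) -> None:
        self.requests_handled = 0

    def handle(self, path: str) -> str:
        self.requests_handled += 1
        return f"<html>page for {path}</html>"


class ProxyServer:
    """Hypothetical proxy located between the users and the web server."""

    def __init__(self, origin: WebServer, use_cache: bool = True) -> None:
        self.origin = origin
        # Not every proxy has a cache; None models a proxy without one.
        self.cache: Optional[Dict[str, str]] = {} if use_cache else None

    def request(self, user: str, path: str) -> str:
        # `user` only identifies the requester; the lookup is by page alone.
        if self.cache is not None and path in self.cache:
            return self.cache[path]        # cached copy, no web server request
        page = self.origin.handle(path)    # forward the request to the web server
        if self.cache is not None:
            self.cache[path] = page        # keep a copy for later requests
        return page                        # forward the page to the requesting user


origin = WebServer()
proxy = ProxyServer(origin)
page_a = proxy.request("alice", "/news")   # forwarded to the web server
page_b = proxy.request("bob", "/news")     # answered from the proxy's cache

plain_origin = WebServer()
plain_proxy = ProxyServer(plain_origin, use_cache=False)
plain_proxy.request("alice", "/news")
plain_proxy.request("bob", "/news")        # forwarded again, since nothing is cached
```

Because the proxy's cache is shared, a page fetched for one user can satisfy a later request from a different user, which a per-browser cache cannot do.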
The above uses of cache memory blocks have several deficiencies that diminish their effectiveness. First, these cache blocks store the complete version of a web page as it was last sent from a web server. As such, these cache memory blocks will hold only static web pages. At present, a significant number of the web pages being requested by users include dynamically-generated content. As such, the web page requested by a first user may not contain all of the same information as the web page requested by a second user. In this circumstance, each web page will need to be considered a unique and different web page. As a result, the benefits of data caching will not be obtained even though most of the data on the different web pages may be identical.
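The deficiency described above can be illustrated with a small sketch. It assumes, purely for illustration, a page made of a short per-user greeting followed by a large body that is identical for every user, and a cache keyed on the URL together with the user to model each dynamically-generated page being treated as unique. The names `SHARED_BODY`, `render_page`, and `WholePageCache` are hypothetical.

```python
from typing import Dict, Tuple

# Large portion of the page that is identical for every user (illustrative).
SHARED_BODY = "<div>" + "catalog listing " * 50 + "</div>"


def render_page(user: str) -> str:
    """Hypothetical dynamic page: a small per-user part plus the shared body."""
    return f"<html><p>Welcome, {user}</p>{SHARED_BODY}</html>"


class WholePageCache:
    """Stores only complete pages, so each user's page is a separate entry."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str, user: str) -> str:
        key = (url, user)                 # each user's page counts as a different page
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        page = render_page(user)          # the full page is regenerated on every miss
        self.entries[key] = page
        return page


cache = WholePageCache()
cache.get("/catalog", "alice")
cache.get("/catalog", "bob")

# Both requests miss, and the shared body is stored once per user.
copies_of_shared_body = sum(page.count(SHARED_BODY) for page in cache.entries.values())
print(cache.hits, cache.misses, copies_of_shared_body)
```

Although the two pages differ only in the greeting, neither request benefits from the cache, and the identical body is generated and stored twice.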
U.S. patent application Ser. No. 09/570,071, filed on May 12, 2000, entitled “Output Caching Module of an HTTP Pipeline” and assigned to the same assignee as the present application, discloses a server caching an output page. When a dynamically-changing web page is requested by a user at a client computer and the web page is available in the web server's output cache, instead of regenerating the web page for output to the client computer, the server retrieves the web page from the output cache and sends the web page to the client computer.
A disadvantage of the solution described in U.S. patent application Ser. No. 09/570,071 is that when portions of a web page are the same as those of another web page but other portions differ, the web pages will be considered to be different web pages and therefore cannot make use of output page caching.