When web pages (HTML documents) on the Internet are displayed with a web browser, recently obtained web pages are cached in order to improve response times for requests for these web pages.
For an HTTP GET request, which is used to obtain and display a web page in a web browser, the returned HTML document (response) is uniquely specified by a URL (Uniform Resource Locator). Therefore, the URL can be used as a key to store (cache) the HTML document on the web browser (client machine) or a proxy (server machine), and the stored HTML document can be returned for a request that specifies the same URL without accessing the server application. Thus, the response time can be reduced.
If the HTML document is to be cached on the web browser, a web page that has been previously accessed is stored on a local disk of the client machine. Then, when the same URL (web page) is accessed again, a request is transmitted with the timestamp of the stored web page placed in the HTTP If-Modified-Since header. If the server supports caching by the web browser (client), the server checks whether the requested web page has been updated since the time indicated by the timestamp in the received request. If it has not been updated, a response with an empty body and status code 304 (Not Modified) is returned. The web browser then displays the web page stored on the local disk. On the other hand, if the web page has been updated since the time of the timestamp, the server returns a normal response, i.e., the HTML document of the requested web page.
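The server-side check described above can be sketched as follows. This is only a minimal illustration, not the behavior of any particular server: the function name conditional_get and its parameters are hypothetical, and the sketch assumes that the server knows the last modification time of the page as a timezone-aware value.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since, body):
    """Return (status_code, response_body) for a GET request.

    last_modified     -- timezone-aware datetime of the page's last update
    if_modified_since -- raw If-Modified-Since header value, or None
    body              -- bytes of the current HTML document
    (All names here are hypothetical and chosen for illustration.)
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: fall back to a normal response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without zone information as UTC.
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, b""  # Not Modified: empty body
    return 200, body  # normal response with the HTML document


# The browser sends the timestamp of its stored copy.
stored_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(stored_at, usegmt=True)
status, payload = conditional_get(stored_at, header, b"<html></html>")
```

In this sketch, the browser would display its locally stored copy whenever the returned status is 304.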
If the HTML document is to be cached in the proxy, the URL is used as a key to store the HTML document in the memory of the server machine. For a request that specifies the same URL, the proxy reads the stored HTML document from the memory and returns it without accessing the server application. In this case, the cache is expected to keep its content up to date.
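A URL-keyed proxy cache of this kind can be sketched as follows. The class name UrlKeyedCache and the callable fetch_from_origin are hypothetical, and the sketch deliberately leaves out expiry handling, which a real proxy would need in order to keep its content up to date.

```python
class UrlKeyedCache:
    """In-memory cache that uses the request URL as the key (illustrative)."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is a hypothetical callable: url -> HTML document.
        self._fetch = fetch_from_origin
        self._store = {}

    def get(self, url):
        # Cache hit: return the stored document without contacting the
        # server application.
        if url in self._store:
            return self._store[url]
        # Cache miss: access the server application once and store the result.
        document = self._fetch(url)
        self._store[url] = document
        return document

    def invalidate(self, url):
        # Drop a stale entry so that the next request is fetched again.
        self._store.pop(url, None)


origin_calls = []


def fake_origin(url):
    # Stand-in for the server application; records every access.
    origin_calls.append(url)
    return "<html>" + url + "</html>"


proxy = UrlKeyedCache(fake_origin)
first = proxy.get("http://example.com/index.html")
second = proxy.get("http://example.com/index.html")  # served from memory
```

Here the second request for the same URL is answered from memory, so the stand-in server application is accessed only once.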
Today, network services that use XML documents as inputs and outputs, such as web services, are becoming popular. However, in providing such a service, a response may not be uniquely specified by a URL (or a URI (Uniform Resource Identifier)). This is because in web services, the content of an XML document (service) to be returned often depends on the content of an XML document included in a request.
Thus, in such a system that uses XML documents as inputs and outputs, using URLs as keys to cache XML documents is difficult.
One possible approach is to use the literal expression (character string) of the XML document included in a request as a key to cache the response XML document. However, many XML documents may differ as character strings, and therefore yield different keys, even though they have the same meaning for the server application. Therefore, a high cache hit rate cannot be achieved.
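The problem with literal keys can be illustrated with a small sketch. The element and attribute names (getQuote, symbol, currency) and the cached response are hypothetical. The two requests differ only in attribute order, whitespace, and the form of the empty element, yet a cache keyed on the literal string treats them as different requests.

```python
import xml.etree.ElementTree as ET

# Two hypothetical request documents with the same meaning for the server.
request_a = '<getQuote symbol="IBM" currency="USD"/>'
request_b = '<getQuote  currency="USD"\n symbol="IBM"></getQuote>'

# Cache keyed on the literal expression of the request document.
cache = {request_a: '<quote price="100"/>'}
literal_hit = request_b in cache  # the strings differ, so this is a miss


def same_element(x, y):
    """Compare two parsed elements, ignoring attribute order and
    leading/trailing whitespace in text content."""
    return (
        x.tag == y.tag
        and x.attrib == y.attrib
        and (x.text or "").strip() == (y.text or "").strip()
        and len(x) == len(y)
        and all(same_element(c, d) for c, d in zip(x, y))
    )


# Once parsed, the two requests are structurally identical.
equivalent = same_element(ET.fromstring(request_a), ET.fromstring(request_b))
```

In this sketch literal_hit is False while equivalent is True, which is the situation described above: the requests are the same for the server application, but the literal key does not match.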