The Extensible Markup Language (XML) is a convenient standards-based universal format for representing and processing data. Information from almost any data source (databases, spreadsheets, messages, Enterprise JavaBeans, etc.) can be represented as XML data documents and exchanged with another entity. The entity that produces XML data documents is referred to herein as the “document server,” and the recipient of XML documents is referred to herein as the “client process.”
Once an XML data document is served to a client process, that process may want to cache the document. As non-limiting examples, a client process may want to cache documents in local memory or at a proxy server. There are many reasons for caching, such as increasing the speed of document delivery, reducing load on the document server, and reducing network bandwidth usage. A cached document, whether it contains static or dynamic content, typically expires at some time t after being served by the document server. In the simplest caching scheme, a client process can invalidate the cached document at its own discretion, but it should respect t as the upper bound (i.e., a document aged t+k, where k>0, is invalid). In more realistic caching schemes, a certain “staleness” or “grace period” is allowed, i.e., a bounded interval k>0 past t during which the expired cached document may still be served by the client process.
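By way of non-limiting illustration, the expiration rule described above can be sketched as follows. The names `CachedDocument`, `classify`, and `grace`, and the use of seconds as the time unit, are assumptions introduced for this example only; the sketch shows one possible reading of the rule and is not a definitive implementation.

```python
from dataclasses import dataclass


@dataclass
class CachedDocument:
    body: str
    served_at: float  # time at which the document server served the document (seconds)
    ttl: float        # the expiration time t, measured from served_at (seconds)


def classify(doc: CachedDocument, now: float, grace: float = 0.0) -> str:
    """Classify a cached document by its age.

    With grace == 0 this is the simplest scheme: a document aged t+k, k>0, is invalid.
    With grace > 0 an expired document may still be served for up to `grace` seconds.
    """
    age = now - doc.served_at
    if age <= doc.ttl:
        return "fresh"
    if age <= doc.ttl + grace:
        return "stale"    # expired, but still within the allowed grace period
    return "invalid"


doc = CachedDocument(body="<quotes/>", served_at=100.0, ttl=60.0)
status_strict = classify(doc, now=161.0)              # one second past t, no grace period
status_lenient = classify(doc, now=161.0, grace=5.0)  # same age, 5-second grace period
```

In this sketch, a client process remains free to invalidate a "fresh" document earlier at its own discretion; t only bounds how long the document may be retained.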
If a cached document was composed from several data sources, different portions of the document (i.e., XML document fragments) may expire at different times. For example, a Web portal may want its movie listings to expire every 7 days while its stock quotes expire every minute. A caching arrangement for a document that is composed from multiple data sources is described as the set S of bindings {p, C}, where p is a path starting at the document root and leading to one or more document fragments and C is the set of caching policies for all nodes found at p. A client process should refresh portions of a cached document according to S. However, communicating S to all client processes in the most general and efficient way possible poses a challenge.
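By way of non-limiting illustration, the set S of bindings {p, C} can be sketched as a list of path and policy pairs that a client process evaluates against a cached document. The sample portal document, the element names, and the `CachingPolicy` and `expired_fragments` names are hypothetical and introduced for this example only. The sketch also reduces C to a single time-to-live value and assumes that all fragments were fetched at the same time, which is a simplification of the general case.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class CachingPolicy:
    ttl_seconds: int  # simplification: C is reduced to one expiration time


# S: each binding pairs a path p (relative to the document root) with a policy C.
S = [
    ("./movies", CachingPolicy(ttl_seconds=7 * 24 * 3600)),  # expires every 7 days
    ("./stocks/quote", CachingPolicy(ttl_seconds=60)),       # expires every minute
]

# Hypothetical cached portal document composed from two data sources.
PORTAL = """<portal>
  <movies><listing>Film A</listing></movies>
  <stocks>
    <quote symbol="AAA">10.0</quote>
    <quote symbol="BBB">20.0</quote>
  </stocks>
</portal>"""


def expired_fragments(root, bindings, age_seconds):
    """Return the fragments whose policy has expired for a document of the given age."""
    expired = []
    for path, policy in bindings:
        if age_seconds > policy.ttl_seconds:
            expired.extend(root.findall(path))
    return expired


root = ET.fromstring(PORTAL)
# Two minutes after being served, only the stock quotes need to be refreshed.
to_refresh = expired_fragments(root, S, age_seconds=120)
```

The sketch leaves open how S reaches the client process in the first place, which is the challenge identified above.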
A document server, or some other entity involved in the management of network content delivery and/or the management of delivered content, may want to control caching policies with respect to the content under its management. For example, a document server may want to control how long a given document or portion of the document should be cached by a client process or at a proxy server, e.g., to allow the client application to function properly. This may be because the document or portion of the document contains embedded access URLs, which may occasionally change, and the server needs to supply the new URLs with the document upon refresh. As another example, a document server that publishes near real-time information may want to control how long a given document or portion of the document should be cached at a client or proxy server to avoid proliferation of obsolete information. One such example may be a stock market (e.g., NASDAQ), which may want to control how long stock price information that it releases to the public is cached, so that client processes do not further disseminate information that is too stale.
One approach for a document server to convey caching policies to client processes is to annotate documents on a per-document basis. For example, with an ESI (“Edge Side Includes,” a simple markup language for use in identifying content fragments for dynamic assembly at the network edge) approach, the document server sends t in a special ESI header of an HTTP response, along with the requested XML data document. This approach is protocol-specific, so if the document server uses different wire protocols, e.g., SOAP or RMI/IIOP, a new method to communicate documents' expiration properties must be defined. In the context of document fragments, ESI follows an approach in which, for each separately cacheable fragment, the document server places an ESI markup tag containing C. However, with this approach, documents that are already deployed may be difficult or impossible to retrofit with ESI, and updating code to generate new documents with ESI tags is also difficult and laborious.
Generally, the document server could include caching information in the body of a document or in the HTTP response header. However, this caching information would need to be generated for every applicable document. Furthermore, many Web page documents are dynamically generated by an application program in an automated manner, e.g., by a servlet or JavaServer Pages (JSP) implementation. Thus, implementing a caching policy for such a document after the application program is implemented would require changes to the program code associated with that document. Even implementing a caching policy for a static Web page after the page is posted would require changes to the document server serving the page. Having to change the underlying code for an application program or Web page can be labor intensive and error prone, a problem that is exacerbated when such changes are required for multiple pages or documents, multiple applications, multiple client devices, multiple transport protocols, etc.
Another approach for a document server to convey caching policies to client processes is for the client process to ask the document server directly. The document server could provide a method (e.g., a remote procedure call) for a client process to ask for t, given some document instance. This requires additional programming on both the server and the clients and, therefore, is not an optimal solution.
Specification of fragment-based caching becomes even more complicated when documents are dynamically assembled by application frameworks in which the code that assembles the documents is, in turn, generated by tools rather than written by a human. For example, it is a complicated task to embed caching tags into responses generated using the JavaServer™ Faces framework. In many situations, embedding caching tags into document instances simply cannot be accomplished with existing approaches, because the presentation layer that is normally responsible for inserting the tags is a few layers removed from the modules that generate the dynamic fragments and that would hold the caching policies for those fragments.
The foregoing approaches also have the disadvantage that the document server sends redundant data, i.e., the value t is unlikely to change from one document expiration cycle to the next, and should not have to be sent repeatedly. Also, these approaches are not centralized, i.e., any change to the caching policy needs to be communicated to all the possible points where document server responses are assembled. For example, stock information documents may be served in the form of HTML pages, and as part of an XML-based stock quote web service. The same class of XML documents that is sent via the stock quote web service is also used to generate an HTML page (via stylesheet transformation). Thus, even if both forms of the stock information use, for example, the HTTP protocol and the ESI caching header, there are likely different endpoint modules that are responsible for creating the response and inserting the caching header in it (e.g., a Web service versus a servlet or a JSP). Therefore, if the frequency of stock updates changes, both endpoints need to be updated with a new value.
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. XHTML documents are XML conforming and, therefore, are readily viewed, edited, and validated with standard XML tools. As with XML documents, once an XHTML data document is served to a client process, that process may want to cache the document. Thus, issues regarding management of network content caching policies similar to those described for XML applications and services are, or will be, present with XHTML-based applications and services.
Based on the foregoing, better techniques are needed for managing caching policies in the context of XML and other types of content commonly transported over a network.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.