The Internet is a powerful tool for disseminating information. Often, the information found on the Internet takes the form of documents composed of many component resources. Non-limiting examples of such a composite document are a HyperText Markup Language (HTML) document, an Extensible Markup Language (XML) document, a Wireless Markup Language (WML) document, a compact HyperText Markup Language (cHTML) document, or an Extensible HyperText Markup Language (XHTML) document. When a composite document is requested by a particular client, the requested document is generally compiled by web applications and sent to the client via a web server. Clients can be browser-based, or clients can comprise non-browser applications that have the capability to request information over the Internet.
FIG. 1A illustrates a typical system 100 through which information is delivered to clients over the Internet. For example, client 101 requests a particular document over the Internet 102 using a Uniform Resource Locator (URL) corresponding to the document. Components within the Internet 102 locate the address in the URL and make a request to an appropriate web server 104. In response to the request, web server 104 checks its cache layer 103 to see if the resource being requested is in cache layer 103. If the resource is in cache layer 103, then web server 104 returns the resource from cache layer 103 without any more processing. However, if the resource is not in cache layer 103, then web server 104 makes a request to an appropriate application 120. Application 120 may access a database 130 or 131 in order to compile the resource in response to the server's request. Accessing a persistent store such as database 130 or 131 is best minimized because such access is computationally expensive and prone to latency. It will be apparent to those of skill in the art that systems for delivering information over the Internet can be configured in many different ways.
As illustrated in FIG. 1B, an example cache layer 103 associated with web server 104 stores several documents 111-114. Each of these documents 111-114 is stored for an amount of time called the Time To Live (TTL), which is negotiated by web server 104 and web application 120. Once a document has been stored in cache layer 103 for longer than the document's TTL, the document is considered expired. An expired document is evicted from the cache. After a resource, such as a document, has been evicted in this way, a request for the resource must go through the web application 120. Documents in cache layer 103 generally are indexed by each document's URL.
For example, a request for a document with the URL “URL1” might be sent to web server 104 from a client, such as client 101 of FIG. 1A. Cache layer 103 associated with web server 104 has cached resources 111-114 that have not expired and that are associated with “URL1,” “URL2,” “URL3,” and “URL4,” respectively. Web server 104 sends “URL1” into cache layer 103 to check if resource 111, associated with “URL1,” is in cache layer 103 and unexpired. Because cache layer 103 has an unexpired copy of cached resource 111 associated with “URL1,” web server 104 simply returns resource 111 to the client from cache layer 103 without having to make a request to web application 120.
Still referring to FIG. 1B, if the client then sends a request for “URL5” to web server 104, then web server 104 will check to see if resource 115, associated with “URL5,” is cached in cache layer 103. Because cache layer 103 does not have resource 115 associated with “URL5” cached, web server 104 requests resource 115 from web application 120. Web application 120 returns resource 115 to web server 104. As illustrated in FIG. 1C, web server 104 then caches resource 115, associated with “URL5,” in cache layer 103. Thus, any subsequent request for “URL5” that arrives before resource 115 expires will be served from cache layer 103 without involvement of web application 120.
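The hit and miss behavior described above for FIGS. 1B and 1C can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not an implementation of any particular web server: the class name UrlCache, the fetch_from_application callback, and the injectable clock are all hypothetical, and the negotiated TTL is assumed to be returned by the application together with the resource.

```python
import time
from typing import Callable, Dict, Tuple


class UrlCache:
    """Hypothetical URL-indexed cache layer with a per-entry Time To Live."""

    def __init__(
        self,
        fetch_from_application: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Assumption: the application returns (resource, ttl_seconds).
        self._fetch = fetch_from_application
        self._clock = clock
        # url -> (resource, time at which the entry expires)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None:
            resource, expires_at = entry
            if now <= expires_at:
                # Cache hit: return without calling the application.
                return resource
            # Stored longer than its TTL: evict the expired entry.
            del self._entries[url]
        # Cache miss: ask the application, then cache the result by URL.
        resource, ttl_seconds = self._fetch(url)
        self._entries[url] = (resource, now + ttl_seconds)
        return resource
```

In this sketch, a first call to get("URL5") reaches the application, and later calls made before the entry expires are answered from the cache alone.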
Storing requested resources in cache layer 103 may save time for subsequent requests. However, one problem with this caching system is that the TTL value for a composite document is generally equal to the shortest TTL of the resources making up the composite document. If, for example, a composite document is composed of a video clip with a TTL of one week, a description of the video clip with a TTL of one week, and viewer commentary about the video clip with a TTL of five minutes, then the TTL of the composite document will be only five minutes. As a result, once the document has been evicted from cache layer 103 after five minutes, the entire document will be retrieved from the web application 120 rather than from cache layer 103, even though the video clip and the description of the video clip are technically unexpired and only the viewer commentary needs to be refreshed. Also, business logic running on web server 104 that processes the request for the composite document might be written to rebuild the entire composite document from the document's component parts after the document expires in the cache layer 103. Thus, the nature of this business logic running on web server 104 can result in unnecessary calls to web application 120 to fetch components of the composite document that have not expired.
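The arithmetic behind the video clip example can be made concrete with a short Python sketch. The function names composite_ttl and unexpired_components and the component labels are hypothetical and chosen only for illustration; the TTL values are the ones given in the example above.

```python
from typing import Dict, List

MINUTE = 60                      # seconds
WEEK = 7 * 24 * 60 * MINUTE      # seconds


def composite_ttl(component_ttls: Dict[str, int]) -> int:
    """TTL of a composite document: the shortest TTL among its components."""
    return min(component_ttls.values())


def unexpired_components(component_ttls: Dict[str, int], age: int) -> List[str]:
    """Components that are still within their own TTL at the given age."""
    return sorted(name for name, ttl in component_ttls.items() if age <= ttl)


ttls = {
    "video_clip": WEEK,
    "description": WEEK,
    "viewer_commentary": 5 * MINUTE,
}

document_ttl = composite_ttl(ttls)  # five minutes, in seconds
# One second after the composite document expires, only the commentary
# is stale, yet the whole document is rebuilt by the application.
still_fresh = unexpired_components(ttls, document_ttl + 1)
```

Here two of the three components are still unexpired when the composite document is evicted, which is the wasted work the paragraph above describes.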
Another problem with previous approaches is that if three different resources contain the same news story, but are indexed by different URLs, then the cache layer caches each resource separately, indexed by its respective URL. The result is that three separate instances of the same news story are held in the cache, which is a waste of cache space.
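The duplication problem can likewise be illustrated with a small Python sketch. The URLs, the placeholder story body, and the cache_footprint helper are hypothetical; the sketch only measures the waste in a URL-indexed cache and does not suggest any particular remedy.

```python
from typing import Dict, Tuple


def cache_footprint(cache: Dict[str, bytes]) -> Tuple[int, int]:
    """Return (bytes stored, bytes of distinct content) for a URL-indexed cache."""
    stored = sum(len(body) for body in cache.values())
    distinct = sum(len(body) for body in set(cache.values()))
    return stored, distinct


# Placeholder standing in for the body of one news story.
news_story = b"x" * 1000

# Three different URLs whose resources contain the same story.
url_indexed_cache = {
    "URL1": news_story,
    "URL2": news_story,
    "URL3": news_story,
}

stored_bytes, distinct_bytes = cache_footprint(url_indexed_cache)
```

With these assumed values, the cache holds three times as many bytes as the distinct content requires.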
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.