For network client applications, such as web browsers, a limiting performance factor is often low bandwidth to the server. To mitigate this low-bandwidth problem, network client applications often cache content replicated from servers, so that as much information as possible is kept available on the client user's hard drive. As data access times from the hard drive and RAM are typically orders of magnitude faster than download times, some or all of a server's content may often be rapidly accessed from the cache with little or no downloading of data from the server. Other types of caching are directed to similar problems.
In general, to cache content, the local machine stores the data in a database, file system or system memory. To retrieve content, the cache is queried for items with acceptable attributes and one is chosen according to the application's criteria. For example, more than one translation of a text document might be acceptable to a user.
If there is a strict mapping of at most one acceptable cached item per query, the content can be indexed by a unique lookup key, such as a Uniform Resource Identifier (URI), a compact string of characters for identifying an abstract or physical resource. Examples of URIs include URLs (Uniform Resource Locators), URNs (Uniform Resource Names), and other standard namespaces. A URI may be used as the lookup key to a cache, as can other names, such as a globally unique identifier (GUID).
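By way of illustration only, a cache indexed by a unique lookup key may be sketched as follows. The names CachedItem and UriCache, and the choice of an in-memory dictionary, are assumptions made for this sketch and are not drawn from any particular implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedItem:
    """One cached copy of a resource (hypothetical structure)."""
    body: bytes
    last_modified: Optional[str] = None  # timestamp supplied by the server, if any
    etag: Optional[str] = None           # entity tag supplied by the server, if any


class UriCache:
    """In-memory cache holding at most one item per lookup key."""

    def __init__(self) -> None:
        self._items: Dict[str, CachedItem] = {}

    def store(self, uri: str, item: CachedItem) -> None:
        # Storing under an existing key replaces the earlier item,
        # which preserves the one-item-per-key mapping.
        self._items[uri] = item

    def lookup(self, uri: str) -> Optional[CachedItem]:
        # Returns None on a cache miss.
        return self._items.get(uri)
```

The same structure would apply if a GUID or other unique name were used as the key in place of the URI.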
While content caching thus provides substantial performance improvements, a problem with caching is that the locally cached content is static, whereas the original content (e.g., content on a network server) may or may not have changed. To address this problem, HTTP (hypertext transfer protocol) provides for sending a conditional request to the server, e.g., an "If-Modified-Since" (IMS) request, an "If-None-Match" request, or the like, identifying the cached content by a timestamp or an entity tag. When the server receives such a conditional request, it uses the timestamp and/or entity tag to test whether the content has changed. If the content has not changed, the server responds with a "not modified" response; otherwise, the server provides the modified content.
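The server-side test described above may be sketched as follows. This is a simplified illustration under stated assumptions: the function name is_not_modified is hypothetical, entity tags are compared as exact strings (weak-validator handling is omitted), and an If-None-Match header, when present, is given precedence over If-Modified-Since.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def is_not_modified(
    request_headers: Mapping[str, str],
    current_etag: Optional[str],
    current_last_modified: Optional[str],
) -> bool:
    """Return True if a "not modified" response would be appropriate."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may carry several comma-separated entity tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or current_etag in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and current_last_modified is not None:
        try:
            modified = parsedate_to_datetime(current_last_modified)
            since = parsedate_to_datetime(if_modified_since)
            return modified <= since
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: treat as modified.
            return False

    # An unconditional request always receives the full content.
    return False
```

When the function returns True, the server would send the "not modified" response with no body; otherwise it would return the current content.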
While this provides an overall increase in the available network bandwidth by reducing the amount of data that needs to be transmitted, not much in the way of savings is achieved at the server end. More particularly, the server often does almost as much work to determine whether content has been modified as it does to simply retrieve and return the requested content. At the same time, many conditional requests may be made for content that is rarely, if ever, modified. This wastes server resources, increases client latency and also consumes available bandwidth.
One solution is to have the provider of the content supply an "Expires" header comprising a date/time stamp, a "Cache-Control" header specifying a max-age relative to the time of the response, or the like. When content is cached with such information, the local system ordinarily does not send a conditional request before the time determined by the expiry mechanisms. However, this only works when the content provider supplies an appropriate expiry header, which frequently does not happen. Sometimes this is because a distant expiry time is not appropriate for the content, e.g., the content is expected to change frequently, and sometimes because the provider simply does not use the mechanism.
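A minimal sketch of how a client might apply these expiry mechanisms is given below. The function names expiry_time and is_fresh are hypothetical, max-age is assumed to be measured from the time the response was received, and max-age is assumed to take precedence when both headers are present; full HTTP freshness rules (e.g., the Age header and heuristic freshness) are omitted.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def expiry_time(
    response_headers: Mapping[str, str], received_at: datetime
) -> Optional[datetime]:
    """Return when the cached response expires, or None if not stated."""
    cache_control = response_headers.get("Cache-Control", "")
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        return received_at + timedelta(seconds=int(match.group(1)))

    expires = response_headers.get("Expires")
    if expires:
        try:
            parsed = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return None  # Malformed date: no usable expiry information.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    return None


def is_fresh(
    response_headers: Mapping[str, str], received_at: datetime, now: datetime
) -> bool:
    """True if no conditional request is needed yet; times must be timezone-aware."""
    expiry = expiry_time(response_headers, received_at)
    return expiry is not None and now < expiry
```

Note that when the provider supplies neither header, is_fresh returns False, which corresponds to the case described above in which the expiry mechanism offers no help.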
Another solution is to have the local system only occasionally check whether cached content has been modified, based upon some criteria such as user action or a time schedule. For example, when particular content that is in the cache is requested, a browser may send an If-Modified-Since request for that content only once per browser session and/or once per day, although the user can force a refresh as desired. This solution may work in conjunction with expiry mechanisms, for example, by always checking if the content is known to be expired, and otherwise checking according to the schedule.
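One possible reading of such a schedule, combined with an expiry check, is sketched below. The names CacheEntryState and should_revalidate, the integer session identifier, and the one-day default interval are all assumptions made for illustration; "once per session and/or once per day" is interpreted here as checking when either condition is met.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CacheEntryState:
    """Bookkeeping kept alongside a cached item (hypothetical)."""
    expires_at: Optional[datetime]       # from expiry headers; None if absent
    last_checked_at: Optional[datetime]  # time of the last conditional request
    last_checked_session: Optional[int]  # session in which that request was sent


def should_revalidate(
    entry: CacheEntryState,
    now: datetime,
    session_id: int,
    min_interval: timedelta = timedelta(days=1),
    force_refresh: bool = False,
) -> bool:
    """Decide whether to send a conditional request for this entry."""
    if force_refresh:
        return True  # The user explicitly asked for a refresh.
    if entry.expires_at is not None and now >= entry.expires_at:
        return True  # Content known to be expired is always checked.
    if entry.last_checked_at is None:
        return True  # Never validated since it was cached.
    if entry.last_checked_session != session_id:
        return True  # First request for this content in the session.
    # Otherwise check only if the scheduled interval has elapsed.
    return now - entry.last_checked_at >= min_interval
```

Under this sketch, content viewed once per session still triggers a conditional request on each such viewing.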
However, both solutions still result in a large number of conditional requests being sent for content which rarely, if ever, changes. Explicit expiry information from the server often fails for static content since many providers do not use it, while the scheduled refreshing solution reduces conditional requests to an extent but still results in many requests for content that has not been modified. For example, a typical user may only browse much of the cached content once per session and/or once per day, so that this second solution reduces the number of conditional requests only slightly, if at all.