1. Field of the Invention
This invention relates to data access and in particular to maintaining integrity of cached copies of data.
2. Related Art
The use of proxy servers to cache frequently accessed data sets is well known. Proxy servers may be provided to service a “local” community of users, storing (caching) local copies of frequently requested data sets that would otherwise need to be retrieved from their respective originating data sources every time a user requested access to them. Once a proxy server has stored a local copy of a particular data set, a subsequent request by a user for access to that data set is intercepted by the proxy server, and access is provided rapidly to the locally cached copy rather than to the originating source specified in the request.
A proxy server may include features to monitor user access requests and to select data sets for caching according to a predetermined selection algorithm. For example, a data set may be selected for caching if access to it was requested by three or more different users within a predetermined time period. A cached data set may be deleted from the cache if the time period between consecutive access requests for it exceeds a predetermined threshold.
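By way of illustration only, the example selection rule described above might be sketched as follows. The class name `CacheSelector`, its parameters and their default values are hypothetical and are not taken from any particular proxy server. Deletion is approximated here by a periodic sweep that removes a data set once the time since its last request exceeds the threshold.

```python
from collections import defaultdict


class CacheSelector:
    """Illustrative selection policy; all names and defaults are assumptions."""

    def __init__(self, min_users=3, window=3600.0, idle_limit=86400.0):
        self.min_users = min_users    # distinct users required within the window
        self.window = window          # selection window, in seconds
        self.idle_limit = idle_limit  # maximum idle time before deletion, in seconds
        self.requests = defaultdict(list)  # data set id -> [(timestamp, user)]
        self.last_access = {}              # data set id -> time of latest request
        self.cached = set()                # ids of data sets currently cached

    def record_request(self, data_id, user, now):
        """Record one access request and return True if the data set is cached."""
        self.last_access[data_id] = now
        # Keep only the requests that still fall inside the selection window.
        recent = [(t, u) for t, u in self.requests[data_id] if now - t <= self.window]
        recent.append((now, user))
        self.requests[data_id] = recent
        # Select the data set once enough different users have asked for it.
        if len({u for _, u in recent}) >= self.min_users:
            self.cached.add(data_id)
        return data_id in self.cached

    def evict_idle(self, now):
        """Delete cached data sets that have not been requested within idle_limit."""
        for data_id in list(self.cached):
            if now - self.last_access[data_id] > self.idle_limit:
                self.cached.discard(data_id)
```

In this sketch, repeated requests from the same user do not count towards selection, since the rule refers to different users.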
A proxy server must ensure that any cached data sets remain up to date with respect to changes to the “original” data set held at the originating data source. To achieve this, known proxy servers use one or more of the following techniques:
(1) Periodic checking: once a data set has been cached, the proxy server submits periodic requests for access to the original source of the data set to determine whether amendments have been made. However, if the proxy server is to keep up to date with many cached data sets, a great deal of proxy server processing time and communications bandwidth is consumed if the period between requests is to be kept short enough to avoid serving out-of-date data sets to users.
(2) Patterns associated with data being updated: the proxy server looks for patterns in the updating of a data set and attempts to predict when it will next be amended. For example, if a data set has consistently been updated each morning at 6 am (e.g. a newspaper), then the proxy server may download a new copy of the data set from the corresponding source at, say, 6.01 am every morning. However, it is not possible to be 100% accurate in predicting when a data set will be updated.
(3) Specified expiry time: a data set provider tags the data set with a ‘will be valid until . . . ’ message. The proxy server will not seek to refresh the cached copy until after that time. However, timely refresh of the cached copy depends upon the clocks of the proxy server and the data source being reasonably well aligned and upon the data set not expiring early. In practice short expiry periods are used, e.g. 1 hour.
(4) Update queries triggered by user access requests: every time a proxy server receives a request for access to a cached data set, the proxy server sends a message to the corresponding source of that data set asking “Has this data set been updated since xxxx?”, where xxxx is a time or date. If it has, then a copy of the new data set is downloaded to the cache.
While technique (4) is one of the most common modes of operation of proxy servers, submitting an update query for every access request may add considerable delay to the servicing of the request and consumes communications bandwidth. This dramatically decreases the Quality of Service available to broadband users, who expect far more rapid access to requested data sets.
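For illustration, the interaction between a specified expiry time (technique (3)) and an update query triggered by a user access request (technique (4)) might be sketched as below. The names `CachedEntry`, `serve` and `fetch_if_modified_since` are hypothetical. The callback is assumed to return the new data set if the source has changed since the given time, and `None` otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CachedEntry:
    body: bytes
    fetched_at: float                   # time at which this copy was stored
    expires_at: Optional[float] = None  # 'valid until' tag, if the provider set one


def serve(
    entry: CachedEntry,
    now: float,
    fetch_if_modified_since: Callable[[float], Optional[bytes]],
) -> Tuple[CachedEntry, bool]:
    """Return the entry to serve and whether the originating source was contacted."""
    # Technique (3): before the tagged expiry time, serve the copy without a query.
    if entry.expires_at is not None and now < entry.expires_at:
        return entry, False
    # Technique (4): ask the source "has this data set been updated since xxxx?".
    fresh_body = fetch_if_modified_since(entry.fetched_at)
    if fresh_body is not None:
        # The source has a newer version, so replace the cached copy.
        entry = CachedEntry(fresh_body, fetched_at=now)
    return entry, True
```

The second value returned makes the cost discussed above visible: whenever it is true, the user's request has waited on a round trip to the originating source, whether or not the data set had changed.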