It is often useful to maintain two or more copies of the same information. For example, it is common for a primary copy of a data item to be stored on disk, and a cached copy of the data item to be stored in volatile memory. Accessing the cached copy of the data item is much faster than accessing the primary copy, so the cached copy is typically used to service read requests directed to the data item.
While maintaining multiple copies of a data item can significantly speed up the handling of read requests, handling write requests is another matter. For example, when multiple copies of a data item exist, one way to handle a write request directed to the data item is to update every copy of the data item, which ensures that all copies remain up-to-date. Unfortunately, updating every copy of a data item is much less efficient than updating a single copy. The inefficiency is even worse when one or more of the copies is compressed and/or encrypted. Under these circumstances, applying the update to such a copy may involve decompressing and/or decrypting the data, making the update, and then recompressing and/or re-encrypting the data.
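The extra cost of updating a compressed copy can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cached copy is compressed with Python's standard `zlib` module and that an update overwrites a byte range; encryption is omitted, and the function names are hypothetical. The uncompressed primary copy is patched directly, while the compressed copy must go through a full decompress, modify, recompress cycle.

```python
import zlib


def update_compressed_copy(compressed: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Hypothetical helper: apply a byte-range update to a zlib-compressed copy.

    The compressed form cannot be patched in place, so the whole copy is
    decompressed, modified, and then recompressed.
    """
    data = bytearray(zlib.decompress(compressed))
    data[offset:offset + len(new_bytes)] = new_bytes
    return zlib.compress(bytes(data))


def update_all_copies(primary: bytearray, cached_compressed: bytes,
                      offset: int, new_bytes: bytes) -> bytes:
    """Hypothetical 'update every copy' strategy; returns the new cached copy."""
    # Updating the uncompressed primary copy is a simple overwrite.
    primary[offset:offset + len(new_bytes)] = new_bytes
    # Updating the compressed cached copy requires the full cycle above.
    return update_compressed_copy(cached_compressed, offset, new_bytes)


primary = bytearray(b"hello world")
cached = zlib.compress(bytes(primary))
cached = update_all_copies(primary, cached, 0, b"HELLO")
assert primary == bytearray(b"HELLO world")
assert zlib.decompress(cached) == b"HELLO world"
```

Even this small update decompresses and recompresses the entire cached copy, which is the cost the "update every copy" approach pays on each write.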
As an alternative to updating all copies of a data item in response to an update request, it is possible to update fewer than all copies, and to keep track of which copies were updated. For example, assume that a primary copy of a data item is on disk, and a cached copy of the same data item resides in volatile memory. Further assume that the cached copy is compressed. Under these circumstances, the most efficient way to respond to an update to the data item may be to apply the update to the primary copy, and mark the cached copy as “invalid”.
After a copy of a data item is marked invalid, that copy is no longer used to service read requests for the current version of the data item. Instead, handling a read request for the current version of the data item may involve accessing the primary copy of the data item on disk, and loading a new copy of the data item into volatile memory. Until invalidated, that new copy of the data item may then be used to service read requests for the current version of the data item.
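The update-and-invalidate scheme described in the two preceding paragraphs can be sketched as a small in-memory model. This is a hypothetical illustration only: a dictionary stands in for the primary copies on disk, another dictionary stands in for the cached copies in volatile memory, compression of the cached copy is omitted for brevity, and the class and method names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedCopy:
    value: bytes
    invalid: bool = False


class InvalidatingCache:
    """Hypothetical sketch: writes update only the primary copy and
    mark any cached copy invalid."""

    def __init__(self, disk: Dict[str, bytes]) -> None:
        self.disk = disk                          # stands in for primary copies on disk
        self.cache: Dict[str, CachedCopy] = {}    # stands in for copies in volatile memory
        self.disk_reads = 0                       # counts reads that had to hit the primary copy

    def write(self, key: str, value: bytes) -> None:
        self.disk[key] = value                    # apply the update to the primary copy only
        copy = self.cache.get(key)
        if copy is not None:
            copy.invalid = True                   # no longer used for current-version reads

    def read_current(self, key: str) -> bytes:
        copy = self.cache.get(key)
        if copy is not None and not copy.invalid:
            return copy.value                     # serve the read from the cached copy
        # Missing or invalid cached copy: read the primary copy and
        # load a new cached copy for subsequent reads.
        self.disk_reads += 1
        value = self.disk[key]
        self.cache[key] = CachedCopy(value)
        return value
```

For example, after `cache = InvalidatingCache({"x": b"v1"})`, two calls to `cache.read_current("x")` touch the primary copy only once; a subsequent `cache.write("x", b"v2")` marks the cached copy invalid, so the next `read_current("x")` reloads `b"v2"` from the primary copy.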
Maintaining multiple copies of a data item becomes even more complicated when some read requests may be for past versions of the data item. Such requests are referred to herein as past-version requests. For example, a read request may be for the version of a data item that existed at a particular past point in time (T1). If a particular copy of the data item is marked “invalid”, the mark alone does not indicate whether that copy of the data item may be used to service the read request. Specifically, if that copy of the data item was marked invalid after time T1, then the copy may in fact be the exact version needed by the past-version request.
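The ambiguity can be shown with a toy model. It assumes, for illustration only, integer logical timestamps, a single write that replaces version "A" with version "B", and a cached copy that retains nothing but its contents and a boolean invalid flag; all names are hypothetical. Two copies, one invalidated before T1 and one after, end up in identical states, yet only one of them holds the version a past-version request for T1 needs. The sketch shows the problem only, not a way to solve it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class CopyState:
    """What the cache retains about a copy: its contents and an invalid flag."""
    value: str
    invalid: bool


def run_scenario(write_time: int) -> Tuple[CopyState, Callable[[int], str]]:
    """Load version "A" into the cache at time 0, then write version "B" to the
    primary copy at `write_time`, marking the cached copy invalid.

    Returns the cached copy's retained state and a function giving the
    true version of the data item as of a given time.
    """
    state = CopyState(value="A", invalid=False)
    # Invalidation sets the flag; nothing records *when* it happened.
    state = CopyState(value=state.value, invalid=True)

    def true_version_as_of(t: int) -> str:
        # Version "B" is visible from `write_time` onward.
        return "A" if t < write_time else "B"

    return state, true_version_as_of


T1 = 10
early_state, early_truth = run_scenario(write_time=5)   # invalidated before T1
late_state, late_truth = run_scenario(write_time=15)    # invalidated after T1

# The two cached copies are indistinguishable...
assert early_state == late_state
# ...yet only the copy invalidated after T1 holds the version as of T1.
assert early_truth(T1) != early_state.value
assert late_truth(T1) == late_state.value
```

Because the retained states compare equal, a reader that sees only the invalid flag must treat both copies as unusable for the request at T1, even though one of them would have served it.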
Based on the foregoing, it is desirable to provide a way to keep multiple copies of data items, invalidate those copies that become out-of-date, and yet be able to use those out-of-date copies, when possible, for read requests that specify past points in time.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.