Data information systems need to store and maintain data and use storage devices to hold data persistently. New data is introduced and existing data is modified regularly. Determining what information has been added to or modified on a storage device over a specific time interval is necessary for back-up and to provide redundancy for data stored on the storage device. Modification information can also be of interest to transactional systems that are concerned with ensuring that updated data was updated successfully on a storage device.
To ensure the availability and integrity of data, data is often backed up, archived or otherwise replicated. Back-up, archival and replication are critical functions required by many data information systems.
Data back-up is typically performed on a per file basis to allow individual files to be restored. Multiple versions of a file are usually stored by a back-up system, allowing access to older versions of a file. If a file becomes corrupted at a point in time, it may be possible to restore the file to a previous version and thereby recover its integrity. However, keeping multiple versions of data requires substantially more storage space than the space occupied by the data being backed up. The need for more storage space, coupled with the fact that back-up data is seldom referenced, encourages the use of lower cost storage media.
After taking an initial full file system back-up, a common method for back-up is to determine which files in the file system have been modified, by examining each file's modification time stamp to see whether it has changed since the last back-up. If the file was modified, then the file data is copied. This method is referred to as incremental back-up. Incremental back-ups reduce the amount of data that is copied. The file system maintains the modification information, and the file system interface can be used to determine which files have been modified. Also, since all of a modified file's data is copied, it is easy to collocate file data on the destination storage medium. This is advantageous for data being written to sequential media such as tape. A disadvantage of this method is that if only a portion of a file has been modified, the amount of data copied may be substantially more than what was modified.
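The file selection step of an incremental back-up can be sketched as follows. This is a minimal illustration only, assuming the host exposes each file's modification time stamp through the standard file system interface; the function name `files_modified_since` and its parameters are hypothetical and do not come from any particular back-up product.

```python
import os


def files_modified_since(root, last_backup_time):
    """Return the files under `root` modified after `last_backup_time`.

    `last_backup_time` is a POSIX timestamp in seconds (an assumption of
    this sketch). Each file's modification time stamp is read through the
    ordinary file system interface.
    """
    modified = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Select the whole file if its time stamp is newer than the
            # last back-up, regardless of how much of it changed.
            if os.stat(path).st_mtime > last_backup_time:
                modified.append(path)
    return sorted(modified)
```

Every selected file would then be copied in full, which is what makes collocation simple and also what causes more data to be copied than was actually modified.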
An alternative method for data back-up is to determine what portions of a file have been modified and to copy only those portions. One method to accomplish this is differential back-up. Differential back-up stores a compressed image of the file. Pieces of the compressed image can be compared against the file to determine whether a portion of the file has been modified. Differential back-up has proven effective and is particularly useful for laptop computers or other computing devices that have limited bandwidth between the device and the destination storage medium. A disadvantage of this approach is that the host system must spend resources both to compress the file image and to store the result. The host also needs to examine the compressed image of each modified file to determine what portions of the file have been modified. While differential back-up is effective in environments where the rate of data modification is relatively low, it is less effective in environments where data modification occurs frequently or affects a large amount of data, or where host processor capacity is at a premium.
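The comparison step of a differential back-up might look like the sketch below. It assumes, purely for illustration, that the stored compressed image is a list of fixed-size block digests; the text above does not specify the image format, and the names `block_digests`, `changed_blocks` and `BLOCK_SIZE` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, in bytes


def block_digests(data, block_size=BLOCK_SIZE):
    """Build a compact per-block image of `data` (one digest per block)."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_digests, new_data, block_size=BLOCK_SIZE):
    """Return indices of blocks in `new_data` that differ from the image.

    Blocks beyond the end of the old image are reported as changed.
    """
    changed = []
    for index, digest in enumerate(block_digests(new_data, block_size)):
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)
    return changed
```

Only the blocks returned by `changed_blocks` would need to be copied. The sketch also shows where the host cost arises: the host must compute and keep an image for every file, and must re-read each modified file to compare it against that image.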
In addition to being backed up, data can be replicated to ensure that it is available from more than one source. Replication can be performed either dynamically or periodically. Dynamic replication ensures that replicated data is kept consistent at all times. Periodic replication ensures that data is guaranteed to be consistent only at specified times. At other times the device holding the data to be replicated and the devices that hold copies of that data may not be fully consistent.
Mirroring is an example of dynamic data replication. A storage device has its contents “mirrored” by one or more other storage devices forming a mirror set. Updates are applied simultaneously to each of the mirror set storage devices, keeping each device's data consistent with the other members of the mirror set. Mirroring can also be used to make data more widely available by making it simultaneously accessible from more than one device. Mirroring ensures that, in the event of a device failure, that device's current data remains available. However, mirroring can increase the latency of updates. The provider of the mirroring service must also have a mechanism to handle failure events to ensure that the mirrored devices remain coherent. Consistency between members of a mirrored set of devices must be maintained even during peak workloads, when resources are constrained. In addition, mirroring cannot be used in place of back-up. A back-up is still needed to recover previous versions of a file or to recover a file if it is inadvertently deleted.
Replication can also be performed on a periodic or delayed basis. Periodic replication does not provide instant access to current data in the event of a device failure. Such an approach provides a coherent set of replicas only at those times when replication is performed. Data archival is an example of delayed replication. The contents of an archive are a replica of the data at some point in time, but changes occurring after the archive was made are not reflected. Data archival takes a copy of data off-line. Archived data can be combined with incremental back-ups to apply modifications to archived data. Data archival is an expensive process, in that typically all the data from a storage device is copied with each archival.
Because copying more data than is necessary for back-up, archival or replication purposes is undesirable, various methods have been employed to determine what data on a storage device has been added or modified over some period of time. These methods typically are external to the storage device itself, often residing on a host system that owns the storage device or on a host adapter to which the storage device is attached. Host systems typically store data modification information in the form of a modification time stamp associated with each file within a file system. Storing and managing modification information on a host system are not an efficient use of the host's computing resources and result in poorer overall performance.
A file system could be implemented to track modifications on a per block basis. It might accomplish this by storing modification information about each block in the meta-data that the system keeps about each file. However, it would be difficult to ensure that the file data and the meta-data about the file remain consistent with one another in the event of a system failure. In addition, any host system on which such a file system resides would incur the additional overhead of such a facility.
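Per block modification tracking of this kind can be pictured with the sketch below. It is a hypothetical in-memory illustration, not a description of any existing file system; the class `BlockChangeTracker`, its methods and the default block size are all assumptions made for the example.

```python
class BlockChangeTracker:
    """Hypothetical per-file record of which blocks have been modified."""

    def __init__(self, block_count, block_size=4096):
        self.block_size = block_size
        # One flag per block; True means modified since the last reset.
        self.dirty = [False] * block_count

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.dirty[block] = True

    def modified_blocks(self):
        """Return the indices of blocks modified since the last reset."""
        return [i for i, flag in enumerate(self.dirty) if flag]

    def reset(self):
        """Clear all flags, for example after a back-up completes."""
        self.dirty = [False] * len(self.dirty)
```

In this sketch the flags live only in memory. A real file system would have to persist them together with the file data, which is where the consistency problem after a system failure arises, and every write pays the cost of `record_write` on the host.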
Regardless of the method selected, it is desirable to reduce the overhead associated with determining what information has been added to or modified on the storage device. The need is exacerbated as storage devices increase in storage capacity and as more data needs to be processed.
Therefore, there remains a need for a method and storage system that can efficiently provide consistent data modification information to clients without the drawbacks described above.