1. The Field of the Invention
The invention relates to computer systems that increase information availability by maintaining duplicate copies. Specifically, the invention relates to devices, methods, and systems for replicating information from a hierarchically-indexed data store to a backup data store.
2. The Relevant Art
Data processing systems often maintain duplicate storage volumes or other data stores. To create and maintain duplicate data stores, data from a source volume is replicated onto a target volume. Data replication techniques generally include backup and archival storage techniques, data mirroring techniques, and the like.
Some existing data replication systems replicate data in a synchronous, atomic manner in order to maintain coherency between the source and target volumes. Synchronous replication techniques often halt further data processing until an acknowledgment has been received that a write operation performed on the source volume has also been performed on the target volume. Consequently, synchronous replication techniques may significantly degrade data replication performance, particularly when replication takes place over a high-latency communication line.
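By way of illustration only, the cost of waiting for each acknowledgment can be estimated with simple arithmetic. The sketch below assumes that writes are issued strictly one after another and that each synchronous write waits one full round trip before the next begins. The function names and the timing figures are hypothetical and are not taken from any particular system.

```python
def synchronous_total_us(num_writes, local_write_us, round_trip_us):
    """Total time when every write waits for the target's acknowledgment."""
    return num_writes * (local_write_us + round_trip_us)


def asynchronous_total_us(num_writes, local_write_us):
    """Total time seen by the application when replication happens later."""
    return num_writes * local_write_us


# Assumed figures: 1,000 writes, 100 microseconds per local write,
# 50,000 microseconds (50 ms) round trip on a high-latency line.
sync_us = synchronous_total_us(1000, 100, 50_000)
async_us = asynchronous_total_us(1000, 100)
slowdown = sync_us // async_us
```

Under these assumed figures the round-trip time dominates the cost of the local write, which is the degradation described above.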
Multiple techniques have been devised to minimize performance degradation while achieving an acceptable level of coherency between source and target volumes. One technique is periodic full-copy replication, where an entire volume is replicated to a target volume in order to provide an exact copy of a source volume image at a specific time. Disadvantages of periodic full-copy replication include inefficient usage of resources to copy data that has not changed since a previous replication and the necessity to ensure that the source volume is not changed during replication.
Another technique for efficient data replication is incremental replication, where the source is replicated on the target as a baseline volume, and thereafter only changes to the source are replicated to the target on a periodic basis. One disadvantage of incremental replication is the necessity to ensure that the source volume is not changed during identification and replication of changed data.
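By way of illustration only, incremental replication may be sketched as follows. The sketch models a volume as a dictionary mapping block numbers to block contents, and the names baseline_copy, changed_blocks, and apply_changes are hypothetical. It assumes the source is not modified while changed_blocks runs, which is the restriction noted above.

```python
def baseline_copy(source):
    """Replicate the entire source volume to create the baseline target."""
    return dict(source)


def changed_blocks(previous, current):
    """Return the blocks that differ between two images of the source.

    A value of None marks a block removed since the previous image.
    """
    delta = {}
    for block, data in current.items():
        if previous.get(block) != data:
            delta[block] = data
    for block in previous:
        if block not in current:
            delta[block] = None
    return delta


def apply_changes(target, delta):
    """Bring the target up to date using only the changed blocks."""
    for block, data in delta.items():
        if data is None:
            target.pop(block, None)
        else:
            target[block] = data


source = {0: b"alpha", 1: b"beta", 2: b"gamma"}
target = baseline_copy(source)       # initial full replication
last_replicated = dict(source)       # image of the source at that time

# The source changes between replication cycles.
source[1] = b"BETA"
source[3] = b"delta"
del source[2]

# Periodic cycle: identify and ship only what changed.
delta = changed_blocks(last_replicated, source)
apply_changes(target, delta)
last_replicated = dict(source)
```

Only three blocks are transferred in the periodic cycle, whereas a full-copy cycle would transfer every block on the volume.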
Point-in-time replication techniques, such as snapshot operations, may save the system state information as well as changed data. Such techniques have been used to alleviate the need to suspend processing during incremental replication. Some data replication systems that require continual access to replicated data use snapshot techniques to implement data mirroring. Currently available snapshot replication techniques facilitate restoration of a source volume to a previous desired state and restoration of a particular image on the source volume.
One existing method for conducting snapshot-based replication involves saving a periodic snapshot of a source volume within an incremental system log and replicating the log (including the incrementally changed data) to a target volume, where the snapshot includes all data modifications made to the source volume since a previous snapshot. Using an incremental log simplifies the process of replicating changes to a target system. Since the replication is based on specific snapshots, replication of the source image may occur asynchronously with respect to the actual write operations.
FIG. 1 is a block diagram illustrating a typical prior art log-structured mirrored data store 100 and associated snapshots. The depicted log-structured mirrored data store 100 includes a source volume 100a and a target volume 100b. As depicted, the source volume 100a includes one or more snapshot instances 120 and file writes 122, and the target volume 100b includes target snapshot instances 130 and target file writes 132 replicated from the source volume 100a.
Typically, the file write entries 122 correspond to data written to sequential logical data blocks on a storage device corresponding to the source volume 100a. Each snapshot instance 120 may contain a copy of a current block map for the source volume 100a at the time that the snapshot instance 120 is taken. The block map is a master index linking file names and attributes with the associated data blocks that constitute the files.
The target file writes 132 are copies of the source file writes 122, written asynchronously to the target volume 100b after creation of the snapshot instance 120. The target snapshot instances 130 may contain copies of the current block map for the target volume 100b written to the target volume log directly following the target file writes 132. The target snapshot instances 130 may be identical to the corresponding source snapshot instances 120.
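By way of illustration only, the log-structured arrangement of FIG. 1 may be sketched as follows. The class LogVolume and the function replicate are hypothetical names. The sketch assumes that the target log is always a prefix of the source log, so that log positions recorded in a block map have the same meaning on both volumes.

```python
class LogVolume:
    """A volume whose contents are an append-only log of writes and snapshots."""

    def __init__(self):
        self.log = []        # sequential entries, oldest first
        self.block_map = {}  # file name -> log positions holding its data

    def write(self, name, data):
        """Append a file write and record its position in the block map."""
        self.log.append(("write", name, data))
        self.block_map.setdefault(name, []).append(len(self.log) - 1)

    def snapshot(self):
        """Append a copy of the current block map; return its log position."""
        frozen = {name: list(pos) for name, pos in self.block_map.items()}
        self.log.append(("snapshot", frozen))
        return len(self.log) - 1


def replicate(source, target, upto):
    """Replay source log entries not yet on the target, through position upto."""
    for entry in source.log[len(target.log):upto + 1]:
        if entry[0] == "write":
            target.write(entry[1], entry[2])
        else:
            target.snapshot()


source = LogVolume()
target = LogVolume()

source.write("a.txt", b"1")
source.write("b.txt", b"2")
snap = source.snapshot()
source.write("a.txt", b"3")  # written after the snapshot; not yet replicated

# Replication runs later, independently of the writes above.
replicate(source, target, snap)
```

Because the target replays the same entries in the same order, the block map it writes at the snapshot position matches the one on the source, and the write made after the snapshot stays behind until a later cycle.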
While the depicted approach creates a complete and consistent image of a source volume by replicating only the changed data, the depicted approach functions only on systems that use log-structured file systems.
FIG. 2 is a block diagram illustrating a typical prior art hierarchically-indexed data store 200 and associated snapshots. The depicted hierarchically-indexed data store 200 includes a master data block 210, a master index block 220, one or more file index blocks 222, one or more data files 250, a snapshot tracking system 230 including a snapshot map 232, one or more snapshot files 240 and associated snapshot headers 234, snapshot indices 236, and archived data blocks 238.
The master data block 210 points to the master index block 220. The master index block 220 points to file index blocks 222, which in the depicted example include file index blocks 222a through 222e. The depicted file index blocks 222a-222c point to corresponding data files 250a-250c, creating a hierarchically-indexed system for each data file location.
The data files 250 typically include a series of data blocks linked by pointers to subsequent data blocks. For clarity, each of the data files 250a to 250c is shown as a single contiguous segment. As the data files 250 are modified, blocks may be added to a particular file by linking additional data blocks to the series of linked data blocks. Data blocks may also be deleted from a file by eliminating links to the deleted blocks. Similarly, data blocks may be modified within a file by overwriting the data in the linked blocks.
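By way of illustration only, the hierarchy of FIG. 2 and the linked data blocks may be sketched as follows. The class and function names are hypothetical and merely echo the terms used above. The sketch keeps all structures in memory and omits the snapshot tracking system 230.

```python
class DataBlock:
    """One block of file data with a pointer to the subsequent block."""

    def __init__(self, data, next_block=None):
        self.data = data
        self.next_block = next_block


class FileIndexBlock:
    """Points to the first data block of one file."""

    def __init__(self, name, first_block=None):
        self.name = name
        self.first_block = first_block


class MasterIndexBlock:
    """Points to the file index blocks."""

    def __init__(self):
        self.file_indexes = {}


class MasterDataBlock:
    """Root of the hierarchy; points to the master index block."""

    def __init__(self):
        self.master_index = MasterIndexBlock()


def read_file(master, name):
    """Walk master -> file index -> chain of data blocks."""
    block = master.master_index.file_indexes[name].first_block
    parts = []
    while block is not None:
        parts.append(block.data)
        block = block.next_block
    return b"".join(parts)


def append_block(master, name, data):
    """Add a block by linking it to the end of the file's chain."""
    indexes = master.master_index.file_indexes
    index = indexes.setdefault(name, FileIndexBlock(name))
    new_block = DataBlock(data)
    if index.first_block is None:
        index.first_block = new_block
        return
    block = index.first_block
    while block.next_block is not None:
        block = block.next_block
    block.next_block = new_block


def delete_block(master, name, position):
    """Delete a block by eliminating the link that points to it."""
    index = master.master_index.file_indexes[name]
    if position == 0:
        index.first_block = index.first_block.next_block
        return
    block = index.first_block
    for _ in range(position - 1):
        block = block.next_block
    block.next_block = block.next_block.next_block


master = MasterDataBlock()
for piece in (b"he", b"llo", b"!"):
    append_block(master, "greeting", piece)
after_append = read_file(master, "greeting")

delete_block(master, "greeting", 1)
after_delete = read_file(master, "greeting")

# Modify a block in place by overwriting its data.
master.master_index.file_indexes["greeting"].first_block.data = b"HE"
after_overwrite = read_file(master, "greeting")
```

The delete and overwrite operations leave no record of the prior contents, which is why a separate snapshot mechanism is needed to preserve earlier states.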
The file index blocks 222d and 222e point to snapshot files in the same manner as file index blocks 222a-222c point to data files. The depicted snapshot tracking system 230 is essentially a flat indexed file system that includes a map of snapshot instances 232 and associated snapshot files. A snapshot instance 240a includes the snapshot header 234a, the snapshot index 236a, and the archived data blocks 238a-238c. The archived data blocks 238a-238c are copies of data blocks from data files 250a-250c before they were deleted or overwritten as data files 250a-250c are modified subsequent to a preceding snapshot 240b. The snapshot 240b includes the snapshot header 234b, the snapshot index 236b and the archived data blocks 238d-238f. 
Restoration of the snapshot instance 240a via a snapshot provider or the like allows the hierarchically-indexed data store 200 to be restored to the exact state that existed when snapshot instance 240a was taken. Successive application of snapshot instance 240a and snapshot instance 240b allows the hierarchically-indexed data store 200 to be restored to the exact state that existed when snapshot instance 240b was taken.
The current art, exemplified by the hierarchically-indexed data store 200, establishes snapshot data structures that facilitate a complete and consistent image for recovery purposes. Nevertheless, as depicted in FIG. 2, the current art for hierarchically-indexed systems is inefficient and complex compared to log-structured systems, and does not directly support snapshot-based asynchronous replication to a target data store.
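By way of illustration only, archiving blocks before they are changed and restoring by successive application may be sketched as follows. The sketch reflects one possible reading of the arrangement described above: each snapshot instance holds the prior contents of blocks that were overwritten or deleted after that snapshot was taken. The class SnapshotStore, its methods, and the ABSENT marker are hypothetical.

```python
ABSENT = object()  # marks a block that did not exist when the snapshot was taken


class SnapshotStore:
    def __init__(self):
        self.blocks = {}     # block id -> current data
        self.snapshots = []  # per snapshot: block id -> archived prior contents

    def take_snapshot(self):
        """Start a new, initially empty, snapshot instance; return its id."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def _archive(self, block_id):
        """Save a block's prior contents the first time it changes."""
        if self.snapshots:
            newest = self.snapshots[-1]
            if block_id not in newest:
                newest[block_id] = self.blocks.get(block_id, ABSENT)

    def write(self, block_id, data):
        self._archive(block_id)
        self.blocks[block_id] = data

    def delete(self, block_id):
        self._archive(block_id)
        self.blocks.pop(block_id, None)

    def restore(self, snapshot_id):
        """Apply snapshot instances newest first, back to snapshot_id."""
        for archived in reversed(self.snapshots[snapshot_id:]):
            for block_id, data in archived.items():
                if data is ABSENT:
                    self.blocks.pop(block_id, None)
                else:
                    self.blocks[block_id] = data
        # Later snapshots no longer apply; the restored one has no changes yet.
        del self.snapshots[snapshot_id + 1:]
        self.snapshots[snapshot_id] = {}


store = SnapshotStore()
store.write(1, b"a")
store.write(2, b"b")
older = store.take_snapshot()
state_at_older = dict(store.blocks)

store.write(1, b"A")
store.delete(2)
newer = store.take_snapshot()
state_at_newer = dict(store.blocks)

store.write(3, b"c")
store.write(1, b"AA")

store.restore(newer)   # apply the most recent snapshot instance
restored_newer = dict(store.blocks)
store.restore(older)   # then apply the earlier one
restored_older = dict(store.blocks)
```

Restoring to the earlier state requires applying the more recent snapshot instance first, which corresponds to the successive application described above.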
Consequently, a need exists for devices, methods and systems that perform asynchronous log-like snapshot-based replication for non-log-structured file systems and associated data stores. Such devices, methods, and systems would provide the advantages of log-structured replication techniques to widely available hierarchically-indexed data storage systems.