1. Field of the Invention
This invention relates to data backup and more particularly relates to differential data backup using point-in-time snapshot data.
2. Description of the Related Art
Data processing systems often employ a snapshot module to facilitate data storage and retrieval operations. The snapshot module tracks writes to a data set on an active data storage device, creating a point-in-time instance of the data set. The active data storage device may be a memory, such as the data processing system's memory, or a hard disk drive. The snapshot module uses the point-in-time instance to perform operations on the data set. For example, the snapshot module may mirror a point-in-time instance of the data set to an alternate data storage device by copying to the alternate data storage device only the data blocks of the data set that have been modified.
The snapshot module monitors and tracks the modified data blocks as write information such as a write record. The write record identifies a modified data block. In one embodiment, the snapshot module reduces the bandwidth required to store the data set from the active data storage device by storing only the data blocks that the snapshot module identifies as having been modified. For example, the data processing system may retrieve a data set from a source data storage device, such as a storage server, to an active data storage device, such as a memory or a hard disk drive. The data processing system modifies data blocks of the data set stored on the active data storage device. The snapshot module tracks the data blocks of the data set that have been modified and creates a point-in-time instance of the modified data set. The snapshot module may store the point-in-time instance to the source data storage device by copying only the modified data blocks back to the source data storage device. The bandwidth required to store the data set is reduced because only the modified data blocks are copied.
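By way of illustration only, the write tracking described above might be sketched as follows. The class name `SnapshotTracker`, its methods, and the use of a set of block indices as the write record are assumptions made for this sketch and are not taken from any particular snapshot module.

```python
class SnapshotTracker:
    """Hypothetical stand-in for a snapshot module's write tracking."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # data set held on the active data storage device
        self.modified = set()       # write record: indices of modified data blocks

    def write(self, index, data):
        """Apply a write to one data block and record which block was modified."""
        self.blocks[index] = data
        self.modified.add(index)

    def point_in_time_instance(self):
        """Return the data blocks modified since the last instance, then reset the record."""
        instance = {i: self.blocks[i] for i in sorted(self.modified)}
        self.modified.clear()
        return instance


tracker = SnapshotTracker([b"a", b"b", b"c", b"d"])
tracker.write(1, b"B")
tracker.write(3, b"D")
tracker.write(1, b"B2")  # rewriting a block does not add a second entry to the record
instance = tracker.point_in_time_instance()
```

In this sketch the instance holds only the two modified data blocks, so only those blocks would need to be copied back to the source data storage device.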
In another example, the data processing system may retrieve a data set that includes ten (10) data blocks from the source data storage device to the active data storage device. If the data processing system modifies only the first data block on the active data storage device, the snapshot module can create an instance of the modified data set on the source data storage device by copying only the modified first data block to the source data storage device, because the other nine data blocks are unmodified. The snapshot module uses on-write data, such as the write record, to identify the modified data blocks such as the first data block. By copying only the modified data blocks, the bandwidth required for copy operations is reduced.
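The ten-block example may be sketched as follows, purely for illustration. The function `copy_modified_blocks` and the list and set representations of the storage devices and the on-write data are hypothetical and chosen only for this sketch.

```python
def copy_modified_blocks(source, active, modified):
    """Copy only the data blocks listed in `modified` from active to source.

    Returns the number of data blocks copied.
    """
    for index in modified:
        source[index] = active[index]
    return len(modified)


# Ten data blocks retrieved from the source device to the active device.
source = [f"block-{i}" for i in range(10)]
active = list(source)
modified = set()  # hypothetical on-write data: indices of modified blocks

# The data processing system modifies only the first data block.
active[0] = "block-0-modified"
modified.add(0)

copied = copy_modified_blocks(source, active, modified)
```

Here one data block is copied rather than ten, and the source data storage device then matches the active data storage device.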
An important copy operation is data backup. Data storage devices must be regularly backed up to protect valuable data from loss due to hardware failure or data corruption. For example, the data storage devices storing transaction data for a business may be backed up repeatedly because of the high cost of losing any of the transactions. The data backup of the data storage devices often requires significant storage. A user may desire not only to back up the current instance of a data set, but also to back up and maintain a plurality of instances of the data set so that the data set may be recovered from a plurality of points in time.
Unfortunately, storing the plurality of backup instances may require additional data storage capacity that is far in excess of the size of the data set that must be backed up, particularly if the data set is backed up frequently and if each backup instance is stored for an extended time. For example, backing up a data set every minute would require storing one thousand four hundred and forty (1440) instances of the data set each day. Yet the storage requirements for the plurality of backup instances could be reduced if point-in-time instances of only the modified data blocks of the data set could be backed up. Backing up only modified data blocks significantly reduces the bandwidth and storage requirements of backup operations. In addition, the snapshot module already maintains the required information on modifications to data blocks for each point-in-time instance.
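The storage comparison may be illustrated with a short calculation. The data set size, the number of data blocks modified per minute, and the assumption that a differential scheme stores one full baseline plus the modified data blocks of each instance are hypothetical values chosen only for this sketch.

```python
MINUTES_PER_DAY = 24 * 60  # one backup instance per minute, i.e. 1440 per day


def full_backup_blocks(data_set_blocks, instances):
    """Data blocks stored when every instance is a complete copy of the data set."""
    return data_set_blocks * instances


def differential_backup_blocks(data_set_blocks, modified_per_instance, instances):
    """Data blocks stored for one full baseline plus the modified blocks of each instance."""
    return data_set_blocks + modified_per_instance * instances


# Hypothetical figures: 1,000,000 data blocks, 1,000 of them modified each minute.
data_set_blocks = 1_000_000
modified_per_instance = 1_000

full = full_backup_blocks(data_set_blocks, MINUTES_PER_DAY)
differential = differential_backup_blocks(
    data_set_blocks, modified_per_instance, MINUTES_PER_DAY
)
```

Under these assumed figures, a day of full instances occupies 1,440,000,000 data blocks, while the differential instances occupy 2,440,000 data blocks.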
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method that employs snapshot module on-write data to create multiple temporal instances of a data set as differential data set backups. Beneficially, such an apparatus, system, and method would improve the efficiency of data backups.