1. Field of the Invention
The invention is in the field of data protection, and more particularly concerns methods for creating a consistent backup image of a storage volume as it existed at a point in time, where the storage volume remains available for read and write operations during the period that such image is being created, without the necessity of using “snapshot” methods.
2. Description of the Related Art
Backup, redundancy and disaster recovery (the latter two of which presume the ability to back up data volumes) are major concerns in the area of data protection. A full backup of a large data set may take a long time to complete. On multitasking or multi-user systems, there may be writes to that data while it is being backed up. This prevents the backup from being self-consistent (atomic) and introduces a skew in the backed-up data that may result in data corruption. For example, if a user moves a file from a directory that has not yet been backed up into a directory that has already been backed up, then that file would be completely missing on the backup media. Version skew may also cause corruption with files that change their size or contents underfoot while being read. See en.wikipedia.org/wiki/Snapshot_(computer_storage).
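The file-move example above can be illustrated with a minimal sketch. It assumes a toy in-memory "volume" modeled as a dictionary of directories; the names `naive_backup`, `move_file`, `a_done` and `b_pending` are hypothetical and chosen only for illustration.

```python
def naive_backup(volume, on_directory_copied=None):
    """Copy directories one at a time, with no snapshot and no locking."""
    backup = {}
    for name in sorted(volume):
        backup[name] = dict(volume[name])   # copy this directory's files
        if on_directory_copied:
            on_directory_copied(name)       # simulate concurrent user activity
    return backup

# Toy volume: one directory is copied first, the other second.
volume = {"a_done": {}, "b_pending": {"report.txt": "contents"}}

def move_file(copied_dir):
    # Right after "a_done" has been copied, a user moves report.txt
    # out of the not-yet-copied directory and into the copied one.
    if copied_dir == "a_done":
        volume["a_done"]["report.txt"] = volume["b_pending"].pop("report.txt")

backup = naive_backup(volume, move_file)
file_in_backup = any("report.txt" in files for files in backup.values())
file_on_volume = "report.txt" in volume["a_done"]
```

In this sketch the file still exists on the live volume but appears in neither directory of the backup, which is the skew described above.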
One approach to safely backing up live data is to temporarily disable write access to data during the backup, either by stopping the accessing applications or by using a locking API provided by the operating system to enforce exclusive read access. This is tolerable for low-availability systems (on desktop computers and small workgroup servers, on which regular, extended downtime is acceptable). High-availability 24/7 systems, however, cannot bear service stoppages of this nature.
To avoid downtime, high-availability systems may instead perform the backup on a “snapshot”—a read-only copy of the data set frozen at a point in time—and allow applications to continue writing their data in a manner that does not interfere with the integrity of the particular data set comprising the snapshot.
There are numerous snapshotting implementations. In some approaches, the method involves (a) flushing all buffers; (b) blocking all writes; (c) recording all file metadata; and (d) suspending or redirecting writes in some manner (such as caching, directing the written data to alternate locations, and numerous other variations). The data blocks at the point in time of the snapshot are identified by the metadata that was collected. The underlying blocks of the snapshot may then be copied for as long as necessary while changed data (after the point in time of the snapshot) is stored elsewhere, i.e., in other blocks or in entirely different places than where the data corresponding to the snapshot is located.
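The sequence of steps described above can be sketched roughly as follows. This is an illustrative toy model only, not any particular product's implementation: the class `ToyVolume`, its methods, and the choice to place every flushed write in a fresh block are assumptions made for the example.

```python
import threading

class ToyVolume:
    """Toy volume in which every flushed write goes to a fresh block (assumption)."""

    def __init__(self):
        self.blocks = {}           # block id -> data
        self.files = {}            # file name -> list of block ids (metadata)
        self.buffer = []           # pending (name, data) writes not yet on "disk"
        self.next_id = 0
        self.snapshot_meta = None  # metadata frozen at snapshot time
        self.lock = threading.Lock()

    def write_file(self, name, data):
        with self.lock:            # waits while a snapshot is being taken
            self.buffer.append((name, data))

    def _flush_locked(self):
        # Caller must already hold self.lock.
        for name, data in self.buffer:
            block_id = self.next_id
            self.next_id += 1
            self.blocks[block_id] = data   # never overwrites an older block
            self.files[name] = [block_id]
        self.buffer.clear()

    def flush(self):
        with self.lock:
            self._flush_locked()

    def take_snapshot(self):
        with self.lock:                    # writes are blocked while the lock is held
            self._flush_locked()           # flush all buffers
            # Record file metadata as of this point in time.
            self.snapshot_meta = {n: list(ids) for n, ids in self.files.items()}
        # Later writes land in new blocks, so snapshot blocks stay intact.

    def read_file(self, name):
        return "".join(self.blocks[i] for i in self.files[name])

    def read_snapshot(self, name):
        return "".join(self.blocks[i] for i in self.snapshot_meta[name])

vol = ToyVolume()
vol.write_file("f", "v1")
vol.take_snapshot()
vol.write_file("f", "v2")   # arrives after the point in time of the snapshot
vol.flush()
live, frozen = vol.read_file("f"), vol.read_snapshot("f")
```

Here the live view reflects the later write while the snapshot view still resolves to the blocks identified by the recorded metadata. The sketch never frees old blocks, which a real system would have to manage.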
Actual snapshot implementations range from “copy on write” techniques using volume managers (in which a snapshot is taken of device control blocks and the original blocks are treated as read-only, with new and changed blocks thereafter written in different locations), to approaches based on version-tracking internal to the file system, to database approaches, memory-based approaches and other approaches. Most snapshot implementations are relatively efficient and can create snapshots in O(1) time. In other words, the time and I/O needed to create the snapshot do not significantly increase with the size of the data set, whereas the time and I/O needed for a direct backup are proportional to the size of the data set. It is still necessary to back up the underlying data set, but this can be done while the volume is in full use, and the resources used to copy the data can be balanced so as not to interfere unduly with normal production processing.
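The difference in creation cost can be made concrete with a small sketch. It assumes a volume modeled as a Python list of blocks and counts block copies as a stand-in for time and I/O; `direct_backup` and `ToySnapshot` are hypothetical names, and the change map is a simplification of writing changed blocks to different locations.

```python
def direct_backup(blocks):
    """Copy every block; the work grows with the number of blocks."""
    copied = list(blocks)
    return copied, len(copied)

class ToySnapshot:
    """Keeps a reference to the original blocks and copies nothing at creation."""

    def __init__(self, blocks):
        self.base = blocks       # reference only; originals treated as read-only
        self.changes = {}        # block index -> data written after the snapshot
        self.blocks_copied = 0   # creation cost, independent of volume size

    def write(self, index, data):
        self.changes[index] = data          # changed block stored elsewhere

    def read_live(self, index):
        return self.changes.get(index, self.base[index])

    def read_frozen(self, index):
        return self.base[index]             # point-in-time view

costs = []
for n in (10, 10_000):
    blocks = ["x"] * n
    _, backup_work = direct_backup(blocks)
    snap = ToySnapshot(blocks)
    costs.append((n, backup_work, snap.blocks_copied))

snap.write(0, "y")
views = (snap.read_live(0), snap.read_frozen(0))
```

The direct backup copies as many blocks as the volume holds, while creating the toy snapshot copies none regardless of size. The frozen blocks must still be copied to backup media afterward, as noted above.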
Nevertheless, taking snapshots of large volumes, while not as time consuming as performing a full-volume backup, still takes a non-negligible amount of time, during which pending writes must be delayed, interrupted, suspended or otherwise interfered with. If the volume is being used very heavily in production, completing a snapshot may take on the order of tens of minutes, or more, due to the heavy and continuous storage device I/O under such conditions. Over such an extended period, the assumptions on which the particular snapshotting approach is based may not bear out. Accordingly, in practice, an unacceptably high number of large-volume snapshots will fail under such high-load conditions. This may delay the creation of a good backup, or make it impossible to create one within the requisite time frame, which consequently creates practical problems for enterprise data protection programs.