Data storage is an essential feature of computer systems. Such storage typically includes persistent data stored on block-addressable magnetic disks and other secondary storage media. Persistent data storage exists at several levels of abstraction, ranging from higher levels that are closer to the logical view of data seen by users running application programs, to lower levels that are closer to the underlying hardware that physically implements the storage. At a higher, logical level, data is most commonly stored as files residing in volumes or partitions, which are associated with one or more hard disks. The file system, which can be regarded as a component of the operating system executing on the computer, provides the interface between application programs and nonvolatile storage media, mapping the logically meaningful collection of data blocks in a file to their corresponding physical allocation units, or extents, located on a storage medium, such as clusters or sectors on a magnetic disk.
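By way of illustration only, the mapping from a file's logical blocks to physical allocation units can be sketched as a walk over an ordered list of extents. The `Extent` type, the `logical_to_physical` function, and the block numbers below are hypothetical names and values chosen for this sketch; they are not taken from any particular file system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    start_block: int  # first physical block of the contiguous run
    length: int       # number of contiguous blocks in the run


def logical_to_physical(extents, logical_block):
    """Translate a file-relative block index into a physical block number."""
    if logical_block < 0:
        raise IndexError("logical block index must be non-negative")
    remaining = logical_block
    for extent in extents:
        if remaining < extent.length:
            return extent.start_block + remaining
        remaining -= extent.length
    raise IndexError("logical block lies beyond the end of the file")


# Hypothetical six-block file stored as two discontiguous runs.
file_extents = [Extent(start_block=100, length=4), Extent(start_block=40, length=2)]
first_block_of_second_run = logical_to_physical(file_extents, 4)  # 40
```

An application sees only the logical block indices 0 through 5; the extent list is what ties them to locations on the medium.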
Users and administrators of computer systems benefit from having the ability to recover earlier versions of files stored on the system. Users may accidentally delete or erroneously modify files. An administrator of a system that has become corrupted may wish to recover the entire state of a file system at some known good time before the corruption occurred. The underlying disk hardware can fail. A snapshot is one technique for facilitating the recovery of earlier versions of files.
A snapshot of a volume is a virtual volume representing the state of the original volume at a point in time. Some snapshotters capture the point-in-time data by mirroring the entire contents of the volume in its snapshot state. By contrast, differential snapshotters do not make actual copies at the time of the snapshot. Rather, changes to the original volume are carefully monitored so that the virtual volume (i.e., the snapshot) can always be produced. A differential snapshotter will copy a block in the volume only if it is modified after the snapshot is taken; such a copy operation is called a “copy-on-write.” The snapshot state of the volume can be reconstructed by using these copies of changed blocks along with the unchanged blocks in the original volume. In the usual case, many files in the volume will be left unchanged following the snapshot, so differential snapshotters provide a more economical design than nondifferential approaches. As more changes occur to the original volume, however, a differential snapshotter must keep a correspondingly larger area of disk space to hold the older versions of the disk blocks being changed.
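The copy-on-write behavior described above can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the implementation of any actual snapshotter: the volume is modeled as a Python list of block contents, and the `DifferentialSnapshot` class, its `diff_area` dictionary, and its method names are hypothetical.

```python
class DifferentialSnapshot:
    """Point-in-time view of a volume, maintained by copy-on-write."""

    def __init__(self, volume):
        self.volume = volume   # live volume (shared, not copied at snapshot time)
        self.diff_area = {}    # block number -> contents at snapshot time

    def write(self, block_no, data):
        """Write to the original volume, preserving the old contents first."""
        if block_no not in self.diff_area:
            # First modification since the snapshot: save the old block.
            self.diff_area[block_no] = self.volume[block_no]
        self.volume[block_no] = data

    def read_snapshot(self, block_no):
        """Read a block as it was when the snapshot was taken."""
        if block_no in self.diff_area:
            return self.diff_area[block_no]
        return self.volume[block_no]  # unchanged blocks come from the original


volume = ["a", "b", "c", "d"]
snapshot = DifferentialSnapshot(volume)
snapshot.write(1, "B")    # triggers one copy-on-write
snapshot.write(1, "B2")   # block already preserved; no second copy
snapshot_view = [snapshot.read_snapshot(i) for i in range(len(volume))]
```

In this sketch only one block is held in the diff area even though block 1 was written twice, which reflects the economy of the differential approach when few blocks change.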
In most operating systems, the extents that make up the physical allocation units implementing a particular file may be discontiguous, as may the pool of allocation units available as logically free space for use in future file space allocation. A disk volume in such a state is said to be externally fragmented. In many such operating systems, a volume can be expected to suffer from increasing external fragmentation over time as files are added, deleted and modified. External fragmentation increases the time necessary to read and write data in files, because the read/write heads of the hard disk drive must travel farther to reach information that has become spread over many non-contiguous sectors. If fragmentation is sufficiently severe, it can lead to significantly degraded performance and response time in the operation of the computer system.
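One simple way to quantify the external fragmentation of a single file, offered here only as an illustrative sketch, is to count the contiguous runs in the file's physical block numbers taken in logical order. The `count_extents` function and the block numbers are hypothetical; real utilities may use other measures.

```python
def count_extents(physical_blocks):
    """Count contiguous runs in a file's physical block numbers (logical order)."""
    if not physical_blocks:
        return 0
    runs = 1
    for previous, current in zip(physical_blocks, physical_blocks[1:]):
        if current != previous + 1:
            runs += 1  # a gap or backward jump starts a new extent
    return runs


contiguous_file = [10, 11, 12, 13]
fragmented_file = [10, 11, 50, 51, 7, 90]
contiguous_runs = count_extents(contiguous_file)  # 1
fragmented_runs = count_extents(fragmented_file)  # 4
```

A file occupying a single run can be read with one positioning of the heads, whereas each additional run in this model implies an additional seek.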
Defragmentation utility programs provide an important remedy for data storage systems that are prone to external fragmentation. These utilities can be periodically run to rearrange the physical location of a volume's file extents so that contiguity of allocation blocks is increased and disk read/write access time is correspondingly reduced, improving performance. A defragmentation operation consists of moving some blocks in a file to a location that is free on the volume. More precisely, the contents of one block are copied to the free block location. The old location of the block becomes free and the new location of the block becomes occupied space. The defragmentation of a volume will typically involve an extensive number of such block moves.
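A single block move of the kind just described can be sketched as follows. The model is an assumption made for illustration: the volume is a list of block contents, `free` is a set of free block numbers, and `file_map` lists a file's physical blocks in logical order. The `move_block` function is hypothetical and does not correspond to any particular defragmentation interface.

```python
def move_block(blocks, free, file_map, logical_index, dst):
    """Relocate one block of a file to a free location on the volume."""
    src = file_map[logical_index]
    if dst not in free:
        raise ValueError("destination block is not free")
    blocks[dst] = blocks[src]      # copy contents to the free block location
    file_map[logical_index] = dst  # the file now refers to the new location
    free.remove(dst)               # new location becomes occupied space
    free.add(src)                  # old location becomes free


# Hypothetical five-block volume holding a two-block file at blocks 0 and 3.
blocks = ["A0", "stale", "X", "A1", "stale"]
free = {1, 4}
file_map = [0, 3]

move_block(blocks, free, file_map, logical_index=1, dst=1)
file_contents = [blocks[p] for p in file_map]
```

After the move the file occupies blocks 0 and 1 contiguously, and its logical contents are the same as before the move.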
Although users of file systems benefit from the disk speed optimizations achieved by defragmentation, the benefit has come at the expense of efficient use of differential snapshotters. If a volume is defragmented subsequent to the taking of a snapshot, the snapshotter will ensure that each data block relocation by the defragmenter is preceded by a copy-on-write of the block. The logical view of the original volume is unchanged by the defragmentation operations, but because the disk blocks on which the volume is physically manifested change drastically in content, the amount of space needed to maintain the snapshot explodes. This disk space explosion may be enough to destroy a principal reason for using differential snapshotters in the first place, that of disk space economy.
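The growth in snapshot storage can be illustrated with a small sketch. It assumes, for illustration only, a snapshotter that sees nothing but block writes and preserves the prior contents of every block it sees overwritten, including blocks that were free. The `defragment_under_snapshot` function and the sample volume are hypothetical, and the sketch omits free-space bookkeeping.

```python
def defragment_under_snapshot(volume, file_map, free_blocks):
    """Pack a file into free blocks while a copy-on-write snapshot is active.

    Returns the diff area (block number -> pre-move contents) accumulated
    by a snapshotter that cannot tell block moves from ordinary writes.
    """
    diff_area = {}

    def cow_write(block_no, data):
        if block_no not in diff_area:
            diff_area[block_no] = volume[block_no]  # copy-on-write
        volume[block_no] = data

    targets = sorted(free_blocks)[:len(file_map)]
    for logical_index, dst in enumerate(targets):
        src = file_map[logical_index]
        cow_write(dst, volume[src])    # the move looks like an ordinary write
        file_map[logical_index] = dst
    return diff_area


# Hypothetical eight-block volume with a three-block file at blocks 5, 2 and 7.
volume = ["stale", "stale", "f1", "stale", "x", "f0", "y", "f2"]
file_map = [5, 2, 7]
contents_before = [volume[p] for p in file_map]

diff_area = defragment_under_snapshot(volume, file_map, free_blocks={0, 1, 3})
contents_after = [volume[p] for p in file_map]
```

In this model the file's logical contents are identical before and after, yet the diff area has grown by one block per move. Scaled to the extensive number of moves in a full volume defragmentation, this is the space growth described above.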
The problem seen in the interaction between differential snapshotters and defragmentation operations is that, prior to the present invention, differential snapshotters have not been able to distinguish logically significant writes of blocks from logically insignificant block moves, treating both as requiring copy-on-write protection. This problem is particularly acute when there is a volume defragmentation operation on the original volume, but those of skill in the art will appreciate that other file-manipulating programs besides defragmenters may require the nonlogical relocation or shuffling of file blocks. For example, a program might, for performance reasons, create a file of a particular size and arrange the blocks in a desired way before proceeding with further use of the file for writing data. Prior to the present invention, differential snapshotters have treated such block rearrangements as requiring copy-on-write protection.
It can be seen, then, that there is a need for an improvement in differential snapshotters so that logically insignificant moves of blocks from one volume location to another are recognized as not requiring copy-on-write protection in principle. The availability of more efficient differential snapshotters will make it more likely that snapshots are used on a longer-term basis for data recovery. Moreover, such an improvement will lead to greater use of defragmentation utilities and therefore will allow disk speed optimizations to take place while snapshots are maintained with little performance impact and little disk space consumed.