Many contemporary data processing systems consume and/or produce vast quantities of data. Electromechanical devices such as hard disk drives are often used to store this data during processing or for later review. The mechanical nature of many types of mass storage devices limits their speed to a fraction of the system's potential processing speed, so measures must be taken to ameliorate the effects of slow storage.
Mass storage devices are commonly viewed as providing a series of addressable locations in which data can be stored. Some devices (such as tape drives) permit storage locations to be accessed in sequential order, while other devices (such as hard disks) permit random access. Each addressable storage location can usually hold several (or many) data bytes; such a location is often called a “block.” Block sizes are frequently powers of two. Common block sizes are 512 bytes, 1,024 bytes and 4,096 bytes, though other sizes may also be encountered. A “mass storage device” may be constructed from a number of individual devices operated together to give the impression of a single device with certain desirable characteristics. For example, a Redundant Array of Independent Disks (“RAID array”) may contain two or more hard disks with data spread among them to obtain increased transfer speed, improved fault tolerance or simply increased storage capacity. The placement of data (and calculation and storage of error detection and correction information) on various devices in a RAID array may be managed by hardware and/or software.
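The block addressing described above can be illustrated with a short sketch. The function names `locate` and `stripe` are hypothetical, and the round-robin layout in `stripe` is only one simple way that data might be spread across the disks of an array; actual RAID implementations may place data, and error detection and correction information, quite differently.

```python
def locate(byte_offset, block_size=4096):
    """Map a byte offset on a device to (block index, offset within the block)."""
    if byte_offset < 0 or block_size <= 0:
        raise ValueError("offset must be non-negative and block size positive")
    return divmod(byte_offset, block_size)


def stripe(block_index, num_disks):
    """Assumed layout: logical blocks rotate across the disks one block at a time.

    Returns (disk number, block index on that disk).
    """
    if block_index < 0 or num_disks < 1:
        raise ValueError("block index must be non-negative and num_disks at least 1")
    block_on_disk, disk = divmod(block_index, num_disks)
    return disk, block_on_disk
```

Under these assumptions, `locate(10000)` places byte 10,000 in block 2 at offset 1,808 when blocks are 4,096 bytes long, and `stripe(5, 2)` places logical block 5 on the second disk of a two-disk array.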
Occasionally, the entire capacity of a storage device is dedicated to holding a single data object, but more often a set of interrelated data structures called a “filesystem” is used to divide the storage available between a plurality of data files. Filesystems usually provide a hierarchical directory structure to organize the files on the storage device. The logic and procedures used to maintain a filesystem (including its files and directories) within storage provided by an underlying mass storage device can have a profound effect on data storage operation speed. This, in turn, can affect the speed of processing operations that read and write data in files. Thus, filesystem optimizations can improve overall system performance.
FIG. 2 represents an array of data blocks 2 of a mass storage device. Individual blocks are numbered 200, 201, 202, . . . , 298, 299. Successively numbered blocks are physically adjacent: the mechanical system used to access the data on the mass storage device does not have to move far to reach adjacent blocks, so the blocks can be accessed relatively quickly. (Note that the filesystem may use storage virtualization, so that the block number of a given data block on disk may not coincide with the block number the filesystem uses for that block.) Three multi-block data objects are indicated with black-filled blocks. Blocks of a multi-block data object can be thought of as logically adjacent: there is a first block containing the first part of the object, followed by a second block containing the second part of the object, and so on; but logically adjacent blocks need not be physically adjacent.
The distinction between logical and physical adjacency is apparent in the first data object, including blocks 203, 217, 244 and 222 (in that order). None of these data blocks is physically adjacent to any of the other blocks, so the data object is said to be fragmented: the system would have to perform a time-consuming seek operation before reading each block to load the data object.
The blocks of the second data object, 271 through 276, are both physically and logically adjacent, so the second data object is unfragmented. All the blocks are contiguous and sequentially stored, so this object could be loaded with only one seek (to reach the beginning of the object).
The third data object, including blocks 281, 282, 284, 285 and 237-239, is partially fragmented. It can be processed relatively quickly by loading blocks 281-285 and discarding unrelated block 283, then seeking to block 237 before loading the final three blocks of the object. Unfragmented or partially fragmented data objects can usually be accessed more quickly than heavily fragmented objects.
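The seek counts for the three data objects of FIG. 2 can be checked with a small sketch. The function `count_seeks` and its `max_gap` parameter are hypothetical names introduced for illustration; the model assumes that one seek is needed to reach the first block, that a forward gap of at most `max_gap` unrelated blocks is read through and discarded, and that any larger or backward jump costs another seek.

```python
def count_seeks(blocks, max_gap=0):
    """Count the seeks needed to read `blocks` in logical order.

    `blocks` lists physical block numbers in the order the object uses them.
    A forward gap of up to `max_gap` unrelated blocks is read and discarded
    rather than treated as a new seek (an assumption of this sketch).
    """
    if not blocks:
        return 0
    seeks = 1  # one seek to reach the beginning of the object
    for prev, cur in zip(blocks, blocks[1:]):
        gap = cur - prev - 1  # unrelated blocks lying between prev and cur
        if gap < 0 or gap > max_gap:
            seeks += 1
    return seeks


first_object = [203, 217, 244, 222]
second_object = list(range(271, 277))  # blocks 271 through 276
third_object = [281, 282, 284, 285, 237, 238, 239]

print(count_seeks(first_object))             # 4: a seek before every block
print(count_seeks(second_object))            # 1: fully contiguous
print(count_seeks(third_object, max_gap=1))  # 2: block 283 is read and discarded
```

With `max_gap=0`, the third object would need three seeks, because the gap at block 283 would then count as a break rather than being read through.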
Data in fragmented objects can be moved around (blocks relocated on the mass storage device so that they are physically adjacent to logically adjacent blocks) to reduce fragmentation and improve access speed. Unfortunately, file defragmentation is a time-consuming process, as blocks must be located, read into memory, and then stored in more nearly sequential locations. If the storage device has little free capacity, it may be necessary to move blocks of other objects from place to place to create free areas large enough to hold a defragmented object. Furthermore, files that change or grow tend to become increasingly fragmented over time, necessitating repeated defragmentation operations.
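The locate, read, and rewrite steps of defragmentation can be sketched as follows. This is a minimal illustration and not a description of any particular filesystem: the device is modeled as a dictionary from occupied block numbers to their contents, and the names `find_free_run` and `defragment`, along with the default block range of 200 through 299, are assumptions of the sketch.

```python
def find_free_run(used, start, end, length):
    """Return the first block of a run of `length` free blocks in [start, end], or None."""
    run_start, run_len = None, 0
    for block in range(start, end + 1):
        if block in used:
            run_len = 0  # an occupied block breaks the current free run
            continue
        if run_len == 0:
            run_start = block
        run_len += 1
        if run_len == length:
            return run_start
    return None


def defragment(device, object_blocks, start=200, end=299):
    """Relocate one object's blocks into a contiguous free run.

    `device` maps occupied block numbers to their contents (assumed model).
    Returns the new block list, or None if no free run is large enough, in
    which case blocks of other objects would have to be moved first.
    """
    if not object_blocks:
        return []
    dest = find_free_run(set(device), start, end, len(object_blocks))
    if dest is None:
        return None
    data = [device.pop(b) for b in object_blocks]  # read the blocks into memory
    for i, chunk in enumerate(data):
        device[dest + i] = chunk                   # store them sequentially
    return list(range(dest, dest + len(data)))
```

Even in this simplified form, every block of the object is read and written once, and the `None` result corresponds to the case in which the device has too little contiguous free space to hold the defragmented object.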
Techniques to reduce fragmentation without explicit, time-consuming defragmentation cycles may be useful in improving storage operations.