A file system is a programmatic entity that imposes structure on an address space of one or more physical or virtual storage devices, such as disks, so that an operating system may conveniently read and write data containers, such as files and blocks, and related metadata.
In non-journaling file systems, an interruption of a hard disk (e.g. due to loss of power) while the file system is writing metadata can cause the metadata to be incompletely written. Metadata is information about data, e.g. a location of the data or names of the files. When metadata is incompletely written, the description of the data is inconsistent with the data itself.
In a journaling file system, metadata are journaled to avoid file system errors and corruption. A journaling file system writes out a special file called a journal, which keeps track of transactions to the disk. Updates to the disk are then committed atomically. If power is suddenly interrupted, a given set of updates will either have been fully committed to the file system, in which case the file system can be used immediately, or will be marked as not yet fully committed, in which case the file system driver can read the journal and fix any inconsistencies that occurred.
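The commit-or-discard behavior described above can be illustrated with a minimal in-memory sketch. It is not the implementation of any particular file system. The class name JournaledStore and its methods are hypothetical, and the "disk" is modeled as a plain dictionary.

```python
class JournaledStore:
    """Toy model of a metadata journal (hypothetical, for illustration only)."""

    def __init__(self):
        self.metadata = {}  # stands in for on-disk metadata
        self.journal = []   # ordered journal records

    def log_transaction(self, txid, updates):
        # Record the intended updates before touching the metadata itself.
        self.journal.append(("begin", txid, dict(updates)))

    def commit(self, txid):
        # A commit record marks the whole set of updates as complete.
        self.journal.append(("commit", txid, None))

    def recover(self):
        # After an interruption, apply only transactions that have a
        # commit record; uncommitted ones are discarded as a whole.
        committed = {txid for kind, txid, _ in self.journal if kind == "commit"}
        for kind, txid, updates in self.journal:
            if kind == "begin" and txid in committed:
                self.metadata.update(updates)
        self.journal.clear()


store = JournaledStore()
store.log_transaction(1, {"file_a": "block 10"})
store.commit(1)
store.log_transaction(2, {"file_b": "block 20"})  # interrupted before commit
store.recover()
```

In this sketch, recovery leaves only the updates of transaction 1 in the metadata, since transaction 2 never reached its commit record.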
Log-structured file systems are generalizations of journaling file systems. In a log-structured file system, both file system metadata and file system data are journaled. This design allows for access to old versions of files, whereas traditional journaling file systems may lose or corrupt file content due to disk errors.
Log-structured file systems have a write-out-of-place property, according to which, whenever a data block is modified, the modified data block is written to a new physical location on disk. Some log-structured file systems have a write-anywhere property, according to which data and/or metadata do not have to be written to any particular location on disk. That is, the file system can write new data or metadata to any unallocated block on any available disk. This property makes it possible to perform random writes very efficiently.
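A minimal sketch of out-of-place writing follows, under simplifying assumptions: a single disk modeled as a list of blocks, and the old block freed immediately rather than retained for old versions. The class WriteAnywhereStore and its members are hypothetical names introduced for illustration.

```python
class WriteAnywhereStore:
    """Toy block store that never overwrites a block in place (illustrative)."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks
        self.free = set(range(num_blocks))
        self.block_map = {}  # (file_id, logical_index) -> physical block

    def write(self, file_id, logical_index, data):
        if not self.free:
            raise RuntimeError("no unallocated blocks")
        # Any unallocated block is acceptable; min() is used here only
        # to make the sketch deterministic.
        new_phys = min(self.free)
        self.free.remove(new_phys)
        self.blocks[new_phys] = data
        old_phys = self.block_map.get((file_id, logical_index))
        self.block_map[(file_id, logical_index)] = new_phys
        if old_phys is not None:
            # Assumption: the old copy is released at once. A system that
            # keeps old versions would retain it instead.
            self.blocks[old_phys] = None
            self.free.add(old_phys)
        return new_phys

    def read(self, file_id, logical_index):
        return self.blocks[self.block_map[(file_id, logical_index)]]


store = WriteAnywhereStore(num_blocks=4)
first = store.write("f", 0, b"old")
second = store.write("f", 0, b"new")  # lands in a different physical block
```

Modifying the same logical block twice yields two different physical locations, and the block map is the only structure that must be updated to point at the new copy.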
This property, however, may also increase the likelihood that files written to disk will become fragmented over time. The likelihood of fragmentation is especially high for files subject to random write workloads, such as long-lived files used to store database data, since the random workloads cause the file data to be spread out over the disks. As fragmentation increases, sequential reads may become slower. Additionally, random reads may suffer from higher seek latencies since the data may span many cylinders of a disk drive, rather than being localized to a smaller set of cylinders, for example.
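One simple way to quantify the fragmentation described above is to count the contiguous runs (extents) that a file occupies. The sketch below assumes a file's layout is given as a list of physical block numbers in logical order. The function name count_extents is hypothetical.

```python
def count_extents(physical_blocks):
    """Count runs of physically consecutive blocks in a file's layout.

    physical_blocks lists the file's physical block numbers in logical
    order. One extent means the file is fully contiguous; more extents
    mean more seeks during a sequential read.
    """
    if not physical_blocks:
        return 0
    extents = 1
    for prev, cur in zip(physical_blocks, physical_blocks[1:]):
        if cur != prev + 1:
            extents += 1  # a gap or a backward jump starts a new extent
    return extents


contiguous = count_extents([10, 11, 12, 13])  # one run
fragmented = count_extents([10, 11, 40, 12])  # runs: 10-11, 40, 12
```

Under this measure, a file whose blocks have been rewritten out of place by random writes tends toward one extent per block, which is the worst case for sequential reads.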
To alleviate a possible file fragmentation problem in write-anywhere file systems, techniques have relied upon reallocating file data to lay out the file data contiguously. In certain techniques, this reallocation is done as new file data is written. In other techniques, this reallocation is done as part of a disk defragmentation operation in which multiple fragmented files across a disk are reallocated. Reallocation techniques generate a large load on the system, however, since data is continuously read and written. Additionally, during a disk defragmentation operation, a file system's ability to access the disk to service regular functions is hindered. Therefore, disk defragmentation operations are often performed during system maintenance downtime.
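The reallocation approach, and the load it generates, can be sketched as follows. This is an illustrative model only: it assumes a file layout given as a list of physical block numbers and a set of free block numbers, and it counts one block copy (a read plus a write) per relocated block. The function names are hypothetical.

```python
def find_contiguous_run(free_blocks, length):
    """Return the start of the first run of `length` consecutive free
    blocks, or None if no such run exists."""
    run_start, run_len, prev = None, 0, None
    for block in sorted(free_blocks):
        if prev is not None and block == prev + 1:
            run_len += 1
        else:
            run_start, run_len = block, 1
        if run_len == length:
            return run_start
        prev = block
    return None


def reallocate_contiguously(layout, free_blocks):
    """Plan a contiguous layout for a file.

    Returns (new_layout, blocks_copied). Each copied block costs one
    read and one write, which is the source of the load noted above.
    If no suitable free run exists, the layout is returned unchanged.
    """
    start = find_contiguous_run(free_blocks, len(layout))
    if start is None:
        return list(layout), 0
    return list(range(start, start + len(layout))), len(layout)


new_layout, copied = reallocate_contiguously([3, 9, 5], {20, 21, 22, 30})
```

Even in this simplified form, every block of the file must be copied to make the file contiguous, so the cost grows with file size rather than with the degree of fragmentation.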