In general, a computer reads data from, and writes data to, a disk connected to the computer. A disk is capable of storing huge amounts of data. Typically, however, a disk cannot read or write data at, or near, the speed at which the computer communicates with the disk. Thus, the amount of time a disk needs to read or write data, referred to as disk access time, may slow the performance of the computer.
In one method to offset this limitation, the workload of a single disk is distributed across a cluster of disks. In this arrangement, referred to as a disk array, the computer communicates with an input/output control processor which, in turn, communicates with the cluster of disks. To the computer, the disk array appears as a single disk. In general, a disk array of a given capacity responds to the computer's requests faster than a single disk of the same capacity.
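As an illustration only, the sketch below shows one way an input/output control processor might spread a single logical disk across several physical disks. It assumes simple round-robin striping, which is only one of several possible layouts and is not described in the text above; the function name `locate_block` and its parameters are hypothetical.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping: consecutive logical blocks land on
    consecutive disks, so a run of blocks can be serviced in parallel.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# The computer sees logical blocks 0..7 on what appears to be one disk;
# the controller places them on four physical disks.
layout = [locate_block(block, 4) for block in range(8)]
```

Under this assumed layout, logical blocks 0 through 3 fall on four different disks, so a request spanning them can be shared among the disks rather than queued on one.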
In another method to offset this limitation, a portion of system memory is used to hold the data most recently read from, or written to, the disk by the computer. This method, referred to as caching, improves performance because the computer can access system memory much faster than it can access the disk. However, despite the improvements obtained with disk arrays and caching, disk access time continues to slow the performance of a computer.
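A minimal sketch of such a cache follows. It assumes a least-recently-used eviction policy and a write-through policy, neither of which is specified above; the class `BlockCache` is hypothetical, and a plain dictionary stands in for the slow disk.

```python
from collections import OrderedDict


class BlockCache:
    """Keeps the most recently read or written blocks in memory."""

    def __init__(self, disk: dict, capacity: int):
        self.disk = disk              # stand-in for the slow disk
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def _remember(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1                   # served from memory
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        self.misses += 1                     # must go to the disk
        data = self.disk[block_no]
        self._remember(block_no, data)
        return data

    def write(self, block_no, data):
        self.disk[block_no] = data           # assumed write-through policy
        self._remember(block_no, data)
```

Only the reads counted in `misses` pay the disk access time; a repeated read of the same block is answered from memory.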
Taking a different approach to improving disk performance, Mendel Rosenblum and John K. Ousterhout introduced, in the article “The Design and Implementation of a Log-Structured File System,” ACM Transactions on Computer Systems, February 1992, a new disk storage structure called a log-structured file system. In a log-structured file system, the computer, or input/output control processor, writes data to the disk in a sequential structure, referred to as a log. In general, a log improves the write performance of a disk because a log eliminates the time needed for the disk to, for example, find the location of previously stored data and overwrite that data with newly modified data. However, because data is always written to the end of the log, the disk needs free space available at the end of the log in which to write newly modified data or new data.
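The append-only behavior described above can be sketched as follows. This is a simplified in-memory model, not the structure used by Rosenblum and Ousterhout; the class `Log` and its in-memory index are assumptions made for illustration.

```python
class Log:
    """Append-only log: a modified block is never overwritten in place."""

    def __init__(self):
        self.records = []   # the log itself; new data only goes at the end
        self.index = {}     # block number -> position of its newest copy

    def write(self, block_no, data):
        # The old copy is not located or overwritten; the new copy is
        # appended and the index is pointed at it.
        self.index[block_no] = len(self.records)
        self.records.append((block_no, data))

    def read(self, block_no):
        return self.records[self.index[block_no]][1]

    def live_fraction(self):
        # Share of log records that are still the newest copy of a block.
        if not self.records:
            return 1.0
        return len(self.index) / len(self.records)


log = Log()
log.write(7, "v1")
log.write(7, "v2")   # the superseded "v1" record still occupies log space
```

After the second write, the log holds two records for block 7 but only one is current, which is why the log keeps consuming free space at its end.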
To resolve the problem of free space, Rosenblum and Ousterhout divided the log into segments. A segment can be rewritten once its live data, that is, data that has not since been overwritten or deleted, has been copied out of the segment. A segment cleaner process packs the live data from various segments and rewrites the packed live data to, for example, the beginning of the log.
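A rough sketch of a segment cleaner is given below. It assumes that each segment is a list of (block number, data) records in write order and that the last copy written of each block is the live one; the function `clean_segments` is hypothetical, and a practical cleaner would also have to choose which segments to clean and when.

```python
def clean_segments(segments, segment_size):
    """Pack the live records of the given segments into as few segments
    as possible and return the packed segments."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")

    # Find where the newest (live) copy of each block sits.
    newest = {}
    for seg_no, segment in enumerate(segments):
        for slot, (block_no, _data) in enumerate(segment):
            newest[block_no] = (seg_no, slot)

    # Copy out only the live records, preserving their write order.
    live = [segments[seg_no][slot] for seg_no, slot in sorted(newest.values())]

    # Repack the live records into full segments.
    return [live[i:i + segment_size] for i in range(0, len(live), segment_size)]


old_segments = [
    [(1, "a"), (2, "b")],
    [(1, "a2"), (3, "c")],
    [(2, "b2"), (3, "c2")],
]
packed = clean_segments(old_segments, segment_size=2)
```

In this example three segments holding six records contain only three live records, so the packed result fits in two segments and the remaining segment becomes available for new writes.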
As noted, though, a log-based file structure improves the write performance of a disk rather than its read performance. Thus, a log-based file structure improves a disk's performance when, among other things, most of the read requests from a computer, or input/output control processor, to a disk drive are satisfied from memory-based cache. When most of the read requests are not satisfied from memory-based cache, however, the performance of a disk configured as a log structure is no better than the performance of a disk configured as a more conventional structure in which, for example, the disk must first find the location of previously stored data and then overwrite that data with newly modified data.
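The dependence on the cache can be made concrete with a simple weighted average of memory and disk access times. The function name and the timings used below are hypothetical values chosen only for illustration, not measurements.

```python
def average_read_time(hit_ratio, memory_time_ms, disk_time_ms):
    """Expected time per read when a fraction `hit_ratio` of reads is
    satisfied from the memory-based cache and the rest go to the disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * memory_time_ms + (1.0 - hit_ratio) * disk_time_ms


# Hypothetical timings: 0.001 ms per memory access, 10 ms per disk access.
mostly_cached = average_read_time(0.9, 0.001, 10.0)
mostly_uncached = average_read_time(0.1, 0.001, 10.0)
```

With these assumed figures, the average read time is dominated by the disk access time as soon as a substantial share of reads misses the cache, regardless of how the data was written.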