The present invention relates to computer systems and to file system implementations for computer operating systems, and to methods and apparatus used by file systems for controlling the order of operations, such as the order in which information is updated on secondary storage, in order to realize gains in performance.
Computer systems are composed of hardware and software. The hardware includes one or more processors (typically a central processing unit, or CPU), main storage or memory, secondary storage, and other input/output (I/O) devices. The software includes an operating system and user (application) programs. The computer system executes user programs on the hardware under the control of the operating system. The operating system controls the operation of secondary storage devices and other I/O devices, such as terminals, through a set of software modules called device drivers.
In modern computer systems, secondary storage systems such as disks have become performance bottlenecks because processors operate at much higher speeds than disks. Various methods have been used to minimize the impact of disk subsystems on overall system performance. For example, some disk controllers employ large random access memories as disk caches in order to reduce the number of slower disk accesses. Operating system device drivers use a variety of algorithms to schedule disk requests so that they can be serviced with minimum mechanical movement or delay within the disk hardware. Some file system implementations log their operations so that it is not critical to have all intermediate information updates applied immediately to secondary storage. See, for example, Mendel Rosenblum and John K. Ousterhout, "The Design and Implementation of a Log-Structured File System," Proceedings of the 13th ACM Symposium on Operating Systems Principles (October 1991), and Robert Hagmann, "Reimplementing the Cedar File System using Logging and Group Commit," Proceedings of the 11th ACM Symposium on Operating Systems Principles (November 1987).
By way of background, three types of writes exist for storing information on disk, namely, synchronous, asynchronous, and delayed writes. With a synchronous write, the computer system suspends execution of the program that caused the write to occur. When the write completes, the program is allowed to continue. With an asynchronous write, the computer system permits the program to continue after enqueuing the write request with the device drivers that manage the operation of disks. In this case, the program can make further progress even though the information to be written is not yet stored on disk. A delayed write is a special type of asynchronous write in which the execution of the program is allowed to continue without enqueuing the write request with the device drivers. In this case, the buffer in memory that is modified during the write is marked as needing to be written to disk, and the request is propagated to the device drivers by the operating system at a later time. Generally, the operating system ensures that the request propagates within a finite time interval. Asynchronous writes achieve a performance advantage over synchronous writes by decoupling the execution of processors from disk subsystems and allowing more overlap between them. Delayed writes improve the decoupling further and serve to reduce the aggregate number of disk writes by allowing multiple modifications of the same buffer to be propagated to the disk with a single disk write.
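The difference among the three write types can be illustrated with a small simulation. The following is a minimal sketch only, not an implementation taken from any particular operating system: the names SimulatedDisk, BufferCache, write_sync, write_async, write_delayed, flush, and count_writes are all hypothetical, and the "disk" is an in-memory dictionary that counts physical writes.

```python
from collections import deque


class SimulatedDisk:
    """Hypothetical stand-in for a disk and its device-driver queue."""

    def __init__(self):
        self.queue = deque()  # requests enqueued with the driver
        self.blocks = {}      # data that has actually reached the disk
        self.writes = 0       # number of physical writes performed

    def enqueue(self, block_no, data):
        self.queue.append((block_no, data))

    def service_all(self):
        # Service queued requests in FIFO order.
        while self.queue:
            block_no, data = self.queue.popleft()
            self.blocks[block_no] = data
            self.writes += 1


class BufferCache:
    """Hypothetical in-memory buffers with a set of dirty block numbers."""

    def __init__(self, disk):
        self.disk = disk
        self.buffers = {}
        self.dirty = set()

    def write_sync(self, block_no, data):
        # Synchronous: enqueue, then "wait" until the disk has the data.
        self.buffers[block_no] = data
        self.dirty.discard(block_no)
        self.disk.enqueue(block_no, data)
        self.disk.service_all()

    def write_async(self, block_no, data):
        # Asynchronous: enqueue with the driver and return immediately.
        self.buffers[block_no] = data
        self.dirty.discard(block_no)
        self.disk.enqueue(block_no, data)

    def write_delayed(self, block_no, data):
        # Delayed: only mark the buffer dirty; nothing is enqueued yet.
        self.buffers[block_no] = data
        self.dirty.add(block_no)

    def flush(self):
        # Later, the system propagates each dirty buffer once.
        for block_no in sorted(self.dirty):
            self.disk.enqueue(block_no, self.buffers[block_no])
        self.dirty.clear()


def count_writes(mode, updates=3):
    """Modify block 7 `updates` times; return (physical writes, final data)."""
    disk = SimulatedDisk()
    cache = BufferCache(disk)
    for i in range(updates):
        getattr(cache, "write_" + mode)(7, f"version {i}")
    cache.flush()
    disk.service_all()
    return disk.writes, disk.blocks[7]
```

Under these assumptions, three successive modifications of the same buffer cost three physical writes in the synchronous and asynchronous modes but only one in the delayed mode, while the final contents on disk are the same in all three cases.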
Despite the performance advantage of using asynchronous and delayed writes over synchronous writes as described above, many file system implementations employ synchronous write operations for recording changes to file system structural (administrative) data. Synchronous writing is used so that the file system implementation can regulate the order in which structural changes appear on the disk. By controlling the order in which modifications of structural data are written to disk, a file system implementation achieves the capability to perform file system repairs in the event that a system crash occurs before a sequence of structural changes can complete and reach a self-consistent organization of file system structural information. The specific requirements for ordering updates of structural data vary according to file system implementation as described, for example, in M. Bach, "The Design of the UNIX Operating System," Prentice-Hall, Englewood Cliffs, 1986. An example of a utility for repairing file systems following a crash, the fsck program, is described in M. McKusick, W. Joy, S. Leffler, and S. Fabry, "Fsck--The UNIX File System Check Program," UNIX System Manager's Manual--4.3 BSD Virtual Vax-11 Version, USENIX, April 1986.
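To make the role of write ordering concrete, the following sketch checks every possible crash point for two orderings of the same pair of structural updates. It is an illustration under stated assumptions, not a rule taken from this document: it assumes the commonly cited convention that a newly created file's inode is written before the directory entry that names it, and the names apply_prefix, invariant_holds, safe_at_every_crash_point, INODE_FIRST, and ENTRY_FIRST are hypothetical.

```python
def apply_prefix(ordered_writes, crash_after):
    """Return the on-disk state if a crash occurs after `crash_after` writes."""
    disk = {"inode_12": None, "dir_entry": None}
    for key, value in ordered_writes[:crash_after]:
        disk[key] = value
    return disk


def invariant_holds(disk):
    """Assumed invariant: a directory entry never names an uninitialized inode."""
    entry = disk["dir_entry"]
    return entry is None or disk[entry] is not None


def safe_at_every_crash_point(ordered_writes):
    """Check the invariant after 0, 1, ..., len(ordered_writes) writes."""
    return all(
        invariant_holds(apply_prefix(ordered_writes, n))
        for n in range(len(ordered_writes) + 1)
    )


# Two orderings of the same structural updates for creating a file.
INODE_FIRST = [("inode_12", "initialized"), ("dir_entry", "inode_12")]
ENTRY_FIRST = [("dir_entry", "inode_12"), ("inode_12", "initialized")]
```

In this model, safe_at_every_crash_point(INODE_FIRST) is true, because a crash between the two writes leaves at worst an unreferenced inode, whereas safe_at_every_crash_point(ENTRY_FIRST) is false, because a crash between the two writes leaves a directory entry that refers to an uninitialized inode. Issuing each update synchronously is one way for a file system to guarantee the first ordering.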
As described above, many file system implementations must write structural data to disk in a particular order to maintain structural integrity and repairability, and they therefore employ synchronous writes, which preserve the order of disk writes. The use of synchronous writes, however, limits system performance because disks and other secondary storage devices are slow relative to processors and main memory. File system formats can be designed to minimize the number of distinct disk updates needed to accomplish a consistent reorganization of structure. Alternative techniques for repairability, such as intent logging, provide the ability to recover from an incomplete sequence of disk modifications. Such alternatives, while beneficial to performance, have proved overly burdensome because they sacrifice media or software compatibility. Accordingly, there is a need for an improved operating system that provides control of write ordering without the performance penalty of synchronous writing and without mandating special hardware, new media formats, or other changes.