Most industrial-strength transaction processing systems, including databases, use ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) for logging and recovery in order to guarantee the ACID (Atomicity, Consistency, Isolation and Durability) properties of transactions and to recover from crashes. ARIES supports partial rollbacks of transactions, fine-granularity (record-level) locking, and recovery using write-ahead logging (WAL). The WAL protocol asserts that the log records representing changes to some data must already be on stable storage before the changed data is allowed to replace the previous version of that data on nonvolatile storage. That is, the system is not allowed to write an updated page to the nonvolatile storage version of the database until at least the undo portions of the log records that describe the updates to the page have been written to stable storage.
To enable the enforcement of this protocol, systems using the WAL method of recovery typically store in every page the log sequence number (LSN) of the log record that describes the most recent update performed on that page. Before the page is written out, the system ensures that the log up to this LSN has been made durable. Most database systems use write-through write requests to guarantee that the log is synchronously written to stable storage before the data changes are written. SCSI drives, which are predominantly used in enterprise server deployments of database systems, support write-through by means of the ForceUnitAccess (FUA) flag. ForceUnitAccess is, however, not supported by IDE drives. IDE drives have a controller cache in which write requests are held before they are written to the physical disk. In the absence of FUA, the write call returns to the user-mode process while the data may still be in the volatile disk controller cache, where it can be lost in a crash.
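The page-LSN check described above can be illustrated with a minimal sketch. Every name in it (Log, Page, update_page, write_page, flushed_lsn) is hypothetical and chosen only for illustration, LSNs are assumed to be simple record counters, and the "disk" is an in-memory dictionary rather than a real device.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """In-memory stand-in for a log; LSNs are assumed to be 1, 2, 3, ..."""
    records: list = field(default_factory=list)
    flushed_lsn: int = 0  # highest LSN assumed to be on stable storage

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records)  # LSN of the record just appended

    def flush(self, up_to_lsn: int) -> None:
        # A real system would issue a write-through (or flushed) write here.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


@dataclass
class Page:
    page_id: int
    data: str = ""
    page_lsn: int = 0  # LSN of the most recent update applied to this page


def update_page(log: Log, page: Page, new_data: str) -> None:
    # Log the change first, then stamp the page with that record's LSN.
    lsn = log.append(f"page {page.page_id}: {page.data!r} -> {new_data!r}")
    page.data = new_data
    page.page_lsn = lsn


def write_page(log: Log, page: Page, disk: dict) -> None:
    # WAL rule: the log must be durable up to page_lsn before the page goes out.
    if log.flushed_lsn < page.page_lsn:
        log.flush(page.page_lsn)
    disk[page.page_id] = page.data
```

In this sketch the guarantee rests entirely on flush doing what it claims; if the underlying write is only cached by the drive, the check passes even though the log is not yet durable.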
The writes from the controller cache to the disk platter are not necessarily performed in the same order as the writes from the operating system (OS) to the controller cache. As a result of this reordering, even if a database system, for example, writes the log and waits for that write request to complete before writing the data, the actual writes to the disk need not occur in the same order. Both the log write and the data write are held in the controller cache. When the disk later writes the cached contents to the platter, it may well write the data changes before the log records.
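The reordering scenario above can be made concrete with a toy simulation. The CachingDisk class, the block addresses, and the lowest-block-first destaging policy are all assumptions made for illustration; a real controller may order its writes in any way it chooses.

```python
class CachingDisk:
    """Toy model of a drive with a volatile write cache and no write-through."""

    def __init__(self):
        self.cache = []    # (block, payload) pairs acknowledged but not on the platter
        self.platter = {}  # block -> payload that would survive a power loss

    def write(self, block: int, payload: str) -> None:
        # Returns immediately, so the caller sees the write as complete.
        self.cache.append((block, payload))

    def destage_one(self) -> None:
        # Assumed policy: write the lowest-numbered cached block first,
        # standing in for whatever ordering a real controller would choose.
        self.cache.sort(key=lambda entry: entry[0])
        block, payload = self.cache.pop(0)
        self.platter[block] = payload

    def crash(self) -> None:
        self.cache.clear()  # contents of the volatile cache are lost


LOG_BLOCK, DATA_BLOCK = 900, 10  # hypothetical block addresses

disk = CachingDisk()
disk.write(LOG_BLOCK, "log: update page 10")     # the database writes the log first
disk.write(DATA_BLOCK, "page 10, new version")   # and the data only afterwards
disk.destage_one()                               # the drive picks the data block
disk.crash()

# The updated page is on the platter, but the log record describing it is not.
wal_violated = DATA_BLOCK in disk.platter and LOG_BLOCK not in disk.platter
```

Although the database issued the two writes in the correct order and each call returned before the next began, the state that survives the simulated crash contains the data change without its log record.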
If the data write has reached the platter but the log write is still in the controller cache when the system crashes, the log write is lost. This violates the WAL protocol, which can result in data inconsistency, loss of data and, worse still, loss of recoverability, rendering the database unavailable. This problem is not limited to database systems. ARIES logging and recovery is also used by other transactional systems, recoverable file systems, and so on, and the lack of write-through guarantees poses similar problems for these systems.