In many computer systems, the storage and retrieval of information for and by computer applications is handled by one or more central storage systems. For example, one type of storage system commonly used in personal computers is a file-folder-and-directory-based system, also termed a “file system.” Such file systems organize pluralities of files into hierarchies to create an abstraction of the physical organization of the storage medium used to store the files. Generally, such organization into a hierarchy occurs at the operating system level. The files stored generally include the file hierarchy itself (the “directory”) embodied in a special file maintained by the file system. This directory, in turn, maintains a list of entries corresponding to all of the other files in the directory and the nodal location of such files in the hierarchy (herein referred to as the folders).
The use of a file system for central storage has several limitations. These limitations may be overcome by using relational database technology as the underpinning of a central storage system. However, the use of relational database technology may introduce additional challenges in various aspects of the computer system.
One such challenge relates to the disk storage used by the computer system. Relational databases use various methods to recover from failures and guarantee transactional consistency. One group of recovery methods includes the ARIES method (from “Algorithms for Recovery and Isolation Exploiting Semantics”) and related methods. The ARIES method was first described in: Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992, pp. 94-162. ARIES and related recovery methods generally rely on the Write-Ahead Logging (WAL) protocol.
The WAL protocol is a specific, defined set of implementation steps that ensures data is stored and exchanged properly and can be recovered to a known state in the event of a failure. Just as a network uses a defined protocol to exchange data in a consistent and protected manner, the WAL protocol defines how data is to be protected.
The WAL protocol, as defined in the ARIES article, “asserts that the log records representing changes to some data must already be in stable storage before the changed data is allowed to replace the previous version of the data in nonvolatile storage. That is, the system is not allowed to write an updated page to the nonvolatile storage version of the page until at least the undo portions of the log records which describe the updates to the page have been written to stable storage.”
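The rule quoted above amounts to an ordering check in the buffer manager: before a dirty page is written, the log must be stable at least up to the log sequence number (LSN) of the last record that modified that page. The following Python sketch illustrates that check under simplifying assumptions. `LogManager`, `BufferManager`, `Page`, `page_lsn` and `flushed_lsn` are hypothetical names chosen for illustration, and "flushing" is modeled as advancing a counter rather than performing real durable I/O.

```python
from dataclasses import dataclass, field


@dataclass
class LogManager:
    """Hypothetical log: records are appended in memory and flushed to stable storage."""
    records: list = field(default_factory=list)  # LSN of a record = its index + 1
    flushed_lsn: int = 0                          # highest LSN assumed to be in stable storage

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records)  # LSN of the record just appended

    def flush(self, up_to_lsn: int) -> None:
        # Assumption: a real implementation would write the log tail and wait
        # until the device confirms the data is durable.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


@dataclass
class Page:
    page_id: int
    data: str
    page_lsn: int = 0  # LSN of the last log record that changed this page


class BufferManager:
    def __init__(self, log: LogManager):
        self.log = log
        self.disk = {}  # stands in for the nonvolatile database file

    def update(self, page: Page, new_data: str) -> None:
        # Log the change (including the old value needed for undo) before changing the page.
        lsn = self.log.append(f"page {page.page_id}: {page.data!r} -> {new_data!r}")
        page.data = new_data
        page.page_lsn = lsn

    def write_page(self, page: Page) -> None:
        # WAL rule: the log records describing this page must be stable before the
        # updated page may replace its previous version on disk.
        if self.log.flushed_lsn < page.page_lsn:
            self.log.flush(page.page_lsn)
        self.disk[page.page_id] = page.data
```

The sketch is only as sound as `flush`: if the storage device acknowledges the log write before it is actually durable, the ordering check passes while the WAL guarantee does not hold.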
However, some ways of storing log records may not meet the requirements of the WAL protocol. For example, integrated drive electronics (IDE) drives may not meet these requirements. An IDE drive caches pages and does not guarantee that a page has safely made it to disk. Thus the WAL protocol assertion that “the log records representing changes . . . must already be in stable storage before the changed data is allowed to replace the previous version of the data in nonvolatile storage” is not met by IDE drives. As a result, at best some committed transactions may lose data, and at worst the database may become inconsistent if the disk reorders the writes.
Because this WAL requirement is not met, no guarantee can be made that an ARIES-type recovery method will recover the database correctly. When the relational database issues a write of the log buffer, the IDE disk controller may return success without waiting for the written log contents to reach the disk. Additionally, when an IDE disk is used, there is no guarantee that the cache will be written out to disk in the order of the original writes.
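The failure mode described above can be reproduced with a toy model of a drive that acknowledges writes from a volatile cache and moves them to the media in an order of its own choosing. The following is a hypothetical Python sketch (`CachingDisk` and its methods are illustrative names, not a real driver interface). It shows how a log write that was issued and acknowledged first can be lost while the later data page write survives, which is exactly the state the WAL protocol forbids.

```python
class CachingDisk:
    """Hypothetical drive model: writes are acknowledged from a volatile cache."""

    def __init__(self):
        self.cache = {}  # block -> data: acknowledged, but not yet durable
        self.media = {}  # block -> data: survives a power failure

    def write(self, block, data):
        self.cache[block] = data
        return True  # success is reported before the data reaches the media

    def destage(self, block):
        # The controller chooses which cached block to move to the media next;
        # nothing forces it to follow the order in which writes were issued.
        self.media[block] = self.cache.pop(block)

    def power_failure(self):
        self.cache.clear()  # anything still in the volatile cache is lost


# The database writes the log record first, then the data page, as WAL requires...
disk = CachingDisk()
disk.write("log", "LSN 1: page 7 old -> new")
disk.write("page7", "new")
# ...but the controller destages the data page first, and power then fails.
disk.destage("page7")
disk.power_failure()
print(disk.media)  # the new page survived; the log record describing it did not
```

After the simulated crash, recovery would find an updated page on disk with no log record that could be used to undo it.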
Additionally, drives that are not battery-backed may produce torn pages because their writes are not atomic. For example, if the database page size is 8 KB but the disk does not guarantee an atomic write of an 8 KB page, a torn page may result: a power failure in the middle of a disk write may leave some sectors of the page containing the previous image and others containing the new image. Even if the relational database system can detect torn pages at recovery time, recovering the page requires restoring from a backup, which typically requires an administrator.
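To make the torn-page scenario concrete, the following Python sketch models an 8 KB page written as sixteen 512-byte sectors, with power lost partway through the write. The 512-byte atomic sector size, the function names, and the detection scheme (every sector filled with a one-byte version value, so a page whose sectors disagree is flagged as torn) are assumptions made for illustration, not the method of any particular database system.

```python
PAGE_SIZE = 8 * 1024
SECTOR_SIZE = 512                             # assumed atomic write unit of the drive
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE   # 16 sectors per page


def make_page(version: int) -> bytes:
    # Hypothetical layout: every byte of every sector holds the page version,
    # so the first byte of each sector can be read back as a version stamp.
    return bytes([version % 256]) * PAGE_SIZE


def write_page_with_power_failure(old: bytes, new: bytes, sectors_completed: int) -> bytes:
    """Return what the media holds if power fails after `sectors_completed` sectors."""
    if not 0 <= sectors_completed <= SECTORS_PER_PAGE:
        raise ValueError("sectors_completed out of range")
    cut = sectors_completed * SECTOR_SIZE
    return new[:cut] + old[cut:]  # leading sectors are new, the rest still old


def is_torn(page: bytes) -> bool:
    # A page is torn if its sectors carry different version stamps.
    stamps = {page[offset] for offset in range(0, PAGE_SIZE, SECTOR_SIZE)}
    return len(stamps) > 1
```

For example, `is_torn(write_page_with_power_failure(make_page(1), make_page(2), 5))` is true, because the first five sectors hold the new image and the remaining eleven hold the old one. Detection only identifies the damage; as the paragraph above notes, the page itself still has to be repaired from a backup.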
In view of the foregoing deficiencies in existing data storage and database technologies, there is a need for a recovery scheme that provides improved performance, for example when used with data storage devices which do not necessarily meet the WAL protocol. The present invention satisfies this need.