The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
As part of its operation, database management system (DBMS) may frequently write information to storage locations. However, under certain circumstances, due to an error, the writes to a storage location may not persist causing data loss. Such occurrence is referred herein as “lost writes.” Lost writes occur when a DBMS issues successful write commands but data fails to be persisted in storage. For example, in a multi-tier storage hierarchy, the upper tiers, such as a storage cache, may consist of volatile memory to increase the speed of storage operations. A database server may issue persistent disk write operation to an operating system, which in turn may pass the command to a storage controller. A storage controller may return a successful status for the operation but for optimization reasons may only write the data in its storage cache. Due to exigent circumstances such as power loss or faulty logic, the data from the cache may not be written into non-volatile storage disks. Thus, although the database server has received the successful status for the disk write operation, the data has not been changed on the disk, and the actual writes have been lost. Other optimizations to improve data write operation performance may similarly introduce a risk of lost writes. Since the DBMS has no knowledge of the lost write, DBMS may read and return the stale data from the disk.
The issue of lost writes may be further exacerbated when additional data changes are performed in a DBMS using the stale data that resulted from lost writes. As the DBMS receives numerous requests for data changes, some of these requests may use the existing data in the DBMS to generate the new data changes. For example, if the DBMS utilizes structured query language (SQL), a user may issue a single data manipulation language (DML) statement that retrieves data from the DBMS and then changes other data based on the retrieved data (ex. UPDATE or INSERT with SELECT subquery statements). The retrieved data may include the stale data that resulted from the lost writes, and thus, the other data would be changed using the stale data corrupting the other data. Therefore, the effects of lost writes may cascade in this fashion to other data in DBMS in a very short period of time. To avoid lost writes and its cascading effects, an efficient methodology is required to quickly identify any potential lost writes.