Large-scale data storage systems today typically include an array of disk drives and one or more dedicated computers and software systems to manage data. A primary concern of such data storage systems is data corruption and recovery. Data corruption occurs when the data storage system returns erroneous data without any indication that the data is wrong.
One example of data corruption results from a phantom write error. A phantom write error occurs when a data storage system reports that a write command has been executed but never actually executes it. Such errors may occur for several reasons. For example, a write cache error can prevent a reported write from ever reaching the disk medium. In another example, a malfunctioning component of the data storage system can write a block of data to the wrong address. Such a misdirected write corrupts data at two locations: it leaves stale data at the intended location and overwrites valid data at the wrong location with data that does not belong there. The first of these two errors is a phantom write error.
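A misdirected write can be pictured with a toy in-memory disk. This is a minimal sketch: the dictionary standing in for a disk and the `misdirected_write` helper are hypothetical illustrations, not part of any real storage API.

```python
def misdirected_write(disk, intended, actual, data):
    """Hypothetical faulty write: the data lands at `actual`,
    but success is reported for `intended`."""
    disk[actual] = data  # the block is written to the wrong address
    return True          # the caller is told the write succeeded


# Toy "disk": block address -> block contents.
disk = {0: b"old-A", 1: b"valid-B"}
ok = misdirected_write(disk, intended=0, actual=1, data=b"new-A")

# Address 0 still holds stale data (the phantom write);
# address 1 has lost its valid contents.
print(ok, disk[0], disk[1])  # True b'old-A' b'new-A'
```

Both corrupted locations are invisible to the caller, which only sees the reported success.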
Errors such as phantom write errors are “silent”: the data storage system does not realize that an error has occurred. Silent data corruption is particularly problematic. For example, an application that requests data and receives the wrong data may crash. Alternatively, the application may pass the corrupted data along to other applications. If left undetected, these errors may have disastrous consequences (e.g., irreparable long-term data corruption).
The problem of detecting silent data corruption is commonly addressed by creating integrity metadata, such as a checksum, for each data block. A checksum is a numerical value derived through a mathematical computation on the data in a data block. When data is stored, its checksum is computed and associated with the stored data. When the data is subsequently read, the same computation is applied to the data read back. If the result matches the stored checksum, the data is assumed to be uncorrupted.
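The store-then-verify cycle can be sketched in a few lines. The text does not name a checksum algorithm; this sketch assumes CRC-32 from Python's standard `zlib` module, and the helper names `store` and `verify` are hypothetical.

```python
import zlib


def store(block: bytes) -> tuple[bytes, int]:
    """On write: compute a CRC-32 checksum (assumed algorithm) and keep it with the data."""
    return block, zlib.crc32(block)


def verify(block: bytes, checksum: int) -> bool:
    """On read: recompute the checksum; a match means the block is assumed intact."""
    return zlib.crc32(block) == checksum


data, crc = store(b"\x00" * 512)  # one 512-byte block
print(verify(data, crc))          # True

corrupted = b"\x01" + data[1:]    # flip one bit in the first byte
print(verify(corrupted, crc))     # False
```

CRC-32 is guaranteed to detect any single-bit error, so the second check fails; a match on other inputs only makes corruption unlikely, which is why the text says the data is "assumed" to be uncorrupted.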
A checksum stored with the data, however, cannot detect a phantom write error. When the data storage system fails to write a block of data to the requested location, both the old data at that location and the checksum stored with it remain unchanged, so the stale checksum still matches the stale data. A checksum can therefore detect a phantom write error only if it is stored separately from the data. Such separately stored metadata, however, creates significant additional expense. Specifically, each read command would require at least two physical I/O operations (a data read and a metadata read), and each write command would require a data write plus a read-modify-write of the metadata block, i.e., at least three physical I/O operations (the modify step is performed in memory). The read-modify-write is required because integrity metadata is typically much smaller than a data block, and typical storage systems today perform I/O operations only in integral numbers of data blocks. If the data storage system uses RAID (redundant array of inexpensive disks) 1 or RAID 5 architectures, these additional operations translate into even more disk I/O operations, because RAID 1 writes every block to two disks and RAID 5 must also read and update parity on each small write.
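The toy model below contrasts the two checksum placements. It is a minimal sketch, not a real storage stack, and it rests on several assumptions: a hypothetical `Disk` class that counts physical reads and writes and can silently drop one write, 4-byte CRC-32 checksums computed with `zlib`, and a separate checksum table packed 128 entries per 512-byte block starting at an arbitrary address `META_BASE`. The in-memory modify step performs no disk I/O, so a write with separate metadata costs one data write plus one metadata read and one metadata write.

```python
import struct
import zlib

BLOCK = 512
PER_META = BLOCK // 4   # 128 four-byte CRC-32 values fit in one metadata block
META_BASE = 1_000_000   # hypothetical address where the separate checksum table starts


class Disk:
    """Hypothetical toy block device that counts physical reads and writes."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0
        self.drop_next_write = False  # set to True to inject one phantom write

    def read(self, lba):
        self.reads += 1
        return self.blocks.get(lba, bytes(BLOCK))

    def write(self, lba, data):
        self.writes += 1
        if self.drop_next_write:  # reported as done, never reaches the medium
            self.drop_next_write = False
            return
        self.blocks[lba] = data


def write_colocated(disk, lba, data):
    # Checksum stored in the same block as the data.
    disk.write(lba, data + struct.pack("<I", zlib.crc32(data)))


def read_colocated(disk, lba):
    raw = disk.read(lba)
    data, (stored,) = raw[:-4], struct.unpack("<I", raw[-4:])
    return data, zlib.crc32(data) == stored


def write_separate(disk, lba, data):
    disk.write(lba, data)                                     # I/O 1: data write
    mlba, slot = META_BASE + lba // PER_META, lba % PER_META
    meta = bytearray(disk.read(mlba))                         # I/O 2: read metadata block
    struct.pack_into("<I", meta, 4 * slot, zlib.crc32(data))  # modify in memory, no I/O
    disk.write(mlba, bytes(meta))                             # I/O 3: write metadata block


def read_separate(disk, lba):
    data = disk.read(lba)                                     # I/O 1: data read
    meta = disk.read(META_BASE + lba // PER_META)             # I/O 2: metadata read
    (stored,) = struct.unpack_from("<I", meta, 4 * (lba % PER_META))
    return data, zlib.crc32(data) == stored


old = bytes(BLOCK)
new = b"\x01" + bytes(BLOCK - 1)  # differs from `old` in a single bit

# Co-located checksum: the phantom write leaves stale data *and* its stale checksum.
d1 = Disk()
write_colocated(d1, 7, old)
d1.drop_next_write = True
write_colocated(d1, 7, new)
stale, colocated_ok = read_colocated(d1, 7)
print(stale == old, colocated_ok)  # True True: stale data passes the check

# Separate checksum: the lost data write no longer matches the updated metadata.
d2 = Disk()
write_separate(d2, 7, old)
d2.drop_next_write = True
write_separate(d2, 7, new)
d2.reads = d2.writes = 0
_, separate_ok = read_separate(d2, 7)
print(separate_ok, d2.reads)       # False 2: detected, at the cost of two reads
d2.reads = d2.writes = 0
write_separate(d2, 8, new)
print(d2.reads, d2.writes)         # 1 2: three physical I/Os for one logical write
```

The phantom write is caught only because the metadata update lands somewhere other than the lost data write; that same separation is what doubles the reads and adds a read-modify-write to every write.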
The cost of these additional I/O operations can be reduced by caching the integrity metadata in the memory of the data storage system. However, integrity metadata is typically 1–5 percent of the size of the data. For example, typical storage systems using block-based protocols (e.g., SCSI) store data in 512-byte blocks, each of which would require 4–20 bytes of metadata (roughly 8–40 MB of metadata per GB of user data). It is therefore not practical to keep all of the integrity metadata in memory. Furthermore, even if the metadata could be held in memory, metadata updates would still need to be stored on a non-volatile storage device and would therefore require either additional disk I/O operations or a substantial amount of non-volatile memory.
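The metadata volumes follow directly from the per-block sizes. The short calculation below assumes binary units (1 GiB = 2^30 bytes) and the 512-byte block size mentioned above; `metadata_overhead` is a hypothetical helper.

```python
def metadata_overhead(meta_per_block, user_bytes, block_bytes=512):
    """Bytes of integrity metadata needed to protect `user_bytes` of data."""
    blocks = -(-user_bytes // block_bytes)  # ceiling division: partial blocks count
    return blocks * meta_per_block


GIB, MIB = 2**30, 2**20
for meta in (4, 20):
    total = metadata_overhead(meta, GIB)
    print(f"{meta} B/block: {total // MIB} MiB per GiB "
          f"({100 * meta / 512:.2f}% of the data)")
# 4 B/block: 8 MiB per GiB (0.78% of the data)
# 20 B/block: 40 MiB per GiB (3.91% of the data)

# Scaled to 1 TiB of user data at 20 bytes per block:
print(metadata_overhead(20, 2**40) // GIB, "GiB")  # 40 GiB
```

At tens of gigabytes of metadata per terabyte of data, caching all of it in controller memory quickly becomes impractical.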