1. Field of the Invention
This invention relates to data storage in computer systems and, more particularly, to the storage of data blocks and associated hash values in computer systems. The invention also relates to computer system failures and crash recovery.
2. Description of the Related Art
In many applications of computer systems, reliable storage and retrieval of data is essential. Additionally, in the event an erroneous or unintended change in data does occur, it is also often desirable in such applications that the change be detected. Much effort has therefore been devoted to developing mechanisms that provide more reliable storage and which can detect erroneous or unintended changes in data.
One prevalent form of non-volatile data storage is disk storage. Although disk drives are usually reliable, they occasionally return incorrect data for various reasons including failure to write a data block to the disk, writing a data block at the wrong location (or address) on the disk, or reading a data block from the wrong location of the disk. Other classes of errors may also be introduced due to faults in the interconnect, drive microcode, and drive buffers, among others. Such errors may not be caught by the disk drives' internal error detection mechanisms, which are typically designed to detect bit errors within a data block rather than errors resulting from misplacing an entire data block.
It is therefore desirable to perform an independent verification that the data returned in response to a read from a particular data block location (or address) is in fact the same data as was previously written to that data block address. One way to perform such a check is to compute and store a hash value (or a checksum) when writing the data block to the disk storage and to verify that hash value when reading the data block. In general, a hash value is a code which is computed from (and is thus dependent upon) the data of a block. A change in the data block may be detected by storing a hash value computed from the data block before it is stored, storing the data block and hash, retrieving the hash value when the data is read, and recomputing a new hash value based on the retrieved data. The hash retrieved from storage and the recomputed hash are then compared. If the hash values do not match, then the block of data retrieved from storage does not match the data intended to be stored. This technique requires that the hash values that are computed before the data is written be stored elsewhere on the disk (or on another device) and separate from the data block, so that they can be independently retrieved later for verification.
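The check described above can be illustrated with a minimal Python sketch. It is not taken from any particular implementation: the class name `VerifiedStore`, the use of in-memory dictionaries in place of a disk, and the choice of SHA-256 as the hash function are all assumptions made for illustration.

```python
import hashlib


class VerifiedStore:
    """Toy block store that keeps hash values separate from the data blocks.

    Hypothetical illustration: two dictionaries stand in for the disk region
    holding data blocks and the separate region holding hash values.
    """

    def __init__(self):
        self.blocks = {}  # address -> bytes (the data blocks)
        self.hashes = {}  # address -> hex digest (stored apart from the data)

    def write(self, address, data):
        # Compute the hash from the data before the block is stored,
        # then store both the hash and the block.
        self.hashes[address] = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data

    def read(self, address):
        data = self.blocks[address]
        # Recompute the hash from the retrieved data and compare it with
        # the hash retrieved from storage.
        if hashlib.sha256(data).hexdigest() != self.hashes[address]:
            raise IOError(f"hash mismatch at block address {address}")
        return data
```

In this sketch, a block that was silently overwritten or misplaced after `write` returns would cause `read` to raise an error rather than return the wrong data.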
Unfortunately, while this technique allows for the detection of changes in data blocks during normal operations of a system, separate storage of the hash values can lead to a possible inconsistency between a newly written block of data and its corresponding updated hash value. This inconsistency can arise because either the data block or the hash value must be written to the disk first. If a system or disk failure occurs between the two writes, the hash value and the actual data may be inconsistent, thus rendering the hashing mechanism suspect at the very time it is needed most.
The problems outlined above may in large part be solved by various embodiments of a data storage system and method employing a write-ahead hash log as described below. In one embodiment, a data storage system includes a computer coupled to a non-volatile storage, such as a disk drive, through an interconnect. The computer may include a block cache for storing cached copies of data blocks, and a hash table that stores hash values corresponding to the data blocks. Prior to writing back a modified cache block to the non-volatile storage, a log recorder of the computer stores an updated hash value corresponding to the modified cache block within a write-ahead hash log, which may also be contained in non-volatile storage.
In one particular implementation, the log recorder may create a log record including an updated hash value and an address corresponding to a modified cache block. The log recorder may additionally maintain a first pointer value indicative of log records that have been stored to the write-ahead hash log, and a second pointer value indicative of the most recent log record stored in the write-ahead hash log for which a corresponding modified cache block has been stored to the non-volatile storage. These pointer values may be stored in the write-ahead hash log with the log records. Log records may be grouped into log blocks which are eventually written to the non-volatile storage as a group. After the log record containing the updated hash value has been successfully written to the write-ahead hash log, a cache manager may initiate the write-back of the dirty data block to the non-volatile storage. Until it has been verified that the corresponding dirty data block has been successfully written back to the non-volatile storage, the old hash value for the data block may also be retained in the write-ahead hash log. If a system, network or disk failure occurs between the writing of the log record containing the updated hash value to the write-ahead hash log and the writing of the corresponding dirty data block to the non-volatile storage, the hash table may be rebuilt according to the records in the write-ahead hash log.
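The ordering and recovery behavior described above can be sketched in Python under several stated assumptions. The names `WriteAheadHashLog`, `write_back`, `recover`, `logged`, and `flushed` are hypothetical. Dictionaries and lists stand in for non-volatile storage, SHA-256 stands in for the hash function, write-backs are assumed to complete in log order, and log blocks are omitted. The recovery policy shown (keep the new hash only if the block on disk matches it, otherwise fall back to the retained old hash) is one plausible reading of the description, not a statement of the claimed method.

```python
import hashlib


def block_hash(data):
    """Hash function assumed for illustration (SHA-256)."""
    return hashlib.sha256(data).hexdigest()


class WriteAheadHashLog:
    def __init__(self):
        self.disk = {}        # address -> bytes; stand-in for non-volatile storage
        self.log = []         # records (address, old_hash, new_hash); also non-volatile
        self.logged = 0       # first pointer: number of records stored in the log
        self.flushed = 0      # second pointer: records whose data block reached the disk
        self.hash_table = {}  # in-memory hash table; lost on a crash

    def write_back(self, address, data, crash_before_data=False):
        new_hash = block_hash(data)
        # Step 1: store the log record, retaining the old hash, before the data.
        self.log.append((address, self.hash_table.get(address), new_hash))
        self.logged = len(self.log)
        self.hash_table[address] = new_hash
        if crash_before_data:
            return  # simulated failure between the two writes
        # Step 2: only now write the dirty block, then advance the second pointer.
        self.disk[address] = data
        self.flushed = self.logged

    def recover(self):
        """Rebuild the hash table from the log records after a failure."""
        table = {}
        for i, (address, old_hash, new_hash) in enumerate(self.log[:self.logged]):
            on_disk = self.disk.get(address)
            actual = block_hash(on_disk) if on_disk is not None else None
            # Records at or past the second pointer may describe a block that
            # never reached the disk, so they are checked against the disk.
            chosen = new_hash if (i < self.flushed or actual == new_hash) else old_hash
            if chosen is None:
                table.pop(address, None)
            else:
                table[address] = chosen
        self.hash_table = table
        return table
```

In this sketch, calling `write_back(1, b"v2", crash_before_data=True)` after an earlier successful write of `b"v1"` leaves the old block on disk, and `recover()` restores the hash of `b"v1"` for address 1, so the rebuilt hash table agrees with the data actually stored.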