1. Field of the Invention
The present invention relates to a method, system, and article of manufacture for implementing an error detection scheme for stored data.
2. Description of the Related Art
A block addressable storage device typically comprises one or more disks, such as flexible disks, rigid disks, or optical discs, and stores data in addressable groups referred to as blocks. The number of bytes of data contained in a single block is called the block length or block size. While the block length can be any number of bytes, storage device manufacturers often format the storage devices into blocks with a block length of 512 bytes. The storage devices can be reformatted into blocks of a different block length. Application programs that read and write data to the storage devices need assurance that data integrity is maintained as data is transferred between the storage device and the application program.
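As a small illustration of block addressing, the Python sketch below maps an absolute byte offset to a block address and an offset within that block, and counts how many blocks a given number of bytes occupies. The helper names `locate` and `blocks_needed` are hypothetical. The sketch assumes zero-based, contiguous blocks of a fixed length, with 512 bytes as the default.

```python
# Hypothetical helpers illustrating block addressing; assumes zero-based,
# contiguous blocks of a fixed length (512 bytes by default).
DEFAULT_BLOCK_LENGTH = 512


def locate(byte_offset: int, block_length: int = DEFAULT_BLOCK_LENGTH) -> tuple[int, int]:
    """Return (block address, offset within block) for an absolute byte offset."""
    if byte_offset < 0 or block_length <= 0:
        raise ValueError("byte_offset must be >= 0 and block_length must be > 0")
    return divmod(byte_offset, block_length)


def blocks_needed(length: int, block_length: int = DEFAULT_BLOCK_LENGTH) -> int:
    """Number of whole blocks required to hold `length` bytes (ceiling division)."""
    if length < 0 or block_length <= 0:
        raise ValueError("length must be >= 0 and block_length must be > 0")
    return -(-length // block_length)


if __name__ == "__main__":
    print(locate(1000))          # (1, 488) with 512-byte blocks
    print(locate(1000, 4096))    # (0, 1000) after reformatting to 4096-byte blocks
    print(blocks_needed(1025))   # 3
```

The second call shows why reformatting matters: the same byte offset resolves to a different block address once the block length changes.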
Prior art storage devices include techniques for assuring data integrity. For instance, storage device controllers often utilize an error correcting code (ECC) algorithm to detect and possibly correct hardware-related failures within the storage device. In addition to hardware errors, data integrity may be compromised by transport errors that occur during data transmission via Small Computer System Interface (SCSI) cables, storage adapter cards, and storage device drivers. Failure to detect transport errors, as well as disk errors, allows corrupt data to propagate. Undetected errors of this kind are referred to as “silent data corruption.” Silent data corruption occurs when the application program retrieves data from the storage system (i.e., a disk read request) that is stale, altered, or lost, without the error being detected or corrected. Stale data is data that was written at an earlier time and is incorrectly returned in place of more recent data that was lost. Altered data is data that is present but has been corrupted or changed so that it no longer correctly represents the original data. Finally, lost data is data that is no longer available. The presence of such errors is of substantial concern for critical applications, where the impact of undetected errors can be catastrophic.
In the prior art, checksums have been used to detect errors in data. The checksum of a group of data items is stored or transmitted with the group of data items and is calculated by treating the data items as numeric values. Checksums are widely used in network protocols, where a checksum generated from the bits of a message accompanies the message during transmission. For instance, many checksum algorithms combine the bits of the message with XOR to generate the checksum. The receiving station then applies the same checksum algorithm (e.g., XOR) to the received message and verifies that the computed value matches the checksum included in the transmission. In view of the prevalence of silent data corruption, there is a need in the art for an improved checksum-based technique to detect silent data corruption.
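The transmit-and-verify exchange described above can be sketched in Python. This is a minimal illustration of a conventional XOR checksum, not the technique of the invention. It assumes the checksum is a longitudinal XOR of the message bytes (one common reading of "an XOR of the bits"), and the function names `xor_checksum` and `verify` are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_checksum(message: bytes) -> int:
    """Fold every byte of the message together with XOR into a one-byte checksum."""
    return reduce(xor, message, 0)


def verify(message: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the transmitted one."""
    return xor_checksum(message) == checksum


if __name__ == "__main__":
    sent = b"\x01\x02\x04"
    checksum = xor_checksum(sent)              # 0x01 ^ 0x02 ^ 0x04 == 7
    print(verify(sent, checksum))              # True
    print(verify(b"\x01\x03\x04", checksum))   # False: one flipped bit is caught
    print(verify(b"\x00\x03\x04", checksum))   # True: two flips in the same bit position cancel
```

The last case shows a known limitation of a plain XOR checksum: an even number of bit flips in the same bit position goes undetected.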