1. Field of Invention
This invention relates to data storage systems.
2. Related Art
Many computer applications need to store and retrieve information. Information can be stored on hard disks, floppy disks, CD-ROMs, semiconductor RAM and similar storage devices. Many of these storage systems are susceptible to various forms of data loss, including disk failures. One solution to the problem of disk failure is a RAID (redundant array of independent disks) system. RAID systems use multiple hard drives and store parity data generated from the data drives, either on a separate drive (known as the parity disk) or distributed among the multiple drives. The use of multiple hard drives makes it possible to replace a faulty hard drive without going off-line: if a hard drive fails, a new hard drive can be inserted while the system is running (“hot-swapping”), and the RAID system can rebuild the data on the new disk using the other data disks and the parity data. The performance of a RAID system is improved by disk striping, which interleaves bytes or groups of bytes across multiple drives so that more than one disk is reading and writing simultaneously.
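The parity and rebuild behavior described above can be illustrated with a minimal Python sketch. It assumes single-parity RAID in which the parity block is the byte-wise XOR of the data blocks in a stripe (one common scheme; the text does not specify the parity function), and the names `xor_blocks`, `make_parity`, and `rebuild` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def make_parity(data_blocks):
    """Parity block for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)

def rebuild(surviving_blocks, parity):
    """Reconstruct the single missing data block from the survivors and parity."""
    return xor_blocks(surviving_blocks + [parity])

stripe = [b"ABCD", b"EFGH", b"IJKL"]  # one block per data disk
parity = make_parity(stripe)

# Simulate failure of disk 1 and rebuild its block onto a hot-swapped replacement.
survivors = [stripe[0], stripe[2]]
print(rebuild(survivors, parity))  # b'EFGH'
```

Because XOR is its own inverse, XORing the surviving blocks with the parity cancels every block except the missing one.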
Another problem with storage devices is that they are susceptible to various forms of data corruption, including bit miswrites. Although parity data allows a RAID system to determine that some data in a stripe has been corrupted, parity data alone does not contain enough information to restore the corrupted data. More specifically, parity data does not indicate which data in the stripe has been corrupted, so it cannot be determined which data is trustworthy.
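A short sketch, under the same XOR-parity assumption, of why parity detects corruption but cannot localize it: flipping the same bit in two different blocks produces identical parity mismatches. The function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_ok(data_blocks, parity):
    """True if the stored parity still matches the data in the stripe."""
    return xor_blocks(data_blocks) == parity

good = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(good)

# Flip the low bit of the first byte in block 0, and separately in block 2.
bad0 = [bytes([good[0][0] ^ 0x01]) + good[0][1:], good[1], good[2]]
bad2 = [good[0], good[1], bytes([good[2][0] ^ 0x01]) + good[2][1:]]

print(parity_ok(good, parity))  # True
print(parity_ok(bad0, parity))  # False
print(parity_ok(bad2, parity))  # False

# The syndrome (all data XOR parity) is identical in both cases,
# so parity alone cannot say which block is wrong.
print(xor_blocks(bad0 + [parity]) == xor_blocks(bad2 + [parity]))  # True
```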
Checksums are another form of redundant data that can be written to individual disks. Together, the parity data across the disks and the checksums within each disk contain enough information to identify and restore corrupted data in RAID and other redundant systems.
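One way to combine the two forms of redundancy is sketched below. It assumes XOR parity and uses CRC-32 (`zlib.crc32`) as the per-block checksum purely for illustration; the text does not name a checksum algorithm. It also assumes the parity block itself is intact and that at most one data block is corrupted. The helper names are hypothetical.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def write_stripe(data_blocks):
    """Return (blocks, per-block checksums, parity) as they would be stored."""
    checksums = [zlib.crc32(b) for b in data_blocks]  # CRC-32 is an assumption
    return list(data_blocks), checksums, xor_blocks(data_blocks)

def repair_stripe(blocks, checksums, parity):
    """Checksums locate the bad block; parity then reconstructs it."""
    bad = [i for i, b in enumerate(blocks) if zlib.crc32(b) != checksums[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("more than one corrupted block; single parity cannot repair")
    i = bad[0]
    others = blocks[:i] + blocks[i + 1:]
    repaired = list(blocks)
    repaired[i] = xor_blocks(others + [parity])
    if zlib.crc32(repaired[i]) != checksums[i]:
        raise ValueError("reconstructed block fails its checksum")
    return repaired

blocks, sums, parity = write_stripe([b"ABCD", b"EFGH", b"IJKL"])
blocks[1] = b"EFGX"  # silent miswrite on disk 1
print(repair_stripe(blocks, sums, parity))  # [b'ABCD', b'EFGH', b'IJKL']
```

The checksum supplies the location that parity lacks, and the parity supplies the redundancy that a per-block checksum lacks.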
A second known problem is that disk drives in earlier data detection systems are formatted in a manner incompatible with a standard checksum system. More specifically, these disk drives have no space available to store checksum information.
A third known problem is that prior-art methods of storing checksums do not provide for recovery from lost writes, including writes that never reached the disk drive. In such systems, the data and its checksum are updated in a single I/O. If that I/O is lost, the old data and the old checksum remain on disk and still agree with each other, so the lost write may go undetected and recovery may be incomplete.
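The lost-write weakness can be sketched with a toy model in which each block is written together with its own checksum in one I/O. The `Disk` class, its `lose` flag, and the block address are hypothetical, and CRC-32 stands in for whatever checksum a real system uses.

```python
import zlib

class Disk:
    """Toy disk (hypothetical): each block stores its data with its own checksum."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data, lose=False):
        # Data and checksum go out in one I/O; a lost write drops both together.
        if not lose:
            self.blocks[lba] = (data, zlib.crc32(data))

    def read_verified(self, lba):
        data, stored = self.blocks[lba]
        if zlib.crc32(data) != stored:
            raise IOError(f"checksum mismatch at block {lba}")
        return data

disk = Disk()
disk.write(7, b"version 1")
disk.write(7, b"version 2", lose=True)  # drive acknowledges, but nothing lands

# The stale block still carries a matching checksum, so verification passes.
print(disk.read_verified(7))  # b'version 1'
```

Because the checksum was computed over the stale data it sits beside, reading the block cannot reveal that a newer write was dropped.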
Accordingly, it would be advantageous to provide an improved technique for error checking and correction in data storage systems. This is achieved in an embodiment of the invention that is not subject to the drawbacks of the related art.