Within the field of computing, many scenarios involve a storage set provided by a set of storage devices (e.g., an array of hard disk drives interoperating as a Redundant Array of Inexpensive Disks (RAID) array), and that may be accessed by various devices and processes to store and retrieve various types of data. In many such scenarios, data stored in different portions of the storage set may have a relationship. As a first example, a first data set and a second data set stored in the storage set may reference each other, such as related records in a database system. As a second example, two or more identical versions of the data may be retained in order to provide various advantages. For example, two storage devices may store and provide access to the same data set, thereby potentially doubling the rate at which the data may be read. Identical copies of the data may also be retained in order to protect the integrity of the data; e.g., if a first copy of the data is lost due to a failure, such as data corruption or a hardware fault (e.g., a hard drive crash), an identical second copy of the data set may be accessed and replicated to recover from the failure.
As a third such example, data may be associated in order to detect and/or safeguard against errors or unintended changes to the data. For example, an error in the reading or storing logic of the device, a buffer underrun or overrun, a flaw in the storage medium, or an external disruption (such as a cosmic ray) may occasionally cause an inadvertent change in the data stored on the storage medium or in the reading of data from the storage medium. Therefore, in many such scenarios, for respective portions of data stored on the storage devices, a verifier, such as a checksum, may be calculated and stored, and may be used to confirm that the contents of the data set have been validly stored to and/or read from the storage device. As one such example, in the context of storing a data set comprising a set of bits, an exclusive OR (XOR) operation may be applied to the bits, resulting in a one-bit checksum that may be stored and associated with this data set. When the data set is later read, another XOR operation may be applied thereto, and the result may be compared with the one-bit checksum. A change of any one bit results in a mismatch of these XOR computations, indicating that the data has been incorrectly stored, altered, or incorrectly read from the storage device. Many types of verifiers are available, and these may vary in features such as ease of computation, a capability of identifying which bit of the data set has changed, and an error-correction capability whereby an incorrectly read portion of data may be corrected.
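The one-bit XOR checksum described above may be illustrated with a minimal sketch. The function names `parity_bit` and `verify` and the sample bit values are hypothetical, chosen only for illustration; they are not drawn from any particular storage system.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """XOR all bits of the data set together, yielding a one-bit checksum."""
    return reduce(xor, bits, 0)


def verify(bits, checksum):
    """Recompute the XOR checksum on read and compare it with the stored one."""
    return parity_bit(bits) == checksum


# Hypothetical data set; the checksum is computed when the data is stored.
data = [1, 0, 1, 1, 0, 1, 0, 0]
stored_checksum = parity_bit(data)

# Simulate an inadvertent change of a single bit on the storage medium.
corrupted = list(data)
corrupted[3] ^= 1

intact_ok = verify(data, stored_checksum)
corrupted_ok = verify(corrupted, stored_checksum)
```

In this sketch, `intact_ok` is true and `corrupted_ok` is false, since flipping one bit flips the XOR result. A one-bit checksum of this kind detects any odd number of flipped bits; an even number of flipped bits leaves the XOR result unchanged and goes undetected, which is one reason verifiers with stronger detection or correction capabilities are also used.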
Various forms of data replication are often achieved through the use of Redundant Array of Inexpensive Disks (RAID) arrays, such as a set of hard disk drives that are pooled together to achieve various aggregate properties, such as improved throughput and automatic data mirroring. As a first such example, in a RAID 1 array, a set of two or more hard disk drives of the same size store identical copies of the storage set, and any update to the storage set is identically propagated across all of the hard disk drives. The storage set therefore remains accessible in the event of hard disk drive failures, even multiple such failures, as long as at least one hard disk drive remains functional and accessible. As a second such example, a RAID 4 array involves a set of two or more disks, where one disk is included in the array not to store user data, but to store verifiers of the data stored on the other disks. For example, for a RAID 4 array involving four disks each having a capacity of one terabyte, the capacity of the first three disks is pooled to form a three-terabyte storage space for user data, while the fourth disk is included in the array to hold verifiers for data sets stored on the first three disks (e.g., for every three 64-bit words respectively stored on the first three disks, the fourth disk includes a 64-bit verifier that verifies the integrity of the three 64-bit words). A RAID array controller comprises circuitry that is configured to implement the details of a selected RAID level for a provided set of hard disk drives (e.g., upon receiving a data set, automatically apportioning the data across the three user data disks, calculating the verifier of the data set, and storing the verifier on the fourth disk). The RAID techniques used may also enable additional protections or features; e.g., if any single storage device in a RAID 4 array fails, the data stored on the failed device may be entirely reconstructed through the use of the remaining storage devices.
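The four-disk RAID 4 example may be sketched as follows, under the assumption that the 64-bit verifier is the bitwise XOR of the three 64-bit data words (a common parity scheme, though the text above does not specify the verifier calculation). The names `parity_word` and `reconstruct` and the sample word values are hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits


def parity_word(words):
    """Bitwise XOR of the given 64-bit words (assumed verifier calculation)."""
    parity = 0
    for word in words:
        parity ^= word
    return parity & MASK64


def reconstruct(surviving_words, parity):
    """Recover the word of a single failed disk.

    XORing the parity word with every surviving data word cancels their
    contributions and leaves the missing word.
    """
    return parity_word(list(surviving_words) + [parity])


# Hypothetical 64-bit words stored on the three user data disks.
disk_words = [0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x0F0F0F0F0F0F0F0F]

# Verifier stored on the fourth disk.
parity = parity_word(disk_words)

# Simulate the failure of the second data disk and rebuild its word.
lost_index = 1
survivors = [w for i, w in enumerate(disk_words) if i != lost_index]
recovered = reconstruct(survivors, parity)
```

Here `recovered` equals the word originally held on the failed disk. The same calculation applies whichever single data disk fails; if the verifier disk itself fails, the verifier may simply be recomputed from the three data words.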