In data processing systems, there is a need to maintain the validity and reliability of user data stored on the data storage subsystems used by the data processing system. One technique for improving the reliability of stored user data is to compute and store parity data that provides a check on user data that has previously been written to or otherwise stored on storage media. A further improvement may be obtained by distributing the user data across multiple data storage devices in a Redundant Array of Independent Disks (RAID). Providing parity data and storing data redundantly permits corrupted data to be corrected, at least within some error detection and error correction parameters. A RAID subsystem typically includes a disk array controller that is in turn connected to a number of data storage devices, the subsystem being connected or connectable to one or more external computers or networks. A complete description of RAID may be found in The RAID Book, a Source Book for Disk Array Technology, Fourth Edition, edited by Paul Massiglia, and published by the RAID Advisory Board, St. Peter, Minn., Sep. 1, 1994, copyright 1994 RAID Advisory Board, Inc. The purpose of a RAID is to provide redundancy for data stored by the computer (user data) on the data storage devices, so that the disk array controller can regenerate the stored user data when individual blocks of user data are corrupted or lost.
For example, in a RAID configuration having five data storage devices, user data is stored in four data sectors, one on each of four of the devices. Parity data, which provides some capacity to regenerate the user data, is stored in a parity sector on the fifth device. The four data sectors (or data segments) and the parity sector (or parity segment) together make up a data stripe (or sector stripe) in the RAID.
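The parity relationship in this five-device example can be illustrated with byte-wise XOR, the parity function commonly used in RAID. The sketch below is a minimal illustration, assuming 512-byte sectors and simple XOR parity; the function names `xor_sectors`, `build_stripe`, and `regenerate` are hypothetical and not taken from any particular controller.

```python
from functools import reduce

SECTOR_SIZE = 512  # bytes per sector, the typical value cited in this section


def xor_sectors(sectors: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def build_stripe(data_sectors: list[bytes]) -> list[bytes]:
    """Return the four data sectors followed by their XOR parity sector."""
    if len(data_sectors) != 4:
        raise ValueError("this sketch assumes four data sectors per stripe")
    if any(len(s) != SECTOR_SIZE for s in data_sectors):
        raise ValueError("all sectors must be SECTOR_SIZE bytes")
    return data_sectors + [xor_sectors(data_sectors)]


def regenerate(stripe: list[bytes], lost_index: int) -> bytes:
    """Rebuild one lost sector (data or parity) by XORing the other four."""
    survivors = [s for i, s in enumerate(stripe) if i != lost_index]
    return xor_sectors(survivors)


data = [bytes([i]) * SECTOR_SIZE for i in range(1, 5)]
stripe = build_stripe(data)
print(regenerate(stripe, 2) == data[2])  # True
```

Because XOR is its own inverse, any single missing sector in the stripe, whether it holds user data or parity, equals the XOR of the remaining four.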
When data consistency checks are performed according to conventional techniques, all data and parity sectors associated with a given data stripe are read, and operations are performed on each byte of data in those sectors to determine whether the parity data stored for that stripe is consistent with the user data stored in the stripe. Upon detecting an inconsistency, the system administrator typically has two options. The first is to report the inconsistency without correcting it. The second is to correct it by performing auto-correction, which includes a number of operations that attempt to restore the user data based on the parity data in the data stripe. Techniques for detecting and correcting errors to restore data are known in the art and are not described in detail here.
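A conventional full-stripe check of the kind described above can be sketched as follows, assuming XOR parity. Every byte of every data sector is folded into the comparison; `check_stripe` is a hypothetical name used only for illustration.

```python
from functools import reduce


def check_stripe(data_sectors: list[bytes], parity: bytes) -> list[int]:
    """Conventional full check: XOR every byte of every data sector and
    compare the result with the stored parity. Returns the offsets of
    inconsistent bytes; an empty list means the stripe is consistent."""
    computed = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_sectors))
    return [i for i, (c, p) in enumerate(zip(computed, parity)) if c != p]


data = [b"\x01" * 8, b"\x02" * 8, b"\x04" * 8, b"\x08" * 8]
parity = b"\x0f" * 8
print(check_stripe(data, parity))  # []

data[3] = b"\x08" * 5 + b"\x00" + b"\x08" * 2  # corrupt one byte
print(check_stripe(data, parity))  # [5]
```

Note that the check only reports *where* the stripe is inconsistent; it cannot tell whether the user data or the parity is the corrupted side.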
Prior art data consistency checking techniques are limited in that they typically have no means, or only limited means, for determining whether the user data or the parity data is corrupt when an inconsistency is identified. This limitation may result in erroneous data being stored in a data stripe when a data consistency check with auto-correction is used. To illustrate such an erroneous result, consider that in some prior art techniques, upon detecting such an inconsistency, only the corresponding parity data is altered to make it consistent with the user data. Having no other information, such techniques assume that the parity data, rather than the user data, has become corrupted. As a result, if the user data was actually corrupt but the parity data was not, the user data remains corrupt, and parity data consistent with the corrupt user data is written to the data stripe. What is needed is a system, method, and computer program that can differentiate between corrupted parity data and corrupted user data, so that corrupted parity data is corrected in view of uncorrupted user data, and corrupted user data is corrected in view of uncorrupted parity data.
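The failure mode described above can be reproduced in a few lines. This sketch models the parity-only repair policy under the assumption of XOR parity; `naive_autocorrect` is a hypothetical name for that policy, not a real product API.

```python
from functools import reduce


def xor_all(sectors):
    """Byte-wise XOR of equal-length sectors."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*sectors))


def naive_autocorrect(data_sectors, parity):
    """Parity-only repair: on any mismatch, assume the parity is wrong and
    overwrite it with parity recomputed from the (possibly corrupt) user data."""
    recomputed = xor_all(data_sectors)
    if recomputed != parity:
        return data_sectors, recomputed  # user data is left untouched
    return data_sectors, parity


original = [b"\x10" * 4, b"\x20" * 4, b"\x30" * 4, b"\x40" * 4]
good_parity = xor_all(original)

# Corrupt one byte of *user* data; the stored parity is still correct.
corrupted = list(original)
corrupted[1] = b"\x21" + original[1][1:]

data_after, parity_after = naive_autocorrect(corrupted, good_parity)

# The stripe now passes a consistency check, yet the corrupt user byte
# survives and the previously correct parity has been overwritten.
print(xor_all(data_after) == parity_after)  # True
print(data_after[1] == original[1])         # False
```

After the "repair", a later consistency check finds nothing wrong, which is exactly why the corruption becomes permanent.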
Yet another problem with known data consistency check techniques is that they are typically very time-consuming, especially in large data storage systems. Such techniques typically operate on a burdensome amount of data to determine whether the parity data stored for a particular data stripe is consistent with the corresponding user data. To perform a data consistency check, they must not only read each byte of parity data in a data stripe, but also perform exclusive-OR (XOR) operations on each byte of user data in each data sector of the stripe. User data is typically stored in 512-byte blocks. However, data storage devices can be formatted to handle a variety of sector sizes, depending on the manufacturer and model of the device; typical formatted sector sizes have conventionally been 512 bytes, 520 bytes, 524 bytes, and 528 bytes.
To illustrate this computational burden, consider a volume set with a 64 KB strip size, 16 data storage devices, and a 512-byte segment size. (A volume set is a disk array object that most closely resembles a single logical disk when viewed by the operating environment in a host computer.) A 64 KB strip size means that 128 segments (sectors) of a data stripe are stored on each disk drive. To perform a data consistency check on such a stripe, conventional techniques typically perform XOR operations on 1,048,576 bytes of data (16 drives × 64 KB). Such a large number of operations consumes valuable disk array controller processing and data storage resources.
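The figures in this example follow directly from the stated geometry, taking 1 KB as 1024 bytes. The short calculation below just makes the arithmetic explicit; the variable names are illustrative.

```python
KIB = 1024

strip_size = 64 * KIB  # bytes of one stripe held on each drive
sector_size = 512      # bytes per segment (sector)
drives = 16            # data storage devices in the volume set

sectors_per_drive = strip_size // sector_size  # 64 KB / 512 B
bytes_xored = strip_size * drives              # bytes processed per full check

print(sectors_per_drive, bytes_xored)  # 128 1048576
```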
In light of the above, there remains a need for a system and method that significantly reduce the number of bytes that must be operated on in a data stripe to determine whether the parity stored in the stripe is consistent with the user data stored in the stripe.
Somewhat related to the burdensome amount of time that conventional data consistency checks typically require is the problem that such techniques are typically not flexible enough to offer different levels of granularity in the checking procedure. For example, if the stored data is mission-critical financial information, where a misplaced decimal point could be catastrophic, more stringent data checking may be desired. However, if the stored data is streaming video, where some number of corrupted bits is acceptable, less stringent data checking may be desired. Unfortunately, known data consistency checking techniques typically operate on every byte of user data to detect inconsistencies, regardless of the type of data being checked. Therefore, there remains a need for a system, method, apparatus, and procedure that allow the system, or a system administrator interacting with it, to define the desired granularity of data consistency checking, so that, for data that does not require stringent checking, both the processing time and the data storage resources a controller typically needs to perform the check are substantially reduced.
In addition, there remains a need for a system and method that permit selection of a data checking methodology from among multiple levels of data checking, and more particularly for a programmable data checking methodology that allows either the system or a system administrator interacting with the system to select from among those levels according to the criticality of the data and the tolerance for errors in it.
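The text does not say how checking levels would be defined. Purely as an illustration of the trade-off, the sketch below assumes that each level sets a stride, so that only every k-th byte offset of a sector is compared against the parity. The `CheckLevel` values and the `sampled_check` function are hypothetical and are not the method claimed in this document.

```python
from enum import Enum
from functools import reduce


class CheckLevel(Enum):
    """Hypothetical checking levels; each value is the stride between checked bytes."""
    FULL = 1     # every byte, e.g. for financial records
    MEDIUM = 8
    LIGHT = 64   # sparse sampling, e.g. for streaming video


def sampled_check(data_sectors, parity, level=CheckLevel.FULL):
    """Compare the XOR of the data sectors with the parity only at offsets
    0, stride, 2*stride, ... Returns (mismatched offsets, offsets examined)."""
    offsets = range(0, len(parity), level.value)
    mismatches = [
        off for off in offsets
        if reduce(lambda a, b: a ^ b, (s[off] for s in data_sectors)) != parity[off]
    ]
    return mismatches, len(offsets)


data = [bytes(512)] * 4
parity = bytes(512)
data[0] = bytes(3) + b"\x01" + bytes(508)  # corrupt offset 3 of one sector

for level in CheckLevel:
    print(level.name, sampled_check(data, parity, level))
# FULL ([3], 512)
# MEDIUM ([], 64)
# LIGHT ([], 8)
```

The lighter levels examine far fewer bytes per sector but can miss corruption that falls between sampled offsets, which is the kind of trade-off that may be acceptable for streaming video but not for financial records.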
There also remains a need for a system and method that determine which of the host data or the parity data is erroneous or corrupted when a data inconsistency is identified, that use metadata in extended disk sector formatting to improve the efficiency of data checking and error detection, and that can regenerate corrupted user data once it has been identified.
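One way per-sector metadata could help decide which side of an inconsistent stripe is corrupt is sketched below. It assumes a hypothetical 520-byte sector layout: 512 user bytes followed by 8 metadata bytes, the first 4 of which hold a CRC-32 of the user bytes. This layout, the use of CRC-32, and the names `seal`, `intact`, and `repair_stripe` are illustrative assumptions, not the format of any particular standard or of the method described in this document.

```python
import zlib
from functools import reduce

PAYLOAD = 512  # user bytes per sector
META = 8       # extra bytes in a 520-byte formatted sector (hypothetical layout)


def xor_all(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def seal(payload: bytes) -> bytes:
    """Append hypothetical metadata: CRC-32 of the payload plus 4 pad bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big") + bytes(META - 4)


def intact(sector: bytes) -> bool:
    """True if the sector's stored CRC-32 matches its payload."""
    payload, crc = sector[:PAYLOAD], sector[PAYLOAD:PAYLOAD + 4]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def repair_stripe(data, parity):
    """Check a stripe of sealed sectors; return (data, parity, verdict)."""
    payloads = [s[:PAYLOAD] for s in data]
    if xor_all(payloads) == parity[:PAYLOAD]:
        return data, parity, "consistent"
    bad = [i for i, s in enumerate(data) if not intact(s)]
    if not bad and not intact(parity):
        # User data is verified by its metadata, so the parity is the corrupt side.
        return data, seal(xor_all(payloads)), "parity rebuilt"
    if len(bad) == 1 and intact(parity):
        # Exactly one data sector fails its checksum: regenerate it from the rest.
        i = bad[0]
        others = [p for j, p in enumerate(payloads) if j != i]
        fixed = list(data)
        fixed[i] = seal(xor_all(others + [parity[:PAYLOAD]]))
        return fixed, parity, f"data sector {i} rebuilt"
    # The metadata does not single out one culprit; report instead of guessing.
    return data, parity, "undetermined"
```

Unlike the parity-only policy, this sketch rewrites parity only when the user data verifies against its own metadata, and it regenerates user data from parity when a single data sector fails verification.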