This invention relates to mass information storage systems used in high-performance computer systems. More particularly, this invention relates to a new and improved technique of using sequence number and revision number metadata for assuring high data integrity against data path errors or drive data corruption errors which may inadvertently occur during the transfer of data to, or the retrieval of data from storage media, such as a redundant array of independent disks (RAID) mass storage system.
In high-performance computer systems, a mass storage system must be capable of rapidly supplying information to each processor of the computer system in a "read" operation, rapidly transferring information for storage in a "write" operation, and performing both read and write operations with a high level of integrity so that the information is not corrupted or lost. Incorrect, corrupted or lost information defeats or undermines the effectiveness of the computer system. The reliability of the information is an absolute necessity in most if not all business computing operations.
A variety of high-performance mass storage systems have been developed to assure rapid information storage and retrieval operations. The information storage and retrieval operations are generally the slowest operations performed by a computer system; consequently, they limit the speed and functionality of the computer system itself.
One popular mass storage system which offers relatively rapid information storage and retrieval capabilities at moderate cost, as well as the capability for assuring a relatively high integrity of the information against corruption and loss, is a redundant array of independent or inexpensive disks (RAID) mass storage system. In general, a RAID mass storage system utilizes a relatively large number of individual, inexpensive disk drives which are controlled separately and simultaneously. The information to be written is separated into smaller components and recorded simultaneously or nearly simultaneously on multiple ones of the disk drives. The information to be read is retrieved almost simultaneously in the smaller components from the multiplicity of disk drives and then assembled into a larger total collection of information requested. By separating the total information into smaller components, the time consumed to perform reading and writing operations is reduced. On the other hand, one inherent aspect of the complexity and speed of the read and write operations in a RAID mass storage system is an increasing risk of inadvertent information corruption and data loss arising from the number of disk drives and the number and complexity of the input/output (I/O) operations involved.
Various error correction and integrity-assuring software techniques have been developed to assure that inadvertent errors can be detected and that the corrupted information can be corrected. The importance of such integrity-assuring techniques increases with higher performance mass storage systems, because the complexity of the higher performance techniques usually involves an inherently increased risk of inadvertent errors. Some of these integrity-assuring techniques involve the use of separate software which is executed concurrently with the information storage and retrieval operations, to check and assure the integrity of the storage and retrieval operations. The use of such separate software imposes a performance penalty on the overall functionality of the computer system, because the concurrent execution of the integrity-assuring software consumes computer resources which could otherwise be utilized for processing, reading or writing the information. Another type of integrity-assuring technique involves attaching certain limited metadata to the data to be written, but then requiring a sequence of separate read and write operations involving both the new data and the old data. The number of I/O operations involved diminishes the performance of the computer system. Therefore, it is important that any integrity-assuring software impose only a small performance degradation on the computer system. Otherwise the advantages of the higher performance mass storage and computing system will be lost or diminished.
Although the integrity-assuring software techniques used in most mass storage systems are reliable, there are a few classes of hardware errors which seem to arise inadvertently and which are extremely difficult to detect or correct without imposing a performance degradation. These types of errors seem prone to occur in the disk drives, almost inexplicably. One example of this type of error involves the disk drive accepting information in a write request and acknowledging that the information has been correctly written, without actually writing the information to the storage media. Another example involves the disk drive returning information in response to a read request that is from an incorrect disk memory location. A further example involves the disk drive writing information to the wrong address location. These types of errors are known as "silent" errors, and are so designated because the operations performed appear to be accurate but are nevertheless incorrect.
The occurrence of silent errors is extremely rare. However, such errors must be detected and/or corrected in computer systems where absolute reliability of the information is required. Because of the extremely infrequent occurrence of such silent errors, it is not advantageous to concurrently operate any integrity-assuring software or technique that imposes a continuous and significant penalty of performance degradation on the normal, error-free operations of the computer system.
Apart from silent errors, there are other situations in which inconsistency between data and parity is detected due to incomplete write operations, failed disk input/output (I/O) operations or other general firmware and hardware failures. In such circumstances, it is desirable to utilize a technique to make determinations of consistency in the data and parity. Parity is additional information that is stored along with the data, is derived from the data, and allows for reconstruction of the data. By knowing either the correct data or the correct parity, it is possible to regenerate the correct version of incorrect data or parity. While a variety of integrity-assuring software techniques are available to regenerate the correct data or the correct parity, it is desirable to avoid the performance degradation penalty incurred by continually executing separate software to continuously check data and parity.
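As an illustration of how parity allows data to be regenerated, the following minimal sketch assumes bytewise exclusive-OR (XOR) parity, as is common in RAID systems; the text above does not specify the parity function, and the helper name `xor_blocks` and the sample values are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe and the parity computed over them.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# If the second block is lost or known to be wrong, it can be regenerated
# from the remaining data blocks together with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Under this assumption, regenerating the parity from correct data is the same XOR applied to all of the data blocks.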
It is with respect to these and other background considerations that the present invention has evolved.
The present invention involves creating a sequence number and a revision number and storing the sequence number and revision number as metadata along with the data itself in a mass storage system, such as a RAID system. The invention also involves utilizing the sequence number and the revision number in an effective way which does not impose a significant performance degradation penalty on the computer system or the mass storage system to detect and correct silent errors and errors of data and parity inconsistency.
One aspect of the present invention pertains to a method of creating metadata from user data to detect errors arising from input/output (I/O) operations performed on information storage media contained in a mass storage system. The method involves creating at least two user data structures and a parity data structure. Each user data structure contains user data and metadata which describes the user data contained in that same user data structure. The parity data structure is associated with the two or more user data structures and contains metadata and parity information which describes separately and collectively the user data and metadata in each of the two or more user data structures. A sequence number and a revision number are included as part of the metadata in each user data structure and are also included in the parity data structure as correlations to the same information in each user data structure. The sequence number identifies a full stripe write I/O operation in which the information in the user data structures and the parity data structure was written. The revision number identifies a subsequent I/O operation, such as a read modify write I/O operation, in which the user data in one user data structure in a stripe is written apart from writing the user data in the other user data structures of the same stripe. The parity information in the parity data structure describes the parity of the collective user data in both of the user data structures.
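The structures described above might be sketched as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, parity is assumed to be a bytewise XOR over the user data only (parity over the metadata is omitted for brevity), and the parity data structure is assumed to hold one revision number per user data structure as its correlation to that structure.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


@dataclass
class UserDataStructure:
    user_data: bytes
    sequence_number: int       # identifies the full stripe write
    revision_number: int = 0   # advanced by each later write of this block alone


@dataclass
class ParityDataStructure:
    parity: bytes
    sequence_number: int         # same value as in every user data structure
    revision_numbers: List[int]  # one entry per user data structure


def full_stripe_write(chunks, sequence_number):
    """Write every user data structure and the parity with one sequence number."""
    users = [UserDataStructure(chunk, sequence_number) for chunk in chunks]
    parity = ParityDataStructure(
        xor_blocks(chunks), sequence_number, [0] * len(chunks)
    )
    return users, parity


def read_modify_write(users, parity, index, new_data):
    """Rewrite one user data structure and advance its revision number."""
    target = users[index]
    # New parity = old parity XOR old data XOR new data.
    parity.parity = xor_blocks([parity.parity, target.user_data, new_data])
    target.user_data = new_data
    target.revision_number += 1
    parity.revision_numbers[index] = target.revision_number


users, parity = full_stripe_write([b"\x01\x02", b"\x10\x20"], sequence_number=7)
read_modify_write(users, parity, index=1, new_data=b"\xff\x00")
```

In this sketch the sequence number stays fixed across read modify write operations and changes only on the next full stripe write, while each revision number tracks writes to a single block.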
Using the information recorded in the user data structures and the parity data structure, another aspect of the invention involves detecting errors arising from I/O operations. The sequence numbers are read from one user data structure and from the parity data structure during a subsequent I/O operation, and a determination is made of whether the sequence numbers match. If the sequence numbers do not match, the sequence number is read from one other user data structure written during the full stripe write operation, and a correct sequence number is determined as that sequence number which is equal to the two matching ones of the three sequence numbers. Another aspect of the invention involves correcting the detected errors in the user data structure by using the user data from each user data structure having the correct sequence number and the parity information from the parity data structure to construct the correct user data and metadata information for the user data structure which has the incorrect sequence number. An aspect of the invention also involves correcting detected errors in the parity data structure by using the user data from the user data structures having the correct sequence numbers to construct the correct metadata and parity information for the parity data structure. In these cases, the constructed correct information is written to the user data structure or the parity data structure before executing the subsequent I/O operation.
Silent errors arising from drive data and data path corruption of some of the user data written in a full stripe write operation are detected in the manner described when the corrupted user data information is read. By establishing the sequence number and using it to detect the portion of full stripe write operation which has been corrupted, and by using the parity information in the parity data structure, that corrupted portion of the information is corrected and replaced by the correct information derived from the metadata and user data of each other user data structure and the parity data structure of the full stripe write.
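The sequence number check and repair described above could look roughly like the following sketch. It is not the claimed implementation: the stripe is modeled as a plain dictionary with hypothetical keys, parity is assumed to be a bytewise XOR of the user data, and the function name `verify_sequence` is invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify_sequence(stripe, target):
    """Check the target block's sequence number against the parity's.

    On a mismatch, read one other block's sequence number, take the value
    shared by two of the three as correct, and rebuild the odd one out.
    """
    seqs = stripe["seq"]
    if seqs[target] == stripe["parity_seq"]:
        return "ok"
    witness = next(i for i in range(len(seqs)) if i != target)
    if seqs[witness] == stripe["parity_seq"]:
        # The target block is wrong: rebuild it from the others plus parity.
        others = [d for i, d in enumerate(stripe["data"]) if i != target]
        stripe["data"][target] = xor_blocks(others + [stripe["parity"]])
        seqs[target] = stripe["parity_seq"]
        return "repaired_user_data"
    if seqs[witness] == seqs[target]:
        # The parity is wrong: rebuild it from the user data.
        stripe["parity"] = xor_blocks(stripe["data"])
        stripe["parity_seq"] = seqs[target]
        return "repaired_parity"
    raise ValueError("no two of the three sequence numbers agree")


# Full stripe write number 8 was acknowledged, but the drive holding block 1
# silently kept the old contents written under sequence number 7.
new_data = [b"\xa2", b"\xb2", b"\xc2"]
stripe = {
    "data": [b"\xa2", b"\xb1", b"\xc2"],
    "seq": [8, 7, 8],
    "parity": xor_blocks(new_data),
    "parity_seq": 8,
}
result = verify_sequence(stripe, target=1)
```

The check costs only a comparison of metadata that is read anyway; the extra read of a third sequence number happens only when a mismatch is found.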
By using the revision number metadata recorded in the user data structures and the parity data structure, another aspect of the invention involves detecting errors arising from previous read modify write (RMW) operations, prior to executing a subsequent RMW or read operation. As a part of the subsequent RMW or read operation, the revision number is read from the user data structure to which the RMW operation is addressed and from the parity data structure. If the sequence numbers match, a determination is then made of whether the revision numbers from the user data structure and the parity data structure match. If the revision numbers do not match, the revision number which is indicative of a later-occurring subsequent I/O operation is attributed as the correct revision number. Thereafter, before executing the subsequent RMW or read operation, the previously-occurring errors which have been detected by the mismatch of the revision numbers are corrected, in accordance with another aspect of the invention. If the revision number from the parity data structure is not the correct revision number, the correct metadata and parity information for the parity data structure is constructed from the user data and metadata of the user data structures of the full stripe. On the other hand, if the revision number read from one of the user data structures is less than the correct revision number, the correct user data for that user data structure is constructed from the user data read from each other user data structure and from the parity information read from the parity data structure of the full stripe. In these cases, the constructed correct information is written to the user data structure or the parity data structure before executing the subsequent RMW or read operation. Silent errors in data path and drive data corruption are thereby corrected in this manner.
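A rough sketch of the revision number check follows. It assumes the sequence numbers have already been found to match, models the stripe as a dictionary with hypothetical keys, assumes bytewise XOR parity, and uses a plain integer comparison that ignores wrapping of a small revision field.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify_revision(stripe, target):
    """Compare the target block's revision number with the parity's copy.

    The later revision number is treated as correct, and whichever
    structure holds the earlier one is rebuilt before the I/O proceeds.
    """
    user_rev = stripe["rev"][target]
    parity_rev = stripe["parity_rev"][target]
    if user_rev == parity_rev:
        return "ok"
    if user_rev > parity_rev:
        # The parity missed an RMW: rebuild it from the user data.
        stripe["parity"] = xor_blocks(stripe["data"])
        stripe["parity_rev"][target] = user_rev
        return "repaired_parity"
    # The user data missed an RMW: rebuild it from the others plus parity.
    others = [d for i, d in enumerate(stripe["data"]) if i != target]
    stripe["data"][target] = xor_blocks(others + [stripe["parity"]])
    stripe["rev"][target] = parity_rev
    return "repaired_user_data"


# An RMW changed block 1 from 0x02 to 0x08. The parity update reached the
# disk, but the drive holding block 1 silently dropped the data write.
stripe = {
    "data": [b"\x01", b"\x02", b"\x04"],
    "rev": [0, 0, 0],
    "parity": xor_blocks([b"\x01", b"\x08", b"\x04"]),
    "parity_rev": [0, 1, 0],
}
result = verify_revision(stripe, target=1)
```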
The revision number metadata may be a relatively small field, in which case it is necessary to account for the wrapping of the numbers in the small field when determining the correct later-occurring revision number.
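One way to account for wrapping is modular comparison of the kind used in serial number arithmetic. The sketch below is an assumption for illustration only: the 8-bit field width and the names `REV_BITS`, `next_revision` and `is_later` are hypothetical, and the comparison is meaningful only while the two revision numbers are fewer than half the field range apart.

```python
REV_BITS = 8                 # assumed field width; not specified in the text
REV_MOD = 1 << REV_BITS      # number of distinct revision values
HALF_RANGE = REV_MOD // 2


def next_revision(rev):
    """Advance a revision number, wrapping back to zero at the field limit."""
    return (rev + 1) % REV_MOD


def is_later(a, b):
    """Return True if revision a is later than revision b, allowing for wrap."""
    return a != b and (a - b) % REV_MOD < HALF_RANGE


# A revision of 0 written just after 255 is the later of the two.
wrapped_is_later = is_later(next_revision(255), 255)
```

With this comparison, a revision number that has wrapped to a small value is still recognized as later than the large value that preceded it.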
The present invention is preferably implemented in a RAID mass storage system having a plurality of disk drives. The plurality of user data structures and the parity data structure of the full stripe write are preferably written on separate disk drives. The present invention may be implemented on different levels of RAID mass storage systems, including RAID 1 systems where the data is entirely mirrored. In RAID 1 systems, either the sequence number or the revision number is used; it is not necessary to use both.
By associating metadata including the sequence number and the revision number with all of the data stored and retrieved, the data is protected against data path and disk drive alteration on a continuous basis. The protection, detection and correction features are derived as an inherent aspect of a read command directed to the information in question. Because errors rarely occur (even though the errors are serious when they do occur) there is little or no performance penalty arising from continuously running a data-assuring program concurrently with the normal operation of the mass storage system. The error detection and correction aspects of the present invention are implemented as a part of the read operation itself, and any corrections are accomplished before any new information is written. The present invention provides an effective method for both detecting and correcting errors.
A more complete appreciation of the present invention and its scope, and the manner in which it achieves the above noted improvements, can be obtained by reference to the following detailed description of presently preferred embodiments of the invention taken in connection with the accompanying drawings, which are briefly summarized below, and the appended claims.