1. Field of the Invention
The present invention is directed to a method for correcting errors in data read from a disk drive array. More particularly, the present invention is directed to an auto-correction method utilizing a dual parity generation engine which transfers data from a cache memory to a stage buffer memory. Still further, the present invention takes advantage of the dual parity generation engine's fault tolerance for two disk drive failures to deliberately map out the data from each of the disk drives of the array, one at a time in sequence, as data is repetitively transferred between the cache memory and the stage buffer memory. When the dual parity generation engine obtains valid data while treating a mapped-out disk drive as a known single drive fault, that mapped-out disk drive is identified as the disk drive in error. The valid data reconstructed by the dual parity generation engine and transferred to the stage buffer memory is subsequently transferred to the processor requesting the data to complete the read operation.
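The sequential map-out described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a P+Q code over GF(2^8) with polynomial 0x11d, one byte per drive standing in for a stripe unit, and a parity mismatch that has already been detected. The names `gf_mul`, `compute_p`, `compute_q`, and `find_drive_in_error` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def compute_p(data: List[int]) -> int:
    """P parity: plain XOR of all data bytes."""
    p = 0
    for d in data:
        p ^= d
    return p

def compute_q(data: List[int]) -> int:
    """Q parity: XOR of (2**i) * data[i], with arithmetic in GF(2^8)."""
    q, coeff = 0, 1
    for d in data:
        q ^= gf_mul(coeff, d)
        coeff = gf_mul(coeff, 2)
    return q

def find_drive_in_error(
    data: List[int], p: int, q: int
) -> Optional[Tuple[str, List[int]]]:
    """Map out each drive in turn and report the one whose removal yields valid data.

    Returns (drive label, corrected data), or None if the stripe is already
    consistent or no single mapped-out drive explains the mismatch.
    """
    if compute_p(data) == p and compute_q(data) == q:
        return None  # nothing to correct
    for i in range(len(data)):
        trial = list(data)
        trial[i] = 0
        trial[i] = p ^ compute_p(trial)  # rebuild drive i from P, as a known fault
        if compute_q(trial) == q:        # the untouched Q parity confirms the result
            return f"D{i}", trial
    if compute_q(data) == q:             # P mapped out: data agrees with Q
        return "P", list(data)
    if compute_p(data) == p:             # Q mapped out: data agrees with P
        return "Q", list(data)
    return None
```

In this sketch, the second parity acts as the validity check: a data drive rebuilt from P is accepted only if the result also agrees with Q, so exactly one mapped-out drive can produce valid data when a single drive is in error.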
2. Background of the Invention
Computer systems often employ disk drive devices for storage and retrieval of large amounts of data. However, disk drive devices are subject to a number of possible failures that result in invalid data. Such failures can be the result of defects in the recording media, a failure in the mechanics of the disk drive mechanisms, a failure in electrical components such as motors and servos, or a failure in the electronic devices which are part of the disk drive unit. To improve the reliability of disk drive storage, redundant arrays of disk drives have been utilized. Redundant arrays of inexpensive disks (RAID), also referred to as redundant arrays of independent disks, have grown in usage. Of the originally proposed five levels of RAID systems, RAID-5 systems have gained great popularity for use in local area networks and independent personal computer systems, such as media database systems. In RAID-5, data is interleaved by stripe units across the various disk drives of the array along with error correcting parity information. However, unlike RAID-3, wherein there is a dedicated parity disk, RAID-5 distributes parity across all of the disk drives in an interleaved fashion.
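The difference between a dedicated parity disk and distributed parity can be shown with a short sketch. The rotation rule in `parity_drive_for_stripe` is an assumption made for illustration, since actual RAID-5 layouts vary, and each stripe unit is reduced to a single integer. All function names are hypothetical.

```python
from functools import reduce
from typing import Iterable, List

def xor_parity(units: Iterable[int]) -> int:
    """XOR all stripe units together to form (or check against) the parity unit."""
    return reduce(lambda a, b: a ^ b, units, 0)

def parity_drive_for_stripe(stripe: int, num_drives: int) -> int:
    """Assumed rotation: parity moves one drive to the left on each stripe."""
    return (num_drives - 1 - stripe) % num_drives

def layout_stripe(stripe: int, data_units: List[int], num_drives: int) -> List[int]:
    """Return the units as stored on drives 0..num_drives-1 for one stripe."""
    assert len(data_units) == num_drives - 1
    row = list(data_units)
    row.insert(parity_drive_for_stripe(stripe, num_drives), xor_parity(data_units))
    return row

def rebuild_unit(row: List[int], failed_drive: int) -> int:
    """Recover the unit on one known failed drive by XOR-ing the survivors."""
    return xor_parity(u for i, u in enumerate(row) if i != failed_drive)
```

With four drives, stripe 0 places its parity on drive 3 and stripe 1 places it on drive 2, whereas a RAID-3 style layout would keep parity on one drive for every stripe. In either layout, `rebuild_unit` works only when the failed drive is already known.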
The parity data in a RAID-5 system provides the ability to correct data only for a failure of a single disk drive of the array. Data storage systems requiring a greater fault tolerance utilize the later proposed RAID-6 system. In RAID-6, data is interleaved in stripe units distributed with parity information across all of the disk drives. To overcome the inability of RAID-5 to correct for a failure of more than one disk drive, the RAID-6 system utilizes a redundancy scheme that can recover from a failure of any two disk drives. The RAID-6 parity scheme typically utilizes either a two-dimensional XOR algorithm or a Reed-Solomon code in a P+Q redundancy scheme.
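As an illustration of a P+Q redundancy scheme, the following sketch recovers two known failed data drives from the surviving data and both parities. It assumes a Reed-Solomon style code over GF(2^8) with polynomial 0x11d and generator 2, with one byte per drive. This is one common construction and not necessarily the one used by any particular system, and all function names are hypothetical.

```python
from typing import List, Tuple

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero byte, using a**255 == 1."""
    return gf_pow(a, 254)

def compute_pq(data: List[int]) -> Tuple[int, int]:
    """P is the XOR of the data; Q is the XOR of (2**i) * data[i]."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data: List[int], p: int, q: int, x: int, y: int) -> Tuple[int, int]:
    """Recover data[x] and data[y] (x != y), whose stored values are ignored."""
    survivors = [0 if i in (x, y) else d for i, d in enumerate(data)]
    p_part, q_part = compute_pq(survivors)
    pd = p ^ p_part                      # equals Dx ^ Dy
    qd = q ^ q_part                      # equals (2**x)*Dx ^ (2**y)*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qd ^ gf_mul(gy, pd), gf_inv(gx ^ gy))
    return dx, pd ^ dx
```

The two parities give two independent equations in the two unknown units, which is why the positions `x` and `y` of the failed drives must be supplied to the solver.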
Even utilizing the RAID-6 architecture, such systems, while having the ability to detect failures in up to two disk drives, cannot correct the data unless each disk drive in error is identified. Such is the case in the storage system architecture disclosed in U.S. Pat. No. 7,127,668, but modified with an additional parity drive for use with a dual parity engine. Without the ability to identify the disk storage channel in error, the more fault tolerant parity algorithm of the RAID-6 system is unable to provide corrected data to the requesting processor, and must therefore report a “read error” to the processor requesting the data. Thus, there is a need to provide a means for identifying the disk drive in error in such instances.