1. Description of the Prior Art
Presently known systems for recording binary data typically include a host computer for processing data, one or more disk drives, and a controller interfacing between the computer and the disk drives. In some systems that process relatively large amounts of data, a plurality of drives is configured in an array for recording and storing data. A disk array generally consists of a number of data disk drives and one or more redundant drives. Fault tolerant disk arrays, which can tolerate the failure of one or more drives without loss of data, are used where data availability is critical. In one method, fault tolerance is achieved by calculating simple (exclusive-OR) parity for the system data and storing it on the redundant drive(s). Reconstructing the data of a bad block then requires three conditions: parity data covering that block, knowledge in the controller of the location of the bad block, and no more than one bad data block per parity block.
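As an illustration of simple parity, the following Python sketch computes a byte-wise exclusive-OR parity block over a stripe of data blocks and rebuilds one bad block from the surviving blocks and the parity. The function names (`xor_blocks`, `make_parity`, `reconstruct`) and the in-memory representation of blocks as equal-length `bytes` objects are assumptions made for illustration only, not the arrangement of any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block stored on the redundant drive: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(data_blocks, parity, bad_index):
    """Rebuild the block at bad_index from the survivors and the parity.

    The caller must already know which block is bad (its location), and
    only one block per parity block may be bad for this to work.
    """
    survivors = [blk for i, blk in enumerate(data_blocks) if i != bad_index]
    return xor_blocks(survivors + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(stripe)
print(reconstruct(stripe, parity, bad_index=1))  # b'\x10 '
```

The bad block's index is an explicit argument, mirroring the requirement that the controller know the bad block's location; if two blocks covered by the same parity block were bad, the XOR of the survivors would no longer isolate either one.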
It is well known that the most common failure events in fault tolerant disk arrays, such as drive failures, are recoverable. Because fault tolerant disk arrays are employed for critical data applications, less common failure events affecting data integrity assume much greater importance than they do in non-fault-tolerant disk systems. Thus controller failures, even though much less likely than drive failures, may lead to unacceptable data loss. A controller failure during a read operation makes system data unavailable, but it causes no permanent loss of data: the faulty controller can be replaced and the data recovered. In contrast, an undetected controller failure during a write operation can corrupt the system data on the drives. Such corruption would not be detected until the bad block is subsequently read, and if there is a significant time lag between the write and read operations, additional bad blocks could be written in the interim. In the worst case, a controller failure during a write operation may never be detected by the controller. For example, if the controller scrambles data when writing to the disks and then generates parity for the scrambled data, the parity is consistent with the corrupted data, so the write failure is both undetectable and uncorrectable.
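The worst case described above can be illustrated with a minimal Python sketch. It assumes the same byte-wise exclusive-OR parity as a simple parity array and a hypothetical fault, `faulty_controller_scramble`, that reverses the byte order of each block in the controller data path. Because parity is generated after the scrambling, a parity consistency check over the stripe still passes even though every data block differs from what the host sent.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_consistent(data_blocks, parity):
    """A stripe is consistent when data XOR parity is all zero bytes."""
    return not any(xor_blocks(list(data_blocks) + [parity]))


def faulty_controller_scramble(block):
    """Hypothetical controller data path fault: reverses the bytes of a block."""
    return block[::-1]


host_data = [b"ABCD", b"EFGH", b"IJKL"]
written = [faulty_controller_scramble(b) for b in host_data]
parity = xor_blocks(written)  # parity generated AFTER the scrambling

print(parity_consistent(written, parity))  # True: the array sees no error
print(written == host_data)                # False: the data is corrupted
```

The parity only protects against later damage to what was actually written; it carries no information about what the host intended to write.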
Typical controller architecture has two exposures to write data loss events, namely (1) an undetected failure of the controller data path function; and (2) an undetected failure in one of the drive data paths. Even though the drive data paths are fault tolerant in a disk array, a bad block must be detected before it can be corrected. A disk controller can protect against write data loss events by reading back and checking the data just written. However, this form of "write verification" requires an additional revolution of the disks to perform the read operation, which constitutes a significant performance penalty, and therefore it is generally not attempted. Furthermore, guaranteed detection of a write failure in a drive data path is difficult to implement and rarely achieved in disk array systems.
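The size of the write verification penalty can be estimated with simple arithmetic: waiting one additional revolution adds 60,000 / RPM milliseconds to each verified write. The spindle speeds below are assumed example values, not figures from this document, and `verified_write_ms` is a hypothetical, deliberately simplified model that ignores seek and command overhead.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def verified_write_ms(write_ms: float, rpm: float) -> float:
    """Hypothetical model: read-after-write verification waits one extra
    revolution for the just-written sector to pass under the head again."""
    return write_ms + revolution_ms(rpm)


for rpm in (3600, 5400, 7200):  # assumed example spindle speeds
    print(f"{rpm} RPM: extra {revolution_ms(rpm):.2f} ms per verified write")
```

At an assumed 3600 RPM, for example, every verified write would wait roughly an extra 16.67 ms, which is why read-back verification is generally avoided.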
One well known error detection approach uses cyclic redundancy check (CRC) characters to detect data path errors. Presently known disk systems require CRC generation circuitry plus separate, independent circuitry for detecting data errors, and this duplication of circuitry is relatively expensive. It is therefore highly desirable to provide integrity checking in the controller during write operations using simplified circuitry.
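One well known way a single CRC generator can also serve as the checker is sketched below in Python. The sketch assumes CRC-16-CCITT parameters (polynomial 0x1021, initial value 0xFFFF, most significant bit first, no final XOR), chosen only for illustration. With these parameters, appending the CRC to a block most significant byte first and running the same generator over data plus CRC leaves a zero remainder when no error is detected. This is a software model of a shift-register generator, not necessarily the circuitry of any particular controller, and the names `crc16`, `append_crc`, and `check_block` are hypothetical.

```python
# Assumed CRC parameters (CRC-16-CCITT, MSB first), chosen for illustration only.
POLY = 0x1021  # generator polynomial x^16 + x^12 + x^5 + 1
INIT = 0xFFFF  # initial register value; no final XOR is applied


def crc16(data: bytes, crc: int = INIT) -> int:
    """Bitwise CRC-16, modelling a serial shift-register generator."""
    for byte in data:
        crc ^= byte << 8  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(block: bytes) -> bytes:
    """Generation: append the CRC to the block, most significant byte first."""
    return block + crc16(block).to_bytes(2, "big")


def check_block(stored: bytes) -> bool:
    """Checking: rerun the same generator over data plus CRC.

    A zero remainder means no error was detected.
    """
    return crc16(stored) == 0


stored = append_crc(b"sector payload")
corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]  # flip one bit
print(check_block(stored), check_block(corrupted))  # True False
```

The zero-remainder property depends on the absence of a final XOR; CRC variants that invert the result leave a fixed nonzero residue instead, which a checker would compare against rather than zero.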