1. Field of the Invention
The present invention is directed toward a method for improving the performance of multiple disk drives within a computer system and, more particularly, to a method for performing write operations in a disk array utilizing parity data redundancy and recovery protection.
2. Description of the Related Art
Microprocessors and the computers which utilize them have become increasingly powerful in recent years. Currently available personal computers have capabilities in excess of the mainframes and minicomputers of ten years ago. Microprocessor word sizes of 32 bits are widely available, whereas in the past 8 bits was conventional and 16 bits was common.
Personal computer systems have developed over the years and new uses are being discovered daily. The uses are varied and, as a result, have different requirements for various subsystems forming a complete computer system. With the increased performance of computer systems, it became apparent that mass storage subsystems, such as fixed disk drives, played an increasingly important role in the transfer of data to and from the computer system. In the past few years, a new trend in storage subsystems has emerged for improving data transfer performance, capacity and reliability. This is generally known as a disk array subsystem. One reason for building a disk array subsystem is to create a logical device that has a very high data transfer rate. This may be accomplished by "ganging" multiple standard disk drives together and transferring data to or from these drives to the system memory. If n drives are ganged together, then the effective data transfer rate is increased up to n times. This technique, known as striping, originated in the supercomputing environment where the transfer of large amounts of data to and from secondary storage is a frequent requirement. In striping, a sequential data block is broken into segments of a unit length, such as sector size, and sequential segments are written to sequential disk drives, not to sequential locations on a single disk drive. These operations can be performed in parallel, thus effectively increasing the data transfer rate. If the data block is longer than n unit lengths, the process repeats for the next sector location on the disk drives. With this approach, the n physical drives become a single logical device and may be implemented either through software or hardware.
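By way of illustration only, the striping arrangement described above may be sketched as a mapping from a sequential segment number to a drive and a sector location on that drive. The function name `stripe_location` and the four-drive example are assumptions made for this sketch and do not appear in any referenced apparatus.

```python
def stripe_location(segment_index, n_drives):
    """Map a sequential segment number to (drive, sector location).

    Hypothetical helper: sequential segments go to sequential drives,
    and the sector location advances once every n_drives segments.
    """
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    drive = segment_index % n_drives    # which drive receives the segment
    sector = segment_index // n_drives  # which sector location on that drive
    return drive, sector


# Six unit-length segments striped across an assumed array of four drives.
layout = [stripe_location(i, 4) for i in range(6)]
```

In this sketch the first four segments land on drives 0 through 3 at the same sector location and may be transferred in parallel; the fifth segment wraps to drive 0 at the next sector location.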
A second reason for building disk array subsystems is to provide for data protection and recovery. Two data protection and recovery techniques have generally been used to restore data in the event of a catastrophic drive failure. One technique is the mirrored drive. The mirrored drive technique provides for a redundant drive for every data drive within an array. A write to a disk array utilizing a mirrored drive fault tolerance technique will result in a write to the primary data disk and a write to the mirrored drive. This technique results in the minimum loss of performance in the disk array. However, the disadvantage to this technique is that it uses 50% of the available data storage for redundancy purposes. This results in a relatively high cost per available storage unit.
Another technique is the use of a parity scheme which reads data blocks being written to various drives within the array and uses a known exclusive-or (XOR) technique to create parity information which is written to a reserved or parity drive within the array. The advantage to this technique is that it may be used to minimize the amount of data storage dedicated to data redundancy and recovery purposes within the array. In an 8 drive array, the parity technique would call for one drive to be used for parity information; thus, 12.5% of total storage is dedicated to redundancy, as compared to the 50% using the mirrored drive technique. The use of the parity drive technique reduces the cost of storage.
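As a minimal sketch of the exclusive-or technique referred to above, parity may be formed by XORing corresponding bytes of the data blocks in a stripe, and a lost block may be recovered by XORing the surviving blocks with the parity block. The helper name `xor_blocks` and the two-byte sample blocks are assumptions chosen for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Return the byte-wise XOR of one or more equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks belonging to one stripe.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(*data)

# Simulate the loss of drive 1: rebuild its block from the survivors and parity.
recovered = xor_blocks(data[0], data[2], parity)
```

Because one parity block protects all data blocks in the stripe, an array of 8 drives dedicates 1/8, or 12.5%, of its storage to redundancy, consistent with the figure given above.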
However, there are a number of disadvantages to the use of parity fault tolerance techniques. Primary among them is the loss of performance within the disk array as the parity drive must be updated each time a data drive is updated or written to. The data must undergo the XOR process in order to write the updated parity information to the parity drive as well as writing the data to the data drives. This process may be partially alleviated by having the parity data also distributed, relieving the load on the dedicated parity disk. However, this would not reduce the number of overall data write operations. In addition to the time required to perform the actual operations, it will be appreciated that a READ operation followed by a WRITE operation to the same sector on a disk will result in the loss of one disk revolution, or approximately 16.5 milliseconds for hard disk drives according to the preferred embodiment.
The use of the host processor to perform XOR parity information generation requires that the drive data go from the drives to a transfer buffer in the host system, that the updated drive data be processed by the host processor to create the new XOR parity information, and that the parity information be transferred to the parity drive via a transfer buffer in host system memory. As a result, the host processor encounters a significant overhead in managing the generation of the XOR parity data. The use of a local processor within a disk array to create XOR parity information also encounters many of the same problems that a system processor would. The drive data must again go from the drives to a transfer buffer in local processor memory to allow the local processor to generate the XOR parity data and the parity data must be written to the parity drive via the transfer buffer in local memory.
The use of an intelligent disk array (IDA) controller having a dedicated XOR parity engine was disclosed in U.S. patent application Ser. No. 431,735, assigned to Compaq Computer Corporation, assignee of the present invention and in European Patent Office Publication No. 0427119, published Apr. 4, 1991, the counterpart to the U.S. application. The apparatus described therein utilized a disk array DMA channel composed of four individual subchannels. A dedicated XOR engine utilized one of the four subchannels, generating parity data on a word for word basis from up to four different transfer blocks. The XOR engine was also capable of writing the result to a specified drive or to a transfer buffer through the subchannel.
However, even this apparatus may suffer from performance loss when updating data already stored on disk. Computer operations often call for data stored on a disk to first be read, modified by the process active on the host system, and written back to the same address on the data disk. This process consists of a data disk READ, modification of the data, and a data disk WRITE to the same address. Where an entire disk stripe is being written to the array, the parity information may be generated directly from the data being written to the drive array. However, a problem occurs when the computer writes only a partial stripe to a disk within the array, as the array controller does not have sufficient information to compute parity for the entire stripe. There are two means of addressing this problem.
In the first technique, the process of updating a data disk in an XOR parity fault tolerant system includes issuing a READ command, which reads and transfers the requested data to an address in system memory via transfer buffers. The data is modified by the process running on the computer system and a WRITE command is issued to write the updated data back to the sectors from which it was read. However, in order to maintain parity fault tolerance, the computer system must first READ the parity information from the parity disk for the data disk sectors which are being updated and the old data values that are to be replaced from the data disk. The XOR parity information is then recalculated by the host or a local processor, or a dedicated XOR engine as in application Ser. No. 431,735, by XORing the old data sectors to be replaced with the related parity sectors. This recovers the parity value without those data values. The new data values are XORed onto this recovered value to produce the new parity data. A WRITE command is executed, writing the updated data to the data disks and the new parity information to the parity disk. It will be appreciated that this process requires two additional partial sector READ operations, one from the parity disk and one reading the old data, prior to the generation of the new XOR parity information. Consequently, data transfer performance suffers.
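The parity recalculation of the first technique may be sketched as follows, purely as an illustration. The old data is first removed from the old parity by an exclusive-or, and the new data is then folded into the result. The names `xor_bytes` and `rmw_parity` and the sample values are assumptions of this sketch.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be of equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_parity(old_parity, old_data, new_data):
    """Compute new parity from the two blocks read before the WRITE."""
    without_old = xor_bytes(old_parity, old_data)  # parity without the old data
    return xor_bytes(without_old, new_data)        # fold the new data in


# Hypothetical stripe: three data blocks and their parity.
d0, d1, d2 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"
old_parity = xor_bytes(xor_bytes(d0, d1), d2)

# Partial stripe write replacing only d1; d0 and d2 are never read.
new_d1 = b"\xff\x00"
new_parity = rmw_parity(old_parity, d1, new_d1)
```

Only the old data block and the old parity block are read in this sketch, which corresponds to the two additional READ operations noted above.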
The second method requires reading the remainder of the data in the stripe that is not to be updated, despite the fact that it is not being replaced by the WRITE operation. Using the new data and the old data which has been retrieved, the new parity information may be determined for the entire stripe which is being updated. This process requires a READ operation of the data not to be replaced and a full stripe WRITE operation to save the parity information.
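The second method may likewise be sketched, under the assumption of a three-block stripe in which one block is replaced. The blocks that the WRITE operation does not replace are read, and parity is regenerated over the whole stripe from those blocks together with the new data. The name `full_stripe_parity` and the sample values are hypothetical.

```python
def full_stripe_parity(blocks):
    """Regenerate parity by XORing every data block of the stripe."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            parity[i] ^= value  # accumulate the XOR byte by byte
    return bytes(parity)


# Blocks read from the drives that the WRITE does not touch (assumed values).
unchanged = [b"\x0f\xf0", b"\x55\xaa"]
# New data supplied by the host for the remaining drive.
new_data = [b"\xff\x00"]

new_parity = full_stripe_parity(unchanged + new_data)
```

For the same sample values, this sketch yields the same parity as the first technique; the two methods differ only in which blocks must be read before the WRITE.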
A disk array utilizing parity fault tolerance previously had to perform one of the above techniques to manage partial stripe WRITE operations. It will be appreciated that in either instance, the time required to perform the READ operations to obtain old data, old parity or the remainder of the stripe data may significantly increase the time required to perform disk WRITE operations. This is in large part due to the fact that a sector which has just been read must be written in the next command. This results in the loss of an entire disk revolution, commonly 16.5 ms as noted above. This is true whether the XOR parity information is being generated by a system processor, a local processor or a dedicated XOR parity engine as disclosed in U.S. application Ser. No. 431,735. Accordingly, there exists a need for an improved method for performing partial stripe disk WRITE operations in a parity fault tolerant disk array.