1. Field of the Invention
The present invention is directed toward a method for improving performance for multiple disk drives in computer systems, and more particularly to a method for performing write operations in a disk array utilizing parity data redundancy and recovery protection.
2. Description of the Related Art
Microprocessors and the computers that use them have become increasingly powerful in recent years. Currently available personal computers have capabilities in excess of the mainframes and minicomputers of ten years ago. Microprocessor data bus sizes of 32 bits are widely available, whereas in the past 8 bits was conventional and 16 bits was common.
Personal computer systems have developed over the years and new uses are being discovered daily. The uses are varied and, as a result, place different requirements on the various subsystems forming a complete computer system. With the increased performance of computer systems, it became apparent that mass storage subsystems, such as fixed disk drives, played an increasingly important role in the transfer of data to and from the computer system. In the past few years, a new trend in storage subsystems, referred to as a disk array subsystem, has emerged for improving data transfer performance, capacity and reliability.
One reason for building a disk array subsystem is to create a logical device that has a very high data transfer rate. This may be accomplished by "ganging" multiple standard disk drives together and transferring data to or from these drives in parallel. Accordingly, data for a logical volume is stored "across" each of the disks comprising the disk array so that each disk holds a portion of the data comprising the volume. If n drives are ganged together, then the effective data transfer rate can be increased up to n times. This technique, known as striping, originated in the supercomputing environment, where the transfer of large amounts of data to and from secondary storage is a frequent requirement. In striping, a sequential data block is broken into segments of a unit length, such as sector size, and sequential segments are written to sequential disk drives rather than to sequential locations on a single disk drive. The unit length or amount of data that is stored "across" each disk is referred to as the stripe size. If the data block is longer than n unit lengths, the process repeats for the next sector location on the disk drives. With this approach, the n physical drives become a single logical device. This may be implemented either through software or hardware.
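As a rough illustration of the striping layout described above, the following Python sketch distributes sequential segments across drives in round-robin order. The function names locate_segment and stripe, and the use of in-memory lists in place of physical drives, are assumptions made for this example only and do not come from the original text.

```python
from typing import List, Tuple


def locate_segment(segment_index: int, n_drives: int) -> Tuple[int, int]:
    """Map a logical segment number to (drive number, row on that drive)."""
    if n_drives <= 0 or segment_index < 0:
        raise ValueError("n_drives must be positive and segment_index non-negative")
    # Sequential segments go to sequential drives; after n segments the
    # process repeats at the next location (row) on each drive.
    return segment_index % n_drives, segment_index // n_drives


def stripe(data: bytes, n_drives: int, unit: int) -> List[List[bytes]]:
    """Split data into unit-length segments and distribute them across drives."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    drives: List[List[bytes]] = [[] for _ in range(n_drives)]
    for offset in range(0, len(data), unit):
        drive, _row = locate_segment(offset // unit, n_drives)
        drives[drive].append(data[offset:offset + unit])
    return drives
```

For instance, with three drives and a two-byte unit length, the fourth segment of the data wraps around to the first drive.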
One technique that is used to provide for data protection and recovery in disk array subsystems is referred to as a parity scheme. In a parity scheme, the data blocks being written to the various drives within the array are combined using a known EXCLUSIVE-OR (XOR) technique to create parity information, which is written to a reserved or parity drive within the array. The advantage of this technique is that it may be used to minimize the amount of data storage dedicated to data redundancy and recovery purposes within the array. However, there are a number of disadvantages to the use of parity fault tolerance techniques.
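The XOR parity scheme described above can be sketched as follows. This is a minimal illustration that assumes equal-length blocks held in memory; the helper names xor_blocks, compute_parity and rebuild_missing are hypothetical. It shows that parity is the XOR of all data blocks in a stripe and that a single lost block can be recovered by XORing the parity with the surviving blocks.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(blocks: List[bytes]) -> bytes:
    """Parity block for a stripe: the XOR of all of its data blocks."""
    return reduce(xor_blocks, blocks)


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover one lost data block from the parity and the surviving blocks."""
    return reduce(xor_blocks, surviving, parity)
```

Only one block per stripe is spent on redundancy, regardless of how many data drives participate, which is the storage advantage noted above.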
One major disadvantage is that traditional operating systems perform many small writes to the disk subsystem, which are often smaller than the stripe size of the disk array and are referred to as partial stripe write operations. When this occurs, the performance of the disk subsystem is seriously degraded because the data currently on the disk must be read from the disk in order to generate the new parity information. This results in extra revolutions of the disk drive and causes delays in servicing the request. In addition to the time required to perform the actual operations, it will be appreciated that a READ operation followed by a WRITE operation to the same sector on a disk results in the loss of one disk revolution, or approximately 16.5 milliseconds for certain types of hard disk drives.
Thus, computer write operations often call for data stored on a disk to first be read, modified by the process active on the host system, and written back to the same address on the data disk. This operation consists of a data disk READ, modification of the data, and a data disk WRITE to the same address. Where an entire disk stripe is being written to the array, the parity information may be generated directly from the data being written to the drive array, and therefore no extra read of the disk stripe is required. However, a problem occurs when the computer writes only a partial stripe to a disk within the array, because the array controller does not have sufficient information to compute parity for the entire stripe. There are generally two techniques used to compute parity information for partial stripe write operations. In the first technique, a partial stripe write to a data disk in an XOR parity fault tolerant system includes issuing a READ command in order to maintain the correct parity. The computer system first reads the parity information from the parity disk for the data disk sectors which are being updated, along with the old data values that are to be replaced from the data disk. The XOR parity information is then recalculated by the host, a local processor, or dedicated logic by XORing the old data sectors to be replaced with the related parity sectors. This recovers the parity value without those data values. The new data values are then XORed onto this recovered value to produce the new parity data. A WRITE command is then executed, writing the updated data to the data disks and the new parity information to the parity disk. It will be appreciated that this process requires two additional partial sector READ operations, one from the parity disk and one reading the old data, prior to the generation of the new XOR parity information. The new parity information and data are then written to the locations which were just read. Consequently, data transfer performance suffers.
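The first technique, often called a read-modify-write, can be sketched as below. This is an illustrative example only: the function name read_modify_write_parity is hypothetical, and the blocks are assumed to have already been read into memory by the two extra READ operations described above.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write_parity(old_data: bytes, old_parity: bytes,
                             new_data: bytes) -> bytes:
    """New parity for a partial stripe write, given the old data and old parity."""
    # XORing the old data with the old parity recovers the parity value
    # "without" the data that is about to be replaced.
    parity_without_old = xor_blocks(old_parity, old_data)
    # XORing the new data onto that value yields the new parity.
    return xor_blocks(parity_without_old, new_data)
```

The result equals the parity that would be obtained by XORing every block of the updated stripe, even though the untouched data blocks are never read.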
The second method requires reading the remainder of the data in the stripe that is not to be updated, despite the fact that it is not being replaced by the WRITE operation. Using the new data and the old data which has been retrieved, the new parity information may be determined for the entire stripe which is being updated. This process requires a READ operation of the data not to be replaced and a full stripe WRITE operation.
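The second method can be sketched in the same spirit. In this hypothetical example, disk_stripe is an in-memory list standing in for the data blocks currently on disk, new_blocks maps block positions to the data being written, and the function name reconstruct_write is an assumption. The function reports which positions had to be read so that the extra READ cost is visible.

```python
from functools import reduce
from typing import Dict, List, Tuple


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct_write(disk_stripe: List[bytes],
                      new_blocks: Dict[int, bytes]
                      ) -> Tuple[List[bytes], bytes, List[int]]:
    """Return (full stripe to write, new parity, positions read from disk)."""
    full_stripe: List[bytes] = []
    positions_read: List[int] = []
    for position, old_block in enumerate(disk_stripe):
        if position in new_blocks:
            full_stripe.append(new_blocks[position])
        else:
            # Data not being replaced must still be read to compute parity.
            positions_read.append(position)
            full_stripe.append(old_block)
    parity = reduce(xor_blocks, full_stripe)
    return full_stripe, parity, positions_read
```

Unlike the first technique, the old parity is never read here; the cost is instead a read of every data block that the write does not replace.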
Therefore, partial stripe writes hurt system performance because either the remainder of the stripe that is not being written must be fetched or the existing parity information for the stripe must be read prior to the actual write of the information. Accordingly, there exists a need for an improved method for performing partial stripe disk WRITE operations in a parity fault tolerant disk array.
Some background on the file systems used in computer systems is deemed appropriate. Generally, a file system may use one of two techniques, either a "free list" or a bit map technique, to describe the amount and location of free space on disk drive units. In a free list technique, a known location on the disk contains a pointer to a block, and in this block is a list of other free blocks, i.e. blocks on the disk that are unused by the file system. The very last pointer in this block points to a block with a list of other free blocks, thus forming a chain of blocks which contain information about free space in the file system. When a free list technique is used, it is difficult to determine whether a respective block is free, because the entire free list may have to be traversed to find out. In a bit map scheme, a portion of the disk includes a reserved area where one bit is allocated for every "allocation cluster," wherein an allocation cluster may include a number of allocated blocks. A respective bit is set when the corresponding cluster is free and is cleared when the corresponding cluster is not free. Therefore, in a bit map scheme one need only examine the respective bit associated with the desired cluster to determine if the cluster is free. Most operating systems, including DOS, OS/2, Netware, and modern versions of UNIX, use a bit map scheme. The classical or original UNIX operating system is an example of an operating system which uses a free list scheme.
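The difference in lookup cost between the two schemes can be illustrated with a short sketch. The class ClusterBitmap, the function is_free_via_free_list, and the dictionary used to model the chained list blocks are all assumptions made for this example; real file systems store these structures on disk in their own formats.

```python
from typing import Dict, List, Optional, Tuple


class ClusterBitmap:
    """One bit per allocation cluster: set means free, cleared means in use."""

    def __init__(self, n_clusters: int) -> None:
        self.bits = bytearray(b"\xff" * ((n_clusters + 7) // 8))

    def is_free(self, cluster: int) -> bool:
        # A single bit test answers the question.
        return bool(self.bits[cluster // 8] & (1 << (cluster % 8)))

    def allocate(self, cluster: int) -> None:
        self.bits[cluster // 8] &= ~(1 << (cluster % 8)) & 0xFF

    def release(self, cluster: int) -> None:
        self.bits[cluster // 8] |= 1 << (cluster % 8)


# Each list block holds some free block numbers plus a pointer to the next
# list block in the chain (None at the end of the chain).
FreeListChain = Dict[int, Tuple[List[int], Optional[int]]]


def is_free_via_free_list(chain: FreeListChain, head: int,
                          target: int) -> Tuple[bool, int]:
    """Return (is target free, number of list blocks that had to be read)."""
    blocks_read = 0
    current: Optional[int] = head
    while current is not None:
        free_entries, next_block = chain[current]
        blocks_read += 1
        if target in free_entries:
            return True, blocks_read
        current = next_block
    # Concluding that a block is NOT free requires walking the whole chain.
    return False, blocks_read
```

In the bit map case the answer comes from one bit, whereas in the free list case a negative answer is only known after every list block in the chain has been examined.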