1. Field of the Invention
This invention relates to data storage subsystems, and more particularly, to a DMA controller with integrated XOR parity computation capability adapted to compute parity in parallel with the transfer of data segments.
2. Discussion of Related Art
Redundant Arrays of Inexpensive Disks (RAID) systems are disk array storage systems designed to provide large amounts of data storage capacity, data redundancy for reliability, and fast access to stored data. RAID provides data redundancy to recover data from a failed disk drive and thereby improve reliability of the array. Although the disk array includes a plurality of disks, to the user the disk array is mapped by RAID management techniques within the storage subsystem to appear as one large, fast, reliable disk.
There are several different methods to implement RAID. RAID level 1 mirrors the stored data on two or more disks to assure reliable recovery of the data. Other common implementations of RAID, namely levels 3, 4, and 5, distribute data across the disks in the array and provide for a block (or multiple blocks) of redundancy information (e.g., parity) that is also distributed over the disk drives. On each disk, data is mapped and stored in predefined blocks generally having a fixed size. A predefined number of blocks of data and redundancy information (e.g., parity), from each disk of the array, are mapped to define a stripe of data. One common type of stripe, the parallel stripe, provides load balancing across the disks in the array by defining the stripe as parallel blocks of data across the disk array.
In RAID levels 3 and 4, the redundancy information (that is, the parity information) is stored on a dedicated parity disk. In a RAID level 5 implementation, the parity information is interleaved across all the disks in the array as a part of the stripe.
RAID levels 3, 4, and 5 suffer I/O performance degradation due to the number of additional read and write operations required by the data redundancy algorithms. RAID controllers often include local memory subsystems (e.g., cache) used to temporarily store data and parity involved in a host I/O operation and thereby mitigate the performance degradation of the redundancy techniques.
There are two common methods for writing new data and the associated new parity to the disk array: the Full Stripe Write method and the Read-Modify-Write method, also known as a partial stripe write method. If a write request indicates that only a portion of the data blocks in any stripe are to be updated, then the Read-Modify-Write method is generally used to write the new data and to update the parity block of the associated stripe. The Read-Modify-Write method involves the steps of: 1) reading into local memory old data from the stripe corresponding to the blocks to be updated by operation of the write request, 2) reading into local memory the old parity data for the stripe, 3) performing an appropriate redundancy computation (e.g., a bit-wise Exclusive-Or (XOR) operation to generate parity) using the old data, old parity data, and the new data, to generate a new parity data block, and 4) writing the new data and the new parity data block to the proper data locations in the stripe.
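The redundancy computation in step 3 can be illustrated with a minimal sketch, assuming simple XOR parity over equal-sized blocks held in local memory. The function names `xor_blocks` and `rmw_new_parity` and the sample byte values are hypothetical and chosen only for illustration; they do not describe any particular controller.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bit-wise XOR of two equal-length blocks (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_new_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity for a partial stripe write.

    XORing the old data out of the old parity and XORing the new data in
    yields the parity of the updated stripe without reading the blocks
    that are not being modified.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# Illustrative two-data-block stripe (values are made up for the example).
d0 = bytes([0x0F, 0xAA])
d1 = bytes([0xF0, 0x55])
old_parity = xor_blocks(d0, d1)

# Update only block d0; block d1 is never read.
new_d0 = bytes([0x33, 0x01])
new_parity = rmw_new_parity(d0, old_parity, new_d0)
```

In this sketch the result equals the parity that would be obtained by XORing the new block with the untouched block directly, which is why the unmodified blocks of the stripe need not be read.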
If all the blocks in a stripe are available in the local memory or provided in the write request, then a Full Stripe Write is possible. In a Full Stripe Write, the parity computation is an XOR of all the data blocks within a stripe. The Full Stripe Write avoids the need to use old parity data during the new parity computation. Full Stripe Write improves I/O performance because no disk access is required to read the old parity data and place a copy of it in local memory.
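The Full Stripe Write parity computation can be sketched in the same way, again assuming simple XOR parity over equal-sized data blocks. The name `full_stripe_parity` and the sample values are hypothetical. The last lines also illustrate how the same XOR recovers the block of a failed drive from the surviving blocks and the parity.

```python
from functools import reduce
from operator import xor


def full_stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Parity of a full stripe: the XOR of all data blocks (hypothetical helper)."""
    if not data_blocks:
        raise ValueError("stripe must contain at least one data block")
    size = len(data_blocks[0])
    if any(len(block) != size for block in data_blocks):
        raise ValueError("all blocks in a stripe must have equal length")
    # zip(*data_blocks) walks the stripe byte position by byte position.
    return bytes(reduce(xor, column) for column in zip(*data_blocks))


# Illustrative three-data-block stripe (values are made up for the example).
stripe = [bytes([0x01, 0x10]), bytes([0x02, 0x20]), bytes([0x04, 0x40])]
stripe_parity = full_stripe_parity(stripe)

# Recovering a lost block: XOR the surviving data blocks with the parity.
recovered = full_stripe_parity([stripe[0], stripe[2], stripe_parity])
```

No old data or old parity appears anywhere in this computation, which is the property that distinguishes the Full Stripe Write from the Read-Modify-Write method.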
It is known to use a DMA circuit in a RAID controller to transfer data from a source to a destination. Exemplary of such a DMA transfer is the exchange of data between a host system memory and the RAID controller local memory (e.g., cache or other buffers). A request is made to the DMA circuit to perform a data transfer. The DMA controller establishes a direct data path between the host RAM and the local memory (e.g., cache). Thus, the DMA allows the RAID controller central processing unit (CPU) to perform other tasks while the data exchange occurs in parallel. In the case of a write operation from the host to the RAID subsystem, the RAID controller CPU reads the data from local memory and computes required parity as noted above. The disk drive controller is programmed to transfer the data and new parity from the RAID subsystem local memory to the disk array.
The local memory is therefore accessed a number of times for each such complete write operation. First, the local memory is written with the data transferred from the host. Second, the same data is read back to compute the parity data. Finally, the data is read again to write the data and associated parity to the disk array. Each of these local memory accesses utilizes valuable memory bandwidth in the RAID controller. It is desirable to reduce the local memory bandwidth consumed by each write operation so as to improve the overall I/O performance of the RAID subsystem.
Some prior techniques and devices have integrated parity computation circuits with the DMA controller to simplify or speed the computation of XOR parity data. Such known techniques tend to integrate the XOR computation with the DMA controller such that the computation is performed at the "back-end" of the RAID controller data transfers. In other words, the DMA controller performs the XOR parity computation as the data is transferred from the RAID controller local memory to the disk array. In such methods, the DMA controller reads the stripes of data to be written from RAID subsystem local memory and simultaneously computes the parity of the stripe as it transfers data to the disk array.
Back-end parity computations generally require that the disk drives be operable in a synchronized manner such that the parity computation and DMA transfer operate in "lock-step" among a plurality of disk drive transfer operations. Parity is computed using related portions (segments) of the stripe. The XOR computation circuits must therefore receive the proper sequence of related bytes in related segments to compute a correct XOR parity segment for the related segments.
Such "lock-step" operation is used in older technology disk drives such as integrated drive electronics (IDE) interface devices because the RAID controller is more directly controlling the data transfer. IDE drives run single threaded in that each data transfer requires a handshake. Each transfer of data (e.g., byte or 16-bit word) requires a request to the RAID controller and acknowledgment of the data delivery by the disk drive controller before the next unit of data is transferred.
To accommodate this precisely timed lock-step approach, a high speed static RAM (SRAM) buffer is commonly used in conjunction with the DMA transfer to assure readiness of the data when the DMA is requested to transfer the next unit of data to the disk drives. Not only is such an additional SRAM buffer somewhat costly, but it requires that the local memory data be read once again to transfer the data block from the lower speed local memory to the high speed SRAM transfer buffer.
Such back-end DMA/parity computations are not well suited to today's RAID systems that utilize disk drive devices having substantial buffering and intelligence within the drive device, for example a SCSI disk drive. The use of the SCSI drive device allows the SCSI controller to control the data transfer. The SCSI controller takes control of the bus and issues commands to transfer data from local memory (e.g., cache), rather than the CPU utilizing the DMA to transfer data to the disk drive. Higher performance SCSI disk drives typically contain significant buffering and computational intelligence to optimally order a plurality of commands queued within the drive itself (in a buffer local to the drive). For example, some SCSI disk drives have the computational intelligence for command queuing and elevator sorting. Such optimizations are often key to achieving the specified performance levels of the disk drives. SCSI controllers optimize performance by sorting I/O requests before saving data or before retrieving data. Therefore, the order in which the I/O requests were received does not matter because the SCSI controller will sort the I/O requests to optimize data retrieval and data storage to disk.
These optimization features are defeated by the lock-step sequences required by the known back-end DMA/parity techniques. In these cases, the substantial buffering within the drive device is not effectively utilized because the parity computation may be corrupted if the related segments are not transferred in the proper sequence. For example, one of the plurality of SCSI disk drives relating to a particular stripe may determine for any of several reasons that its buffer cannot handle further data at this time, or a SCSI drive may choose to resequence operations in its buffer to optimize drive operations. Such a determination by one drive may require logic to stop the DMA/parity operations to all drives so as to assure proper sequencing of the stripe data through the XOR circuits. Such additional logic to assure lock-step sequencing of all drives in a stripe serves to defeat the intelligence and buffering of high speed drives, thereby negatively impacting overall subsystem performance.
It is evident from the above discussion that a need exists for enhanced DMA/Parity circuits which overlap parity computation with data transfer while reducing bandwidth requirements for local memory without substantially increasing hardware costs.