The invention relates generally to the field of computer systems and more particularly to systems that employ disk storage based upon a redundant array of independent disks (RAID) implementation.
A computer system includes an operating system whose primary function is the management of hardware and software resources in the computer system. The operating system handles input/output (I/O) requests from software processes or applications to exchange data with on-line external storage devices in a storage subsystem. The operating system (such as Windows NT®, available from Microsoft Corp. of Redmond, Wash.) forwards I/O requests to an I/O subsystem, which, in turn, converts the logical addresses into physical locations in the storage devices and commands the latter devices to engage in the requested storage or retrieval operations.
The on-line storage devices on a computer are configured from one or more disks into logical units of storage space referred to herein as “containers.” Examples of containers include volume sets, stripe sets, mirror sets, and various Redundant Array of Independent Disks (RAID) implementations. A volume set comprises one or more physical partitions, i.e., collections of blocks of contiguous space on disks, and is composed of space on one or more disks. Data is stored in a volume set by filling all of the volume's partitions in one disk drive before using volume partitions in another disk drive. A stripe set is a series of partitions on multiple disks, one partition per disk, that is combined into a single logical volume. Data stored in a stripe set is evenly distributed among the disk drives in the stripe set. In its basic configuration, a stripe set is also known as a “RAID 0” configuration. A mirror set is composed of volumes on multiple disks, whereby a volume on one disk is a duplicate copy of an equal-sized volume on another disk in order to provide data redundancy. A basic configuration for a mirror set is known as “RAID 1.” There is often a desire to increase data reliability in a stripe set by using parity distributed across storage blocks with respect to each stripe. Where such parity is provided to the stripe set, the configuration is known as “RAID 5.” In an even more complex implementation, where stripe sets are mirrored on a plurality of containers and redundant data is distributed across the stripes, the resulting configuration is known as “RAID 10.” Generally speaking, all configurations of the RAID implementation (RAID 0-10) provide a collection of partitions, where each partition is composed of space from one disk in order to support data redundancy.
According to a prior system, the I/O subsystem configures the containers through a software entity called a “container manager.” Essentially, the container manager sets up a mapping structure to efficiently map logical addresses received from the operating system to physical addresses on storage devices. The I/O subsystem also includes a software driver for each type of container configuration on the system. These drivers use the mapping structure to derive the physical addresses, which they then pass to the respective storage devices for storage and retrieval operations.
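By way of illustration only, the logical-to-physical mapping described above might be sketched as follows for two of the container types. The names `PhysicalAddress`, `map_stripe_set` and `map_volume_set` are hypothetical, the stripe unit is assumed to be a single block, and the volume set is assumed to have one partition per disk; the actual mapping structure of the container manager is not specified here.

```python
from typing import NamedTuple, Sequence


class PhysicalAddress(NamedTuple):
    disk: int   # index of the disk holding the block
    block: int  # block offset within that disk's partition


def map_stripe_set(logical_block: int, num_disks: int) -> PhysicalAddress:
    """Stripe set (RAID 0 style): consecutive blocks rotate across the disks."""
    if logical_block < 0 or num_disks < 1:
        raise ValueError("invalid logical block or disk count")
    return PhysicalAddress(logical_block % num_disks, logical_block // num_disks)


def map_volume_set(logical_block: int, partition_sizes: Sequence[int]) -> PhysicalAddress:
    """Volume set: fill one disk's partition completely before using the next."""
    if logical_block < 0:
        raise ValueError("invalid logical block")
    remaining = logical_block
    for disk, size in enumerate(partition_sizes):
        if remaining < size:
            return PhysicalAddress(disk, remaining)
        remaining -= size
    raise ValueError("logical block lies beyond the end of the volume set")
```

With four disks, logical block 5 of the stripe set lands on disk 1 at offset 1, whereas in a volume set of two four-block partitions the same logical block falls in the second partition only because the first is already full.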
Speed of data transfer and storage is an important aspect of a RAID storage arrangement. Enhancing speed, where possible, is highly desirable. Typically, read data from the disk is cached in a large cache memory and transferred into and out of the cache for subsequent delivery to the host processor using a direct memory access (DMA) engine. Likewise, write data from the host is first cached in the cache by the DMA engine for eventual delivery to the disk. Parity information for disk-stored data is generally maintained in appropriate blocks in the disk array in accordance with the particular RAID configuration. Parity is read by the DMA engine and combined using an XOR function with read data to perform an error check. Likewise, new parity is generated by the XOR process and rewritten to the appropriate parity block whenever an associated data block is written to.
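The XOR-based parity generation and error check described above can be sketched in software as follows. This is a minimal illustration, not the DMA engine's logic: the helper names `xor_blocks`, `stripe_parity` and `stripe_is_consistent` are hypothetical, and blocks are modeled as equal-length byte strings.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity block for a stripe: the XOR of all of its data blocks."""
    return reduce(xor_blocks, data_blocks)


def stripe_is_consistent(data_blocks: Sequence[bytes], parity: bytes) -> bool:
    """Error check: XOR of the data blocks and the parity must be all zeros."""
    return not any(xor_blocks(stripe_parity(data_blocks), parity))
```

Because XOR is its own inverse, combining the stored parity with the freshly computed XOR of the data yields an all-zero block only when the two agree, which is what the check relies on.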
In an exemplary RAID 5 configuration, when data is read from the disk array, either across sequential blocks or from random blocks, speed is relatively quick since existing stored data and parity are simply accessed, cached and read. Likewise, a “sequential” write to a group of contiguous blocks and concurrent rewriting of overall parity is a relatively quick procedure. However, writing to a random single block or blocks within the array can prove very time-consuming. As described in detail below, there are several read and write steps involving old and new parity for the rewritten data block. This multiplicity of steps in performing a random write significantly slows the data transfer process. In addition, this multi-step process occurs even if the new data block and associated parity are unchanged from the corresponding originals. Hence the same parity is written to the parity block as that originally stored, resulting in a redundant read and write process therefor.
Accordingly, it is an object of this invention to provide a more efficient system and method for detecting unchanged parity in a random disk write process, thereby avoiding redundant parity-handling steps, that is particularly applicable, but not limited, to a RAID 5 stripe set.
This invention overcomes the disadvantages of the prior art by providing a system and method that enables greater efficiency in the performance of write operations to a disk array, particularly where the disk array is configured as a RAID level 5 implementation having stripes of data and distributed parity blocks and the write is a “random” write to a discrete, non-sequential storage block within the array.
In a preferred embodiment, a direct memory access and exclusive-OR (DMA/XOR) engine is resident on a bus structure between a host processor system and the disk array, which is typically configured as a RAID 5. The DMA engine can comprise a state-machine having various combinatorial logic functions. A cache memory is also resident on the bus and is adapted to cache write data written from the host and read from the disk array under control of a cache manager prior to storage thereof in the disk array.
When a random write operation is instructed to a predetermined location in the disk array, the new block of data is cached and the original block of data is read from the disk and also cached. The original parity block, associated with the original block of data and distributed within the disk array at a predetermined storage location, is also read and cached. The cached original and new blocks of data are combined using the XOR function to derive a first result. The first result is then combined by the XOR function with the original distributed parity block to derive a data difference. A detection function determines whether the data difference is zero or non-zero. If zero, the new block is unchanged relative to the original block and no write of the new data block or any associated new parity to the disk array occurs. Conversely, if the data difference is non-zero, indicating a change, then the new block is written over the old block in the disk array and, likewise, the data difference is written over the old parity as the new parity. In this manner, the additional write steps are avoided when they would prove redundant.
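The random-write sequence above can be sketched in software as follows. This is an illustrative model only: the disk array is a plain dictionary, and the names `xor_blocks` and `random_write` are hypothetical. As a stated assumption of this sketch, the zero test is applied to the first result (original data XOR new data), since that quantity is all zeros exactly when the new block is identical to the original; the text above applies the detection function to the data difference.

```python
from typing import Dict


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def random_write(disk: Dict[str, bytes], data_addr: str,
                 parity_addr: str, new_data: bytes) -> bool:
    """Write new_data to one block; return False if the write was skipped."""
    old_data = disk[data_addr]       # read and cache the original data block
    old_parity = disk[parity_addr]   # read and cache the original parity block

    # First result: XOR of the original and new data blocks.
    first_result = xor_blocks(old_data, new_data)

    # Zero detection (assumption: performed on the first result).
    if not any(first_result):
        return False                 # unchanged: skip data and parity writes

    # XOR the first result with the original parity to obtain the new parity.
    new_parity = xor_blocks(first_result, old_parity)
    disk[data_addr] = new_data       # new block overwrites the old block
    disk[parity_addr] = new_parity   # new parity overwrites the old parity
    return True
```

For example, with `disk = {"d0": b"\x01\x02", "d1": b"\x10\x20", "p": b"\x11\x22"}`, writing `b"\x01\x02"` to `"d0"` returns `False` and leaves the dictionary untouched, while writing `b"\x0f\x02"` returns `True` and updates `"p"` so that it again equals the XOR of the two data blocks.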