The invention relates generally to the field of computer systems and more particularly to systems that employ a redundant array of independent disks (RAID) architecture.
A computer system includes an operating system whose primary function is the management of hardware and software resources in the computer system. The operating system handles input/output (I/O) requests from software processes or applications to exchange data with on-line external storage devices in a storage subsystem. The applications address those storage devices in terms of the names of files, which contain the information to be sent to or retrieved from them. A file system may be present to translate the file names into logical addresses in the storage subsystem. The file system forwards the I/O requests to an I/O subsystem, which, in turn, converts the logical addresses into physical locations in the storage devices and commands the latter devices to engage in the requested storage or retrieval operations. The file system can be part of the Windows NT® Operating System available from Microsoft Corp. of Redmond, Wash., and is termed NT File System (NTFS).
The on-line storage devices on a computer are configured from one or more disks into logical units of storage space referred to herein as “containers.” Examples of containers include volume sets, stripe sets, mirror sets, and various Redundant Array of Independent Disk (RAID) implementations. A volume set comprises one or more physical partitions, i.e., collections of blocks of contiguous space on disks, and is composed of space on one or more disks. Data is stored in a volume set by filling all of the volume's partitions in one disk drive before using volume partitions in another disk drive. A stripe set is a series of partitions on multiple disks, one partition per disk, that is combined into a single logical volume. Data stored in a stripe set is evenly distributed among the disk drives in the stripe set. In its basic configuration, a stripe set is also known as a “RAID 0” configuration. A mirror set is composed of volumes on multiple disks, whereby a volume on one disk is a duplicate copy of an equal-sized volume on another disk in order to provide data redundancy. A basic configuration for a mirror set is known as “RAID 1.” There is often a desire to increase data reliability in a stripe set by using parity distributed across storage blocks with respect to each stripe. Where such parity is provided to the stripe set, the configuration is known as “RAID 5.” In an even more complex implementation, where stripe sets are mirrored on a plurality of containers and parity is distributed across the stripes, the resulting configuration is known as “RAID 10.” Generally speaking, all configurations of the RAID implementation (RAID 0-10) provide a collection of partitions, where each partition is composed of space from one disk in order to support data redundancy.
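The difference between how a volume set and a basic stripe set place data can be illustrated with a minimal sketch. The function names, the one-block stripe unit, and the list of partition sizes are assumptions made for illustration only; they are not taken from any described implementation.

```python
def volume_set_locate(block, partition_sizes):
    """Volume set: fill each partition completely before using the next one.

    Returns (partition_index, block_offset_within_partition).
    """
    for index, size in enumerate(partition_sizes):
        if block < size:
            return index, block
        block -= size
    raise ValueError("block is beyond the end of the volume set")


def stripe_set_locate(block, num_disks):
    """Stripe set (RAID 0): rotate consecutive blocks across the disks.

    Assumes a stripe unit of one block. Returns (disk_index, block_offset_on_disk).
    """
    return block % num_disks, block // num_disks
```

With two four-block partitions, logical block 5 of a volume set falls in the second partition, whereas in a three-disk stripe set the same logical block falls on the third disk, so that consecutive blocks are spread evenly over the drives.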
According to a prior system, the I/O subsystem configures the containers through a software entity called a “container manager.” Essentially, the container manager sets up a mapping structure to efficiently map logical addresses received from the operating system to physical addresses on storage devices. The I/O subsystem also includes a software driver for each type of container configuration on the system. These drivers use the mapping structure to derive the physical addresses, which they then pass to the respective storage devices for storage and retrieval operations.
Specifically, when the computer system is initially organized, the I/O subsystem's container manager configures the containers and maintains the configuration tables in a container layer of the I/O subsystem. In accordance with a co-pending related U.S. patent application Ser. No. 08/964,304, entitled File Array Storage Architecture, by Richard Napolitano et al., the container layer of the I/O subsystem comprises a Device Switch Table, a Container Array, and a Partition Table. The teachings of this application are expressly incorporated herein by reference. The Device Switch Table consists of entries, each of which ordinarily points to the entry point of a container driver that performs I/O operations on a particular type of container. The Container Array is a table of entries, each of which ordinarily points to data structures used by a container driver. There is a fixed one-to-one relationship between the Device Switch Table and the Container Array. The Partition Table contains partition structures copied from disk drives for each container on the system. Each Partition Table entry points to one physical disk drive and allows the container driver to access physical locations in the on-line storage devices.
When a software process issues an I/O request, the operating system translates it into an I/O request bound for a particular device. The operating system sends the I/O request, which includes, inter alia, a block number for the first block of data requested by the application and also a pointer to a Device Switch Table entry that points to a container driver for the container where the requested data is stored. The container driver accesses the Container Array entry for pointers to the data structures used in that container and to Partition Table entries for that container. Based on the information in the data structures, the container driver also accesses Partition Table entries to obtain the starting physical locations of the container on the storage devices. Based on the structures pointed to by the Container Array entry and partition structures in the Partition Table, the container driver sends the I/O request to the appropriate disk drivers for access to the disk drives.
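The lookup path just described, from Device Switch Table entry to container driver, Container Array entry, and Partition Table entries, can be sketched roughly as follows. All class names, fields, and the example volume-set driver are hypothetical stand-ins chosen for illustration; the referenced application defines the actual structures.

```python
from dataclasses import dataclass


@dataclass
class PartitionEntry:
    """Hypothetical Partition Table entry: one physical disk and an extent on it."""
    disk: str
    start_block: int
    num_blocks: int


@dataclass
class ContainerData:
    """Hypothetical Container Array entry: which partitions make up the container."""
    partition_indexes: list


def volume_set_driver(container, partition_table, block):
    """Hypothetical container driver for a volume set: walk partitions in order."""
    for index in container.partition_indexes:
        part = partition_table[index]
        if block < part.num_blocks:
            return part.disk, part.start_block + block
        block -= part.num_blocks
    raise ValueError("block is beyond the end of the container")


partition_table = [PartitionEntry("disk0", 100, 50), PartitionEntry("disk1", 0, 50)]
device_switch_table = [volume_set_driver]    # entry i: driver entry point
container_array = [ContainerData([0, 1])]    # entry i: data for the same container


def dispatch(switch_index, block):
    """Resolve a container-relative block number to (disk, physical block)."""
    driver = device_switch_table[switch_index]
    container = container_array[switch_index]  # fixed one-to-one with the switch table
    return driver(container, partition_table, block)
```

In this sketch a single index selects both the driver and its container data, which mirrors the fixed one-to-one relationship between the two tables.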
As noted, in a RAID 5 configuration, data blocks are organized in stripes across a set of disks with parity distributed among stripes. When a write to a block is desired (an I/O operation), first the old data in the block must be read, the parity must be read, and then the old data and parity must undergo an exclusive-or (XOR) logic operation. Next, the new data must be “XORed” with the result. Finally, the XORed result is written to the parity location and the new data is written to the data location. Clearly, many steps are undertaken to perform a single I/O to the disk arrangement simply to update the parity. One such technique for handling parity is described in U.S. Pat. No. 5,309,451, entitled Data and Parity Prefetching For Redundant Arrays of Disk Drives, by Eric S. Noya, et al., the teachings of which are expressly incorporated herein by reference.
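The write sequence above can be followed step by step in a short sketch. The in-memory list standing in for a stripe, and the names `xor_blocks` and `small_write`, are assumptions for illustration; a real array would issue disk reads and writes at each commented step.

```python
def xor_blocks(a, b):
    """Byte-wise exclusive-or of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity_index, data_index, new_data):
    """Update one data block of a stripe and keep its parity block consistent."""
    old_data = stripe[data_index]                 # 1. read the old data
    old_parity = stripe[parity_index]             # 2. read the old parity
    partial = xor_blocks(old_data, old_parity)    # 3. XOR old data with old parity
    new_parity = xor_blocks(partial, new_data)    # 4. XOR the new data with the result
    stripe[parity_index] = new_parity             # 5. write the new parity
    stripe[data_index] = new_data                 # 6. write the new data
```

Because XOR is its own inverse, step 3 removes the old data's contribution from the parity and step 4 adds the new data's contribution, so the parity block still equals the XOR of all data blocks in the stripe without the other data blocks being read.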
Where multiple I/Os occur within the same stripe, it should be possible to provide a procedure for reducing time in the parity update process using improved caching techniques. Accordingly, it is an object of this invention to enable a particular technique for caching of parity in a RAID 5 configuration that reduces the time involved in parity update during I/O request operations performed successively within a given stripe.
This invention overcomes the disadvantages of the prior art by providing a system and method for updating parity based upon locking and unlocking of a storage stripe in a RAID implementation in which the stripe includes a parity block (e.g., RAID 5). The stripe is locked to prevent colliding I/O operations from being performed thereto while a current I/O operation is underway with respect to the stripe. A parity buffer is maintained that is updated to include the current parity information for the stripe. The buffer is “swapped” with the parity buffer associated with a next waiting I/O operation request before the stripe is unlocked. The buffer continues to be swapped with further requests so long as another I/O operation request waits on the lock. When no further I/O operation request for the given stripe is detected, then the current parity buffer is written into the stripe parity block. The intervening swaps reduce the number of parity cache reads and writes to disk, increasing efficiency.
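The effect of the buffer swap on parity traffic can be seen in a simplified, single-threaded simulation. This is only a sketch under stated assumptions: a deque stands in for the requests waiting on the stripe lock, handing the buffer to the next loop iteration stands in for the swap, and the `Stripe` class, its counters, and `run_writes` are hypothetical names introduced for illustration.

```python
from collections import deque


def xor_blocks(a, b):
    """Byte-wise exclusive-or of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """In-memory stand-in for one stripe, with counters for parity disk accesses."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        parity = bytes(len(self.data[0]))
        for block in self.data:
            parity = xor_blocks(parity, block)
        self.parity = parity
        self.parity_reads = 0
        self.parity_writes = 0


def run_writes(stripe, requests):
    """Apply queued (data_index, new_data) writes to one stripe in lock order."""
    waiting = deque(requests)
    parity_buffer = None
    while waiting:
        index, new_data = waiting.popleft()      # this request now holds the lock
        if parity_buffer is None:
            parity_buffer = stripe.parity        # first holder reads parity from disk
            stripe.parity_reads += 1
        old_data = stripe.data[index]
        parity_buffer = xor_blocks(xor_blocks(old_data, parity_buffer), new_data)
        stripe.data[index] = new_data
        if waiting:
            continue                             # "swap": next waiter inherits the buffer
        stripe.parity = parity_buffer            # no waiter: write parity to disk
        stripe.parity_writes += 1
        parity_buffer = None
```

In this simulation, three back-to-back writes to the same stripe cost one parity read and one parity write in total, rather than one of each per write.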
According to one embodiment, a system and method for performing multiple I/O operations to a storage medium organized as a RAID 5 implementation, with a stripe of data storage defined across a plurality of devices and a parity block associated with each stripe, includes selectively locking and unlocking the stripe in response to a current I/O operation request thereto so that only one I/O operation can proceed within the stripe while the stripe is locked and until the stripe is unlocked. The procedure reads parity data derived from an XOR operation performed to first old data in the stripe and first old parity data of the stripe. This is stored in a first parity buffer associated with a previous I/O operation, and first new data, also associated with the previous I/O operation, is also XORed. A second parity buffer is associated with the current I/O operation. The first parity buffer is swapped with the second parity buffer before the stripe is unlocked to enable the current I/O operation to proceed. Second new data associated with the current I/O operation is then written to the stripe. The buffer swap procedure continues for each new I/O request until no more are waiting on the lock. Then parity data derived from an XOR operation performed to second old data in the stripe, second old parity data of the stripe and the second new data, all associated with the current I/O operation, is finally written to the stripe.