This application is related to the following co-pending and commonly-assigned patent applications, all of which are filed on the same date herewith, and all of which are incorporated herein by reference in their entirety:
"Distributed Storage System Using Front-End And Back-End Locking," by Jai Menon, Divyesh Jadav, Kal Voruganti, Ser. No. 09/124,004;
"System for Updating Data in a Multi-Adaptor Environment," by Jai Menon, Divyesh Jadav, Deepak Kenchammana-Hosekote, Ser. No. 09/128,574;
"System For Changing The Parity Structure Of A Raid Array," by Jai Menon, Divyesh Jadav, Deepak Kenchammana-Hosekote, Ser. No. 09/129,012;
"Updating And Reading Data And Parity Blocks In A Shared Disk System," by Jai Menon, Ser. No. 09/129,067; and
"Updating and Reading Data and Parity Blocks in a Shared Disk System with Request Forwarding," by Jai Menon and Divyesh Jadav, Ser. No. 09/128,754.
1. Field of the Invention
The present invention relates to a system for updating data and parity data in a shared disk system.
2. Description of the Related Art
In Redundant Arrays of Independent Disks (RAID) systems, data files and related parity are striped across multiple disk drives. In storage subsystems which manage numerous hard disk drives as a single direct access storage device (DASD), the RAID logic is implemented in the controller of the subsystem. RAID storage methodologies have also been implemented in software for execution on a single host computer. This allows the single host computer, such as a personal computer, to implement RAID storage techniques on local hard disk drive space. Such software RAID methodologies are described in "Algorithms for Software and Low Cost Hardware RAIDs," by Jai Menon, Jeff Reigel, and Jim Wyllie, Document No. 1063-6390/95, pgs. 411-418 (IEEE 1995), which is incorporated herein by reference in its entirety.
One problem with the single storage subsystem is the risk of failure. Techniques have been developed to improve failback and recovery in case of failures in the hardware controller. One such failback technique is the Fast Write Technique, which provides two separate controllers on different power boundaries that control the flow of data from host systems to DASDs. If one controller fails, the other controller can continue writing data to the DASD. Typically, a non-volatile storage unit (NVS) is included with each separate controller, such that each NVS connected to a controller backs up the data the other controller is writing to DASD. Such failback systems employing the two-controller failsafe structure are described in U.S. Pat. Nos. 5,636,359, 5,437,022, 5,640,530, and 4,916,605, all of which are assigned to International Business Machines Corporation (IBM), the assignee of the subject application, and all of which are incorporated herein by reference in their entirety.
RAID systems can also be implemented in a parallel computing architecture in which there is no central controller. Instead, a plurality of independent controllers that control local hard disk storage devices are separate nodes that function together in parallel to implement RAID storage methodologies across the combined storage space managed by each node. The nodes are connected via a network. Parity calculations can be made at each node, and not centrally. Such parallel RAID architecture is described in "The TickerTAIP Parallel RAID Architecture," by Pei Cao, Swee Boon Lim, Shivakumar Venkatarman, and John Wilkes, published in ACM Transactions on Computer Systems, Vol. 12, No. 3, pgs. 236-269 (August, 1994), which is incorporated herein by reference in its entirety.
One challenge in shared disk systems implementing a parallel RAID architecture is to provide a system for ensuring that data and parity data are properly updated to disks in the system. Another challenge is to ensure data consistency while at the same time reducing the time to recover failed disks, reducing recovery time if both a disk and an adaptor fail, and reducing network message traffic when handling data and parity updates.
To provide an improved system for handling updates to data and parity in a shared disk system, preferred embodiments of the present invention disclose a system for updating data. A first processing unit receives a data update to a data block in a first storage device. Parity data for the data block is maintained in a second storage device. A parity group is comprised of the data block and the parity data. After determining that the first processing unit does not control access to the parity group including the data block to update, the first processing unit sends a message to a second processing unit controlling access to the parity group, requesting control of access to the parity group. The first processing unit determines new parity data from the data update, the data at the data block in the first storage device, and the parity data in the second storage device. The first processing unit then writes the data update to the data block in the first storage device and the new parity data to the second storage device.
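The update sequence above can be sketched as follows. This is an illustrative sketch, not the claimed implementation: it assumes XOR parity as in conventional RAID-5, where the new parity is the old parity XOR the old data XOR the new data, and the `Adaptor` class, its `owned_groups` set, and the `request_control` message tuple are hypothetical stand-ins for the processing units and the control-of-access message.

```python
from dataclasses import dataclass, field


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("blocks must be the same length")
    out = bytearray(length)
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)


def new_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    # Assumed XOR (RAID-5 style) read-modify-write: P' = P xor D xor D'.
    return xor_blocks(old_parity, old_data, new_data)


@dataclass
class Adaptor:
    """Hypothetical processing unit that controls access to some parity groups."""
    name: str
    owned_groups: set = field(default_factory=set)
    messages_sent: list = field(default_factory=list)

    def update(self, group, owner, data_disk, parity_disk, block, new_data):
        # If this adaptor does not control the parity group, ask the owner for it.
        if group not in self.owned_groups:
            self.messages_sent.append((owner.name, "request_control", group))
            owner.owned_groups.discard(group)  # hypothetical: owner grants control
            self.owned_groups.add(group)
        # Compute new parity from new data, old data, and old parity.
        parity = new_parity(data_disk[block], new_data, parity_disk[group])
        # Write the data update and the new parity.
        data_disk[block] = new_data
        parity_disk[group] = parity


if __name__ == "__main__":
    d0, d1 = b"\x0f\x00", b"\xf0\x01"
    data_disk = {0: d0, 1: d1}
    parity_disk = {0: xor_blocks(d0, d1)}
    a1, a2 = Adaptor("A1"), Adaptor("A2", owned_groups={0})
    a1.update(0, a2, data_disk, parity_disk, 0, b"\xaa\x55")
    print(a1.messages_sent, parity_disk[0].hex())
```

Because only the old data and old parity are read, the update touches two disks regardless of how many data blocks share the parity group.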
In further embodiments, a parity group set indicates a plurality of parity groups. After receiving the data update, the first processing unit determines the parity group set that contains the parity group of the data block to update. The first processing unit then determines whether a first data structure indicates that another data block in the parity group set is being updated. If so, the first processing unit sends a parity group set message to the second processing unit including information on the parity group set containing the data block to be updated and a third data structure indicating parity groups recently updated.
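The parity group set bookkeeping can be sketched as below, following the condition exactly as stated above. Everything here is an assumption for illustration: the fixed `GROUPS_PER_SET` mapping from parity group to set, the `UpdateTracker` class, the per-set counter standing in for the first data structure, the bounded deque standing in for the third data structure, and the dictionary used as the parity group set message.

```python
from collections import deque
from dataclasses import dataclass, field

GROUPS_PER_SET = 4  # hypothetical: number of parity groups per parity group set
RECENT_LIMIT = 8    # hypothetical: how many recently updated groups to remember


def parity_group_set(group: int) -> int:
    # Hypothetical mapping: consecutive parity groups form one set.
    return group // GROUPS_PER_SET


@dataclass
class UpdateTracker:
    # "First data structure": parity group set -> number of updates in flight.
    in_progress: dict = field(default_factory=dict)
    # "Third data structure": parity groups recently updated (bounded).
    recently_updated: deque = field(default_factory=lambda: deque(maxlen=RECENT_LIMIT))
    # Messages that would be sent to the second processing unit.
    outbox: list = field(default_factory=list)

    def begin_update(self, group: int) -> int:
        pgs = parity_group_set(group)
        # Another block in the same set is already being updated: send a message.
        if self.in_progress.get(pgs, 0) > 0:
            self.outbox.append(
                {"parity_group_set": pgs, "recent": list(self.recently_updated)}
            )
        self.in_progress[pgs] = self.in_progress.get(pgs, 0) + 1
        return pgs

    def end_update(self, group: int) -> None:
        pgs = parity_group_set(group)
        self.in_progress[pgs] -= 1
        if self.in_progress[pgs] == 0:
            del self.in_progress[pgs]
        self.recently_updated.append(group)


if __name__ == "__main__":
    t = UpdateTracker()
    t.begin_update(1)
    t.begin_update(2)   # same set as group 1, so a message is queued
    t.end_update(1)
    print(t.outbox)
```

Reporting whole sets plus a short list of recent groups, rather than every individual group, is one way to trade fewer messages for a somewhat larger set of groups to check during recovery.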
Preferred embodiments provide systems and methods for updating data and parity groups and at the same time minimizing network message traffic between the processing units, e.g., adaptors, in the system. Further embodiments use messaging to keep the second adaptor informed of the parity groups being updated. In this way, if the first adaptor fails, the second adaptor can readily determine the inconsistent parity groups that need to be updated or block access to the inconsistent parity groups before recovering failed data. Preferred embodiments seek to balance the goals of reducing message traffic and improving failure recovery time.