1. Field of the Invention
The present invention relates to a system for updating data, reading data, and handling storage device and adaptor failures in a shared disk system.
2. Description of the Related Art
In Redundant Arrays of Independent Disks (RAID) systems, data files and related parity are striped across multiple disk drives. In storage subsystems which manage numerous hard disk drives as a single direct access storage device (DASD), the RAID logic is implemented in the controller of the subsystem. RAID storage methodologies have also been implemented in software for execution on a single host computer. This allows the single host computer, such as a personal computer, to implement RAID storage techniques on local hard disk drive space. Such software RAID methodologies are described in "Algorithms for Software and Low Cost Hardware RAIDs," by Jai Menon, Jeff Riegel, and Jim Wyllie, document no. 1063-6390 (IEEE 1995), which is incorporated herein by reference in its entirety.
One problem with the single storage subsystem is the risk of failure. Techniques have been developed to improve failback and recovery in case of failures in the hardware controller. One such failback technique is the Fast Write Technique, which provides two separate controllers on different power boundaries that control the flow of data from host systems to DASDs. If one controller fails, the other controller can continue writing data to the DASD. Typically, a non-volatile storage unit (NVS) is included with each separate controller, such that the NVS connected to one controller backs up the data the other controller is writing to DASD. Such failback systems employing the two-controller failsafe structure are described in U.S. Pat. Nos. 5,636,359, 5,437,022, 5,640,530, and 4,916,605, all of which are assigned to International Business Machines Corporation (IBM), the assignee of the subject application, and all of which are incorporated herein by reference in their entirety.
RAID systems can also be implemented in a parallel computing architecture in which there is no central controller. Instead, a plurality of independent controllers that control local hard disk storage devices are separate nodes that function together in parallel to implement RAID storage methodologies across the combined storage space managed by each node. The nodes are connected via a network. Parity calculations can be made at each node rather than centrally. Such parallel RAID architecture is described in "The TickerTAIP Parallel RAID Architecture," by Pei Cao, Swee Boon Lim, Shivakumar Venkataraman, and John Wilkes, published in ACM Transactions on Computer Systems, Vol. 12, No. 3, pgs. 236-269 (August, 1994), which is incorporated herein by reference in its entirety.
One challenge in shared disk systems implementing a parallel RAID architecture is to provide a system for ensuring that data is properly updated to disks in the system, that a write or update request invalidates stale data so that such stale data is not returned, and that a read request returns the most current data.
To overcome the limitations in the prior art described above, preferred embodiments of the present invention disclose a system for updating data at a data block. A first processing unit receives update data. The data block to update is located in a first storage device, and a second storage device stores parity data for the data block. A parity group comprises a data block and corresponding parity data for the data block. The first processing unit obtains the data at the data block and calculates partial parity data from the data at the data block and the update data. The first processing unit stores the partial parity data in a storage area and writes the update data to the data block in the first storage device. The first processing unit further updates parity data for each parity group for which partial parity data is maintained. To do so, the first processing unit obtains control of access to the parity group from a second processing unit if the first processing unit does not already control access to the parity group. When the first processing unit controls access to the parity group, the first processing unit calculates new parity data from the partial parity data and the parity data in the second storage device, and writes the new parity data to the second storage device.
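The update sequence described above may be illustrated by the following minimal sketch. It assumes that parity is the bytewise XOR of the blocks in a parity group, as in conventional RAID 5; the summary itself does not name the parity function. The class `Adaptor`, the function `xor_blocks`, the dictionaries standing in for the storage devices, and the `lock_owner` table standing in for the message exchange between processing units are all hypothetical names chosen for illustration. Accumulating several pending updates into one partial parity value is likewise an assumption.

```python
from dataclasses import dataclass, field


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks (assumed parity function)."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class Adaptor:
    """Hypothetical processing unit; all names are illustrative."""
    name: str
    data_disk: dict    # block id -> bytes (first storage device)
    parity_disk: dict  # parity group id -> bytes (second storage device)
    lock_owner: dict   # parity group id -> name of unit controlling access
    partial_parity: dict = field(default_factory=dict)  # the storage area

    def update_block(self, group: str, block: str, new_data: bytes) -> None:
        # Obtain the data at the data block and compute partial parity.
        old_data = self.data_disk[block]
        delta = xor_blocks(old_data, new_data)
        # Assumption: fold the delta into any partial parity already pending.
        pending = self.partial_parity.get(group)
        self.partial_parity[group] = (
            delta if pending is None else xor_blocks(pending, delta)
        )
        # Write the update data to the data block.
        self.data_disk[block] = new_data

    def flush_parity(self, group: str) -> None:
        # Obtain control of the parity group if another unit holds it.
        # A real system would exchange messages; here the table is rewritten.
        if self.lock_owner.get(group) != self.name:
            self.lock_owner[group] = self.name
        # New parity = partial parity XOR parity currently on the device.
        partial = self.partial_parity.pop(group)
        self.parity_disk[group] = xor_blocks(partial, self.parity_disk[group])
```

In this sketch the parity write is deferred: `update_block` touches only the data block and the storage area, and `flush_parity` is the only step that requires control of the parity group.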
Further embodiments concern processing a request to read data. A first processing unit receives a request to read a data block in a storage device from a requestor. The first processing unit returns the data from a first cache after determining that the requested data is in the first cache. The first processing unit requests permission from a second processing unit to transfer the data in a second cache to the first cache after determining that the data is in the second cache. The first processing unit transfers the data from the second cache to the first cache and returns the data to the requestor after receiving permission from the second processing unit. After receiving a message from the second processing unit denying permission, the first processing unit reads the data block in the storage device and returns the read data to the requestor.
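The read sequence described above may be sketched as follows. The sketch is illustrative only: the class `Unit`, the enumeration `Source`, the method `grant_transfer`, and the `stale` set used by the second processing unit to decide whether to deny permission are hypothetical. The summary does not state how the first processing unit determines that data is in the second cache, so the sketch inspects the peer cache directly as a stand-in, and it assumes that a block found in neither cache is read from the storage device.

```python
from enum import Enum


class Source(Enum):
    LOCAL_CACHE = "first cache"
    PEER_CACHE = "second cache"
    DISK = "storage device"


class Unit:
    """Hypothetical processing unit with a read cache and one peer."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk      # block id -> bytes (shared storage device)
        self.cache = {}       # block id -> bytes (this unit's cache)
        self.peer = None      # the other processing unit, if any
        self.stale = set()    # blocks this unit refuses to hand over

    def grant_transfer(self, block):
        """Peer-side decision: permit or deny a cache-to-cache transfer."""
        return block in self.cache and block not in self.stale

    def read(self, block):
        # Case 1: the requested data is in the first (local) cache.
        if block in self.cache:
            return self.cache[block], Source.LOCAL_CACHE
        # Case 2: the data is in the second cache; ask permission first.
        peer = self.peer
        if peer is not None and block in peer.cache:
            if peer.grant_transfer(block):
                self.cache[block] = peer.cache[block]
                return self.cache[block], Source.PEER_CACHE
        # Case 3: permission denied (or, by assumption, a miss in both
        # caches): read the data block from the storage device.
        return self.disk[block], Source.DISK
```

A call such as `unit_a.read("b1")` returns both the data and the source it came from, which makes the three paths of the sequence easy to observe.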
Preferred embodiments of message exchanging ensure that the first processing unit does not provide data in a read cache that is stale in view of data updates performed by the second processing unit. Moreover, with the preferred embodiments, access to data blocks is controlled. Controlling access helps ensure that parity updates are properly handled, that data in memory locations is invalidated so that stale or outdated data is not returned to a later read request, that stale data is not destaged to a storage device, and that a read request returns the latest version of the data block.