1. Field of the Invention
This invention relates to efficient management and storage of data in a RAID disk array device or a RAID disk array in a computing system. More particularly, this invention relates to optimization of invalidation of data and parity information in a prewrite area of a RAID disk array.
2. Description of the Background
In computing systems designed for large data processing and data storage applications, redundant storage devices are provided to enhance the integrity of data maintained on the system in the event of a failure of a storage device.
For example, RAID (Redundant Array of Independent Disks) technology, such as RAID-1, RAID-4, and RAID-5, utilizes an array of disk drives which contain data and parity information distributed across each disk in the array. The parity information is additional information stored on the disks and can be used to reconstruct data contained on any of the drives of the array in the event of a single drive failure. In this manner, these RAID disk arrays can improve the data integrity of the computing system by providing for data recovery despite the failure of a single disk drive. However, because of the redundancy of information stored in the device, these RAID devices have been characterized by slow processing times for a single logical “write” of data to the RAID device.
RAID architectures can include a RAID device which is a standalone self-contained storage unit having multiple disk drives included therein arranged in a RAID array. The RAID information processing is performed internally to the device and is transparent to the computing system attached thereto. Alternatively, a computing system may have an array of disks and perform the RAID information processing within the processor of the computing system. Throughout this application, these architectures are referred to interchangeably, and the terms RAID device and RAID disk array are used interchangeably.
Regardless of the RAID architecture employed, data and parity information must be synchronously maintained in order to prevent corruption of data. There is a chance that parity and data for a region of a disk may get out of synchronization due to a system failure or crash. When this happens there is no indication of the problem until a disk drive fails, and the data returned on reads and writes from the RAID device will be incorrect.
In order to keep parity and data synchronized at all times, all write operations can be first placed in a “prewrite” area, having numerous prewrite slots, for temporary persistent storage, and then written to the actual logical blocks of the disk. This guarantees that if the host computer fails or crashes, or if the RAID device crashes, the data and parity can be kept in synchronization. The prewrite process uses the following steps:
1) write the data and parity to a prewrite area;
2) write the data and parity to actual logical blocks of the disks;
3) invalidate the data and parity in the prewrite area.
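The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular RAID product: the `Disk` class, its fields, and `logical_write` are hypothetical names, and each disk is modeled as in-memory dictionaries with a counter of physical writes.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy disk model (hypothetical): logical blocks plus a prewrite area."""
    blocks: dict = field(default_factory=dict)    # block number -> payload
    prewrite: dict = field(default_factory=dict)  # slot number -> entry dict
    writes: int = 0                               # physical writes issued

    def write_prewrite(self, slot, block, payload):
        self.prewrite[slot] = {"block": block, "payload": payload, "valid": True}
        self.writes += 1

    def write_block(self, block, payload):
        self.blocks[block] = payload
        self.writes += 1

    def invalidate(self, slot):
        self.prewrite[slot]["valid"] = False
        self.writes += 1


def logical_write(data_disk, parity_disk, slot, block, data, parity):
    # Step 1: write the data and parity to the prewrite area.
    data_disk.write_prewrite(slot, block, data)
    parity_disk.write_prewrite(slot, block, parity)
    # Step 2: write the data and parity to the actual logical blocks.
    data_disk.write_block(block, data)
    parity_disk.write_block(block, parity)
    # Step 3: unconditionally invalidate the prewrite copies.
    data_disk.invalidate(slot)
    parity_disk.invalidate(slot)


data_disk, parity_disk = Disk(), Disk()
logical_write(data_disk, parity_disk, slot=0, block=7, data=b"new", parity=b"par")
```

In this model each disk receives three physical writes per logical write, one of which is the invalidation.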
The “invalidation” step three is required to prevent data in the prewrite area from being erroneously backed up and corrupted by a system crash. Invalidation is defined as marking the prewrite data/parity as invalid or non-usable, preventing the information from being replayed upon initialization of the RAID array. Invalidation is performed after the parity and data stored in the prewrite slots have been physically written to their proper physical location on the RAID disk array. For example, a tag can be placed over each prewrite slot after the data/parity has been written to the disk indicating that the data/parity in the prewrite area is no longer valid and should not be used.
This unconditional invalidation step is expensive in time and performance, as it requires a separate disk write operation. The performance cost can be up to approximately 10 milliseconds per logical write operation to the RAID disk array.
FIG. 1 shows the steps disclosed in the co-pending application, “HOST-BASED RAID-5 AND NV-RAM INTEGRATION”, referenced above, for performing a single “logical” write of new data in a RAID-5 device.
Operation 20 reads the old data from the disk, while operation 22 reads the old parity from the disk. Operations 20 and 22 are needed to calculate the new parity information. Operation 24 generates the new parity information by first removing the old data from the parity information, which can be achieved by an exclusive-OR operation. The new parity information is then generated by including the new data into the parity information, which can also be achieved using an exclusive-OR calculation.
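The parity update of operation 24 can be illustrated with a short sketch. The function names `xor_bytes` and `new_parity` and the one-byte sample values are hypothetical and chosen only for illustration; the sketch assumes a stripe with two data blocks and one parity block.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def new_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    # Remove the old data's contribution from the parity ...
    without_old = xor_bytes(old_parity, old_data)
    # ... then include the new data.
    return xor_bytes(without_old, new_data)


# Hypothetical stripe: two data blocks and their parity.
d0_old = bytes([0b1010])
d1 = bytes([0b0110])
parity_old = xor_bytes(d0_old, d1)

d0_new = bytes([0b0011])
parity_new = new_parity(parity_old, d0_old, d0_new)

# The other data block can still be rebuilt from the updated parity.
d1_rebuilt = xor_bytes(parity_new, d0_new)
```

Because exclusive-OR is its own inverse, the updated parity equals the parity that would be obtained by recomputing it over the whole stripe, and only the old data and old parity need to be read.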
Having calculated the new parity information corresponding to the new data, operation 26 records or “prewrites” the new data and the new parity to a prewrite region of the disk. In this manner, if the computing system is interrupted or if a single disk in the RAID array fails before the new data and new parity are both completely written to the disk, the new parity/new data information will always be synchronized. As previously explained, synchronization between data and parity is needed to correctly reconstruct data stored on a failed disk drive.
Having permanently recorded the new data and new parity in the prewrite area of the disk, this information can now be transferred to the respective storage locations on the disk drives. Operation 28 writes the new data to the disk, and operation 30 writes the new parity information to the disk. In this manner, both the new data and the new parity are now synchronously maintained on the disk drive.
Operation 32 marks the logical write operation to the RAID device as complete. This operation would include invalidating the data and parity information stored by operation 26 in the prewrite area of the disk. Upon a system failure, the data and parity information which are stored in the prewrite area can be used to restore data if that prewrite data/parity has not been marked invalid.
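The recovery behavior described above can be sketched as a replay pass run at initialization. The slot layout (a dictionary with `block`, `payload`, and `valid` keys) and the function name `replay_prewrites` are assumptions made for illustration only.

```python
def replay_prewrites(prewrite_slots, blocks):
    """Copy every still-valid prewrite entry to its home block.

    Entries marked invalid are skipped, so stale prewrite contents
    never overwrite current data. Returns the replayed slot numbers.
    """
    replayed = []
    for slot_id, entry in sorted(prewrite_slots.items()):
        if entry["valid"]:
            blocks[entry["block"]] = entry["payload"]
            replayed.append(slot_id)
    return replayed


# Hypothetical state after a crash: slot 0 was prewritten but not yet
# invalidated; slot 1 had already been invalidated.
slots = {
    0: {"block": 7, "payload": b"new", "valid": True},
    1: {"block": 9, "payload": b"stale", "valid": False},
}
blocks = {7: b"old", 9: b"current"}
replayed = replay_prewrites(slots, blocks)
```

Here only slot 0 is replayed; block 9 keeps its current contents because its prewrite entry was marked invalid.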
The invalidation step requires two write operations: one write operation to mark the prewrite data as invalid, and one write operation to mark the prewrite parity as invalid. This is in addition to the six disk input/output operations previously described. Hence, one logical write of new data to the RAID device would require eight physical disk input/output operations to the RAID device, a costly process.
What is needed is a device and method which is capable of minimizing the number of invalidating write operations while simultaneously ensuring the synchronization between parity and data on the RAID device.
In accordance with this invention, the above problems have been solved by maintaining a scoreboard memory structure to monitor the state of the prewrite slots in the prewrite area of the storage devices, and detecting the conditions under which an invalidation of the prewrite slots should occur. In this manner, the present invention removes the need to unconditionally invalidate prewrite areas by detecting when invalidation is necessary.
New prewrite slots are allocated based on the contents of the scoreboard. The scoreboard also permits overlapping prewrites to be detected, and only the overlapping prewrite slots are invalidated, thereby reducing the number of invalidation operations performed by the RAID device.
Disclosed herein is a method for writing new data in a computing system having a system memory and at least two storage devices arranged in a RAID configuration. The first and second storage devices each have prewrite slots for pre-storage of data and parity information. A scoreboard structure in the system memory of the computing system is provided for tracking a state of said prewrite slots. One of the prewrite slots is allocated for recording the new data in the first storage device and for recording the new parity in the second storage device. The scoreboard memory structure is used to detect an overlapped prewrite slot, and for conditionally invalidating the overlapped prewrite slot. The new parity is computed from the new data, an old parity value, and an old data value stored in the computing system. The new data is stored in the prewrite slot allocated by the allocation step to the first storage device and the new parity to the second storage device. Upon completion of the storage of the data and parity to the prewrite slots, the new data is written to the first storage device, and the new parity is written to the second storage device.
In an embodiment of the invention, an identification variable is created for associating the data and parity in the prewrite slots across the first and second storage devices, and a block variable is assigned to each identification variable corresponding to a range of blocks occupied within the storage devices. The block variable of a prewrite slot is compared to the block variable of an allocated prewrite slot to detect whether the two match. If so, the prewrite slot is marked as invalid and should not be used for data recovery.
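The comparison of block variables can be sketched as follows. This is a hedged illustration: the `PrewriteSlot` fields, the representation of the block variable as a start block plus a block count, and the choice to treat any intersection of ranges as a match are assumptions not specified in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrewriteSlot:
    slot: int          # slot number in the prewrite area
    prewrite_id: int   # identification variable tying data and parity together
    start_block: int   # block variable: first block covered (assumed form)
    block_count: int   # block variable: number of blocks covered (assumed form)


def ranges_overlap(a: PrewriteSlot, b: PrewriteSlot) -> bool:
    """True when the half-open block ranges of a and b intersect."""
    return (a.start_block < b.start_block + b.block_count
            and b.start_block < a.start_block + a.block_count)


def slots_to_invalidate(existing, allocated):
    """Slot numbers whose block range overlaps the newly allocated slot."""
    return [s.slot for s in existing
            if s.slot != allocated.slot and ranges_overlap(s, allocated)]


existing = [
    PrewriteSlot(slot=0, prewrite_id=11, start_block=100, block_count=8),
    PrewriteSlot(slot=1, prewrite_id=12, start_block=200, block_count=8),
]
allocated = PrewriteSlot(slot=2, prewrite_id=13, start_block=104, block_count=8)
overlapped = slots_to_invalidate(existing, allocated)
```

In this example only slot 0 overlaps the newly allocated range, so it is the only slot that would need an invalidating write.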
The scoreboard memory structure can be formed to contain a set of state variables associated with each of the prewrite slots. The set of state variables can comprise an UNUSED state variable, an ACTIVE state variable, an AVAILABLE state variable, and an INVALIDATING state variable.
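A scoreboard holding these four states might be sketched as below. The state names come from the text; everything else is an assumption made for illustration, including the `Scoreboard` class, the meaning given to each state in the comments, and the allocation policy of preferring UNUSED slots over AVAILABLE ones. The INVALIDATING state is declared but not exercised here.

```python
from enum import Enum, auto


class SlotState(Enum):
    UNUSED = auto()        # assumed: slot has never held a prewrite
    ACTIVE = auto()        # assumed: a prewrite is in progress in this slot
    AVAILABLE = auto()     # assumed: prewrite finished, slot may be reused
    INVALIDATING = auto()  # assumed: an invalidating write is in progress


class Scoreboard:
    """In-memory record of the state of each prewrite slot (hypothetical)."""

    def __init__(self, slot_count: int):
        self.states = [SlotState.UNUSED] * slot_count

    def allocate(self):
        """Pick a slot, preferring UNUSED over AVAILABLE; None if none is free."""
        for wanted in (SlotState.UNUSED, SlotState.AVAILABLE):
            for slot, state in enumerate(self.states):
                if state is wanted:
                    self.states[slot] = SlotState.ACTIVE
                    return slot
        return None

    def complete(self, slot: int):
        """Mark an ACTIVE slot as reusable once its data and parity are on disk."""
        if self.states[slot] is not SlotState.ACTIVE:
            raise ValueError("slot is not active")
        self.states[slot] = SlotState.AVAILABLE


board = Scoreboard(slot_count=2)
first = board.allocate()
second = board.allocate()
third = board.allocate()   # no free slot remains
board.complete(first)
fourth = board.allocate()  # reuses the slot that was just completed
```

Keeping the states in system memory means that choosing a slot requires no disk access; a disk write is needed only when the scoreboard indicates that a slot must actually be invalidated.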
In a machine implementation of the invention, an apparatus for storing data in a computing system has first and second storage devices, a memory structure, a detection module, an allocation module, and an invalidation module. The first and second storage devices each have prewrite slots for pre-storage of the data. The memory structure is coupled to the storage devices for tracking a state of the prewrite slots. The detection module is coupled to the memory structure for monitoring the memory structure to detect when any prewrite slots should be marked invalid. The allocation module is coupled to the memory structure and to the storage devices for allocating prewrite slots for pre-storing the data in a prewrite slot. The invalidation module is coupled to the detection module and to the storage devices for marking any prewrite slots invalid responsive to the detection module.
The apparatus of the present invention can be used where the storage devices are arranged in a RAID-1, RAID-4, or RAID-5 configuration.
The above computer implemented steps in another implementation of the invention are provided as an article of manufacture, i.e., a computer storage medium containing a computer program of instructions for performing the above-described steps.
The great utility of the present invention is an improvement in the performance of a RAID disk array achieved by reducing the number of invalidation operations required for each logical write operation to the disk array.
The foregoing and other features, utilities and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention as illustrated in the accompanying drawings.