1. Technical Field
The present invention relates in general to improved data storage systems and in particular to an improved method and system for transferring data from a first data storage configuration to a second data storage configuration. Still more particularly, the present invention relates to an improved method and system for checkpointing data being migrated during a logical drive migration, a rebuild, a copy or other background process.
2. Description of the Related Art
As the performance of microprocessor and memory technology improves, there is a need for data storage systems with comparable performance enhancements. Additionally, in enhancing the performance of data storage systems, there is a need for improved reliability of data storage. In 1988, a paper was published by Patterson, Gibson and Katz titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)," International Conference on Management of Data, pages 109-116, June 1988. This paper laid the foundation for the use of redundant arrays of inexpensive disks that would not only improve the data transfer rate and data I/O rate over a comparable single disk access, but would also provide error correction at a lower cost in data storage systems.
RAID technology utilizes the grouping of several physical drives in a computer into an array that can be defined as one or more logical drives. Each logical drive appears to the operating system as a single drive. This grouping technique enhances logical drive capacity and performance beyond the physical limitations of a single physical drive. When multiple physical drives are grouped into a logical drive, a RAID controller can transfer data in parallel from the multiple drives in the array. This parallel transfer yields data transfer rates that are many times higher than with non-array drives. This increased speed makes the system better able to meet the throughput (the amount of data processed in a given amount of time) and productivity needs of a multiple-user network environment, while also decreasing response time. The combination of parallel transfers and simultaneous responses to multiple requests allows disk arrays to provide a high level of performance in network environments.
With RAID technology, data is striped across an array of physical drives. This data-distribution scheme complements the way the operating system requests data. The granularity at which data is stored on one drive of the array before subsequent data is stored on the next drive of the array is called the stripe-unit size. The stripe-unit size can be set to a value close to the size of the system I/O request to maximize the performance of the RAID controller. Typically, this stripe-unit size is 8 KB, 16 KB, 32 KB or 64 KB. The collection of stripe units, from the first physical drive of the array to the last physical drive of the array, is called a stripe.
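The striping arithmetic described above can be sketched as follows. This is an illustrative example only; the function name and the rotation scheme are assumptions, not part of the invention, but they reflect the common layout in which stripe units rotate across the drives of the array:

```python
# Hypothetical sketch of striping arithmetic: mapping a logical
# stripe-unit index onto a (physical drive, stripe number) pair.

def locate_stripe_unit(logical_unit: int, num_drives: int):
    """Return (drive_index, stripe_index) for a logical stripe-unit index."""
    drive_index = logical_unit % num_drives    # units rotate across the drives
    stripe_index = logical_unit // num_drives  # one unit per drive per stripe
    return drive_index, stripe_index

# With a three-drive array, units 0, 1, 2 form stripe 0 on drives 0, 1, 2;
# units 3, 4, 5 form stripe 1, and so on.
```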
RAID levels are essentially defined by whether and how they accomplish redundancy of data storage. There are six standardized RAID levels, termed RAID levels 0, 1, 2, 3, 4 and 5. The most commonly used are levels 0, 1 and 5.
RAID level-0 stripes the data across all the drives in the array with no data redundancy. RAID level-0 provides the largest storage capacity of the RAID levels that are offered, because no room is taken for redundant data or data parity storage. A simple example of a RAID level-0 logical drive uses two physical drives combined to create an array. A logical drive is then created within that array and the data is striped across all the drives in the array creating blocks. The blocks, each of which correlates to a particular stripe unit, hold the data. Note that the terms "block" and "stripe unit" may be used interchangeably. Typically, the first physical drive will contain every other block of data from the original source, with the second physical drive in the logical drive holding the remaining alternate data blocks.
RAID level-1 provides 100% data redundancy and requires a minimum of two physical drives. With RAID level-1, the first half of the stripe is the original data; the second half of the stripe is a mirror (exact copy) of the data, but written to the other drive in the RAID level-1 array. Because the data is mirrored, the capacity of the logical drive when assigned RAID level-1 is 50% of the array capacity. To establish a typical RAID level-1 logical drive, two physical drives are combined into an array, and a single logical drive is created within that array. The data is striped across the drives creating blocks that are mirror copies between the two physical drives. During normal operations, the RAID controller reads data from either physical drive within the array. If one of the physical drives fails, the RAID controller switches read and write requests to the remaining functional drive in the RAID level-1 array. A back-up hot-spare drive can also be provided for immediate rebuild of the failed physical drive.
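The mirroring behavior described above can be illustrated with a minimal sketch. The class name and structure here are assumptions for illustration, not a description of an actual RAID controller: every write is duplicated to both drives, and a read may be served by any functional drive in the mirror pair:

```python
# Minimal, illustrative sketch of RAID level-1 mirroring. All names here
# are hypothetical; real controllers operate on physical devices, not buffers.

class Raid1:
    def __init__(self, size: int):
        self.drives = [bytearray(size), bytearray(size)]  # the mirror pair
        self.failed = set()  # indices of failed drives

    def write(self, offset: int, data: bytes) -> None:
        # Mirror the write to every functional drive.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Serve the read from any functional drive in the pair.
        i = next(i for i in range(len(self.drives)) if i not in self.failed)
        return bytes(self.drives[i][offset:offset + length])
```

After one drive is marked failed, reads and writes transparently fall through to the surviving drive, mirroring the failover behavior described above.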
RAID level-5 requires a minimum of three physical drives. This RAID level stripes data and parity across all drives in the array. This parity/data storage reduces the capacity of the array by one drive. A typical RAID level-5 logical drive starts with four physical drives. An array is created using three of the physical drives, leaving the fourth as a hot-spare drive. A logical drive is created within the three-drive array and data is striped across the drives creating blocks. Within each stripe is at least one parity block. Typically, the parity block stores the result of an Exclusive OR (XOR) logical function of the data in the other two physical drives in the stripe. If a drive holding a data stripe unit fails, that stripe unit (block) can be reconstructed by applying the XOR logical operation to the remaining data and parity stripe units in the stripe. For example, if the data bit in the still-functional physical drive is 1 and the XOR parity bit is 1, then the lost data bit in the failed drive must be 0. If the remaining data bit is 1 and the parity bit is 0, then the lost data bit must be 1. Parity is more space-efficient than mirroring, since a single parity block per stripe can rebuild whichever single drive fails. If a physical drive fails in the RAID level-5 array, the hot-spare drive is rebuilt to contain the data that was lost in the failed drive, and the RAID controller switches read and write functions to that hot-spare drive for data that was in the failed drive.
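The XOR reconstruction described above follows from the property that XOR-ing the parity block with all surviving data blocks recovers the lost block. The following sketch is illustrative, with assumed names and two-byte blocks standing in for full stripe units:

```python
# Illustrative sketch of RAID level-5 parity generation and reconstruction.
# xor_blocks is a hypothetical helper, not part of any real controller API.

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data0 = b"\x0f\xf0"                    # stripe unit on drive 0
data1 = b"\xaa\x55"                    # stripe unit on drive 1
parity = xor_blocks([data0, data1])    # parity block on drive 2

# Drive 0 fails: rebuild its stripe unit from the surviving data and parity.
rebuilt = xor_blocks([data1, parity])
assert rebuilt == data0
```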
A logical drive migration moves data from a first configuration of a logical drive to a second configuration of the same logical drive. This migration may be from one RAID level to another RAID level, or from a first configuration of a logical drive to a second configuration of the logical drive, typically with the second configuration having a higher number of physical drives than the first configuration. During the logical drive migration, it is possible for the system to shut down. If this occurs without checkpointing the progress of the migration, the system will not know which data has been overwritten during the migration or where to find the stripe unit for the updated data.
As each stripe is migrated, a checkpoint can be recorded for that stripe to track the progress of the migration. If the system fails during the migration, the migration can then simply resume at the last checkpoint. However, this high frequency of checkpointing is very expensive, since every checkpoint must be stored for future retrieval. If the checkpoint is stored on a disk drive, the checkpoint-saving function is very slow. If the checkpoint is saved in a control area at the beginning or end of the drive space of the logical drive being migrated, then the disk head must move back and forth between the data being migrated and the checkpointing area, which is mechanically and temporally prohibitive. Even if the system has Non-Volatile Random Access Memory (NVRAM), checkpointing is still not trivial, as NVRAM access is typically much slower than regular memory access.
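The per-stripe checkpointing scheme and its crash-resume behavior can be sketched as follows. The function names and callback structure are illustrative assumptions; the point is that a checkpoint write follows every stripe, which is the cost the invention seeks to reduce:

```python
# Hypothetical sketch of naive per-stripe checkpointing during a migration.
# migrate_stripe and save_checkpoint stand in for controller operations.

def migrate(stripes, migrate_stripe, save_checkpoint, start=0):
    """Migrate stripes in order, checkpointing after each one.

    start is the first stripe to process; after a crash, the caller
    passes last_checkpoint + 1 to resume where the migration left off.
    """
    for n in range(start, len(stripes)):
        migrate_stripe(stripes[n])
        save_checkpoint(n)  # costly if this hits a disk or NVRAM every stripe
```

Resuming after a failure is then a matter of restarting the loop at the stripe following the last recorded checkpoint.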
It should therefore be apparent that there exists a need for a method that will improve the performance of a logical drive migration by the use of intelligent checkpointing. It would further be desirable to devise a computer program product wherein such a method may be performed on a computer system. In addition, it would be desirable to devise a multiple drive system having improved logical drive capability.
The present invention incorporates a method for determining if a destination stripe in a second configuration of a logical drive contains data that, when migrated, will overlay data in a first configuration of the logical drive before the destination stripe has been checkpointed. Prior to the migration of the data to the destination stripe, the overlaid data must be checkpointed so that the system knows where it has migrated. A device implementing the invention may be embodied in a RAID controller at any RAID level. In addition, the above-described method may be used to checkpoint data migration in other data storage background processes, such as rebuilds and copies. The present invention may also be embodied in a computer program product having machine-readable instructions for carrying out the above-described method.
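One possible form of the overlay test described above can be sketched as follows. This is a hedged illustration under assumed geometry, not the patented method itself: it assumes a migration from an array of `old_drives` to a wider array of `new_drives`, where writing destination stripe k overwrites physical stripe k, which in the old layout held a known range of logical units. A checkpoint is forced only when that write would destroy source data not yet covered by a checkpoint:

```python
# Hedged sketch of an overlay-before-checkpoint test. The stripe geometry
# (one stripe unit per drive, stripes sharing physical locations across
# configurations) is an illustrative assumption.

def needs_checkpoint(dest_stripe, old_drives, checkpointed_through):
    """Return True if migrating dest_stripe would overwrite source data
    beyond the last checkpoint, so a checkpoint must be taken first.

    In the old layout, physical stripe k holds logical units
    k*old_drives .. (k+1)*old_drives - 1; those units are at risk if any
    of them lie past checkpointed_through (the highest checkpointed unit).
    """
    last_overlaid_unit = (dest_stripe + 1) * old_drives - 1
    return last_overlaid_unit > checkpointed_through
```

Under this sketch, early destination stripes (where old and new data layouts overlap closely) trigger checkpoints, while later stripes, whose overlaid source units were long since migrated and checkpointed, do not; this is the reduction in checkpoint frequency that motivates the method.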
The above, as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.