A storage array or disk array is a data storage device that includes multiple disk drives or similar persistent storage units. A storage array can allow large amounts of data to be stored in an efficient manner. A storage array also can provide redundancy to promote reliability, as in the case of a Redundant Array of Inexpensive Disks (RAID) system. In general, RAID systems simultaneously use two or more hard disk drives, referred to herein as physical disk drives (PDs), to achieve greater levels of performance, reliability and/or larger data volume sizes. The term “RAID” is generally used to describe computer data storage schemes that divide and replicate data among multiple PDs. In RAID systems, one or more PDs are set up as a RAID virtual disk drive (VD). In a RAID VD, data might be distributed across multiple PDs, but the VD is seen by the user and by the operating system of the computer as a single disk. The VD is “virtual” in that storage space in the VD maps to the physical storage space in the PDs that make up the VD, but the VD usually does not itself represent a single physical storage device. Typically, a meta-data mapping table is used to translate an incoming VD identifier and address location into a PD identifier and address location.
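By way of illustration only, the translation from a VD identifier and address to a PD identifier and address can be sketched as a table lookup. The `Extent` record, the `MAPPING_TABLE` contents, and the `translate` function are hypothetical names chosen for this sketch, and the simple concatenated layout is an assumption; real RAID meta-data formats differ.

```python
from typing import Dict, List, NamedTuple, Tuple


class Extent(NamedTuple):
    """One row of a hypothetical VD-to-PD mapping table."""
    vd_start: int   # first VD block covered by this extent
    length: int     # number of blocks in the extent
    pd_id: int      # physical disk that holds the extent
    pd_start: int   # first PD block of the extent


# Assumed example: one VD built by concatenating 100 blocks from each of two PDs.
MAPPING_TABLE: Dict[str, List[Extent]] = {
    "VD0": [
        Extent(vd_start=0, length=100, pd_id=0, pd_start=0),
        Extent(vd_start=100, length=100, pd_id=1, pd_start=0),
    ],
}


def translate(vd_id: str, vd_block: int) -> Tuple[int, int]:
    """Translate (VD identifier, VD block) into (PD identifier, PD block)."""
    for ext in MAPPING_TABLE[vd_id]:
        if ext.vd_start <= vd_block < ext.vd_start + ext.length:
            return ext.pd_id, ext.pd_start + (vd_block - ext.vd_start)
    raise ValueError(f"block {vd_block} is outside {vd_id}")
```

In this sketch, block 150 of "VD0" resolves to block 50 of PD 1, while the user and the operating system continue to address "VD0" as a single disk.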
Although a variety of different RAID system designs exist, each is directed to one or both of two key design goals, namely: (1) to increase data reliability and (2) to increase input/output (I/O) performance. RAID has seven basic levels corresponding to different system designs. The seven basic RAID levels, typically referred to as RAID levels 0-6, are as follows. RAID level 0 uses striping to achieve increased I/O performance. The term “striped” means that logically sequential data, such as a single data file, is fragmented and assigned to multiple PDs in a round-robin fashion. Thus, the data is said to be “striped” over multiple PDs when the data is written. Striping improves performance and provides additional storage capacity. The fragments are written to their respective PDs simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the drives in parallel, providing improved I/O bandwidth. The larger the number of PDs in the RAID system, the higher the bandwidth of the system, but also the greater the risk of data loss. Parity is not used in RAID level 0 systems, which means that RAID level 0 systems do not have any fault tolerance. Consequently, when any PD fails, the entire system fails.
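The round-robin assignment of fragments described above can be sketched as follows. This is a minimal in-memory illustration, not a driver implementation; the function names `stripe` and `unstripe` and the `strip_size` parameter are assumptions made for the example, and each PD is modeled as a Python list of byte strings.

```python
from typing import List


def stripe(data: bytes, num_pds: int, strip_size: int) -> List[List[bytes]]:
    """Split data into strips and assign them to PDs in round-robin order."""
    pds: List[List[bytes]] = [[] for _ in range(num_pds)]
    for offset in range(0, len(data), strip_size):
        strip_index = offset // strip_size
        # Strip k goes to PD (k mod num_pds).
        pds[strip_index % num_pds].append(data[offset:offset + strip_size])
    return pds


def unstripe(pds: List[List[bytes]]) -> bytes:
    """Reassemble the data by reading one strip from each PD, row by row."""
    out = []
    rows = max(len(pd) for pd in pds)
    for row in range(rows):
        for pd in pds:
            if row < len(pd):
                out.append(pd[row])
    return b"".join(out)
```

Because no parity or copy is kept, losing any one of the lists returned by `stripe` makes the original data unrecoverable, which mirrors the lack of fault tolerance in RAID level 0.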
In RAID level 1 systems, mirroring without parity is used. Mirroring corresponds to the replication of stored data onto separate PDs in real time to ensure that the data is continuously available. RAID level 1 systems provide fault tolerance from disk errors because all but one of the PDs can fail without causing the system to fail. RAID level 1 systems have increased read performance when used with multi-threaded operating systems, but also have a small reduction in write performance.
In RAID level 2 systems, redundancy is used and PDs are synchronized and striped in very small stripes, often in single bytes/words. Redundancy is achieved through the use of Hamming codes, which are calculated across bits on PDs and stored on multiple parity disks. If a PD fails, the parity bits can be used to reconstruct the data. Therefore, RAID level 2 systems provide fault tolerance. In essence, failure of a single PD does not result in failure of the system.
RAID level 3 systems use byte-level striping in combination with interleaved parity bits and a dedicated parity disk. RAID level 3 systems require the use of at least three PDs. The use of byte-level striping and redundancy results in improved performance and provides the system with fault tolerance. However, use of the dedicated parity disk creates a bottleneck for writing data due to the fact that every write requires updating of the parity data. A RAID level 3 system can continue to operate without parity and no performance penalty is suffered in the event that the parity disk fails.
RAID level 4 is essentially identical to RAID level 3 except that RAID level 4 systems employ block-level striping instead of byte-level or word-level striping. Because each stripe is relatively large, a single file can be stored in a block. Each PD operates independently and many different I/O requests can be handled in parallel. Error detection is achieved by using block-level parity bit interleaving. The interleaved parity bits are stored in a separate single parity disk.
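The parity stored on the dedicated parity disk is commonly computed as the bit-wise exclusive OR (XOR) of the corresponding data blocks; the sketch below assumes XOR parity, which the text above does not state explicitly. The helper names `xor_blocks` and `update_parity` are hypothetical. The second function illustrates why every write also touches the parity disk: the new parity depends on the old parity, the old data, and the new data.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Recompute parity after one data block changes (read-modify-write)."""
    return xor_blocks(old_parity, old_data, new_data)
```

For example, with three data blocks the parity is `xor_blocks(d0, d1, d2)`, and overwriting `d1` requires one additional read and one additional write on the parity disk, which is the source of the write bottleneck noted for dedicated-parity designs.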
RAID level 5 uses striping in combination with distributed parity. With distributed parity, the system can continue to operate as long as all but one of the PDs are present. Failure of any one of the PDs necessitates replacement of that PD, but does not cause the system to fail. Upon failure of one of the PDs, any subsequent reads can be calculated from the distributed parity such that the PD failure is masked from the end user. The system remains vulnerable until the data that was on the failed PD is reconstructed on a replacement PD: if a second one of the PDs fails before then, the system will suffer a loss of data.
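The masking of a single PD failure can be illustrated with a short sketch. It assumes XOR parity and one particular parity rotation; both are assumptions for the example, since actual RAID level 5 layouts vary. The names `build_stripe` and `read_with_failed_pd` are hypothetical.

```python
from functools import reduce
from typing import List, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def build_stripe(data_strips: List[bytes], stripe_no: int,
                 num_pds: int) -> Tuple[List[bytes], int]:
    """Lay out num_pds - 1 data strips plus one parity strip across the PDs.

    The parity position rotates with the stripe number (assumed rotation).
    """
    assert len(data_strips) == num_pds - 1
    parity_pd = (num_pds - 1 - stripe_no) % num_pds
    row = list(data_strips)
    row.insert(parity_pd, xor_blocks(data_strips))
    return row, parity_pd


def read_with_failed_pd(row: List[bytes], failed_pd: int) -> bytes:
    """Recover the strip of the failed PD by XOR-ing all surviving strips."""
    survivors = [strip for i, strip in enumerate(row) if i != failed_pd]
    return xor_blocks(survivors)
```

Because the XOR of all strips in a row, parity included, is zero, any one missing strip equals the XOR of the remaining strips. If two strips of the same row are missing, this calculation no longer has enough information, which corresponds to the data loss on a second PD failure.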
RAID level 6 uses striping in combination with dual distributed parity. RAID level 6 systems require the use of at least four PDs, with the equivalent of two PDs' capacity being used for storing the distributed parity bits. The system can continue to operate even if two PDs fail. Dual parity becomes increasingly important in systems in which each VD is made up of a large number of PDs. RAID level systems that use single parity are vulnerable to data loss until the failed drive is rebuilt. In RAID level 6 systems, the use of dual parity allows a VD having a failed PD to be rebuilt without risking loss of data in the event that a second PD of the VD fails before completion of the rebuild of the first failed PD.
Many variations on the seven basic RAID levels described above exist. For example, the attributes of RAID levels 0 and 1 may be combined to obtain a RAID level known as RAID level 0+1. When designing a RAID system, the RAID level that the system will have is selected at the time the design is created based on the needs of the user (e.g., cost, capacity, performance, and safety against loss of data). Over time, however, it is possible that the RAID system will cease to meet the user's needs. Oftentimes, the user will replace the RAID system having the current RAID level with a new RAID system having a different RAID level. In order to replace the current RAID system, the data stored in the current RAID system is backed up to a temporary backup storage system. The VD parameters are also stored in a backup storage system. Once the data and VD parameters have been backed up, the new RAID system is put in place and made operational. The backed up data is then moved from the backup storage system to the new RAID system. The stored VD parameters are used to create a mapping between the VDs and the PDs of the new RAID system.
Recently, a technique known as RAID level migration has been used to migrate a RAID system from one RAID level to another RAID level. Using RAID level migration eliminates the need to replace the current RAID level system with a new RAID level system. With RAID level migration, it is not necessary to move the data to a backup storage system. Rather, during the migration process, the data is read from the PDs comprising the current VDs and written to the PDs comprising the new VDs. Migration is generally superior to replacement in terms of costs and time.
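As a rough illustration of the read-then-write flow described above, the following sketch migrates a striped (RAID level 0) layout to a mirrored (RAID level 1) layout entirely in memory. The function names, the one-block strip size, and the use of separate destination PDs are assumptions made for the example; an actual migration may reuse the same PDs and must handle failures.

```python
from typing import List


def read_vd_raid0(pds: List[List[bytes]], block: int) -> bytes:
    """Read one VD block from a RAID 0 layout with a one-block strip size."""
    return pds[block % len(pds)][block // len(pds)]


def migrate_raid0_to_raid1(src_pds: List[List[bytes]], num_blocks: int,
                           num_mirrors: int = 2) -> List[List[bytes]]:
    """Read each block from the current VD and write it to every mirror PD."""
    dst_pds: List[List[bytes]] = [[] for _ in range(num_mirrors)]
    for block in range(num_blocks):
        data = read_vd_raid0(src_pds, block)
        for pd in dst_pds:
            pd.append(data)
    return dst_pds
```

No backup storage system appears in this flow: each block moves directly from the PDs of the current VD to the PDs of the new VD.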
FIG. 1 illustrates a block diagram of a typical RAID system 2 having the capability of performing RAID level migration. The system 2 includes a hardware controller 3 for performing the RAID level migration. The hardware controller 3 includes a central processing unit (CPU) 4, a memory device 5, a nonvolatile random access memory device (NVRAM) 6, and an I/O interface device 7. The I/O interface device 7 is configured to perform data transfer in compliance with known data transfer protocol standards, such as the Serial Attached SCSI (SAS) and/or the Serial Advanced Technology Attachment (SATA) standards. The I/O interface device 7 controls the transfer of data to and from multiple PDs 8.
The controller 3 communicates via a Peripheral Component Interconnect (PCI) bus 9 with a server CPU 11 and a memory device 12. The memory device 12 stores data and software programs for execution by the server CPU 11. During a typical write action, the server CPU 11 sends instructions for a write request via the PCI bus 9 to the hardware controller 3. The CPU 4 of the hardware controller 3 causes the data to be temporarily stored in the memory device 5 of the hardware controller 3. The data is subsequently transferred from the memory device 5 via the I/O interface device 7 to one or more of the PDs 8. The memory device 5 contains the core logic for performing the mapping between virtual addresses of the VD and physical addresses of the PDs 8. The CPU 4 performs calculations in accordance with the RAID level of the system 2, such as parity calculations. In the event that the current RAID level of the system 2 uses parity, the I/O interface device 7 causes the parity bits to be stored in one or more of the PDs 8.
During a typical read operation, the server CPU 11 sends a corresponding request to the hardware controller 3 via the PCI bus 9. The CPU 4, with use of the logic held in memory device 5, processes the request and causes the requested data to be retrieved from the PDs 8. The retrieved data is temporarily stored in the memory device 5. Subsequently, the data is read out of the memory device 5 and transferred over the PCI bus 9 to the server CPU 11 to satisfy the read request.
In order to perform the migration process, the CPU 4 of the hardware controller 3 reconfigures the core logic of the VDs in memory device 5 to cause it to operate in accordance with the new RAID level and to perform the new VD to PD mapping. During the migration process, the migration parameters are saved in the NVRAM 6. The migration parameters typically include:
    (1) Migration Type: information that describes the type of migration that is being performed (e.g., R0 to R1, R1 to R5, etc.) and which PDs are involved;
    (2) Migration Progress: information that describes which block of data is currently being migrated and the corresponding read and write pointers;
    (3) Migration Status: information that describes which stage of operations has recently been completed; and
    (4) Media Errors: information that describes media errors that have been temporarily recorded during the migration of the current block of data.
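The four kinds of migration parameters listed above could be grouped into a single record, as in the following sketch. The class name, field names, status strings, and the JSON encoding are all hypothetical choices for illustration; the text does not specify how the parameters are laid out in the NVRAM 6.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MigrationParameters:
    # (1) Migration Type
    migration_type: str                 # e.g. "R0->R1"
    pd_ids: List[int]                   # PDs involved in the migration
    # (2) Migration Progress
    current_block: int = 0
    read_pointer: int = 0
    write_pointer: int = 0
    # (3) Migration Status (assumed stage names)
    status: str = "not_started"
    # (4) Media Errors recorded for the current block
    media_errors: List[str] = field(default_factory=list)


def save(params: MigrationParameters) -> str:
    """Serialize the record; stands in for a write to nonvolatile memory."""
    return json.dumps(asdict(params))


def load(blob: str) -> MigrationParameters:
    """Rebuild the record from its serialized form."""
    return MigrationParameters(**json.loads(blob))
```

A record saved with `save` and read back with `load` compares equal to the original, which is the property a controller would rely on when it resumes a migration.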
The purpose of storing the migration parameters in NVRAM 6 during the migration process is to eliminate the risk of data being lost during the migration process in the event of a loss of power or other unexpected errors. However, the need for a hardware controller having NVRAM increases the overall costs associated with the system 2 and increases the costs associated with performing RAID level migration.
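The role of the saved parameters can be illustrated with a simplified sketch in which a Python dictionary stands in for the NVRAM 6 and an exception stands in for a loss of power. The names `PowerLoss`, `migrate`, `fail_at`, and the `"next_block"` key are hypothetical, and the block-by-block copy is an assumed simplification of the migration process.

```python
from typing import Dict, List, Optional


class PowerLoss(Exception):
    """Simulated interruption of the migration."""


def migrate(src: List[bytes], dst: List[Optional[bytes]],
            nvram: Dict[str, int], fail_at: Optional[int] = None) -> None:
    """Copy blocks from src to dst, recording progress after each block."""
    start = nvram.get("next_block", 0)   # resume point; 0 on a fresh start
    for block in range(start, len(src)):
        if fail_at is not None and block == fail_at:
            raise PowerLoss(f"interrupted before block {block}")
        dst[block] = src[block]
        nvram["next_block"] = block + 1  # progress survives the interruption
```

If `migrate` is interrupted before block 2 and then called again with the same `nvram` dictionary, it resumes at block 2 rather than starting over or skipping blocks. Without the persisted progress, the controller would not know which blocks had already been written.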