The present invention relates to data storage systems. More specifically, the present invention relates to redundant arrays of independent disks (RAID) and mapping client data on storage devices.
Many businesses and individuals depend on information stored in their computer systems. Even though modern disk drives have mean-time-to-failure (MTTF) values measured in hundreds of years, a sufficiently large collection of disk drives can experience frequent failures.
RAID is commonly used to provide protection against failure. For small numbers of disks, the preferred method of fault protection is duplicating (mirroring) data on two disks with independent failure modes. Using RAID 1 (mirroring) or RAID 1/0 (striped mirroring), two copies of data are stored on different disks. If one disk fails and the copy thereon becomes inaccessible, the copy on the other disk can be accessed.
For data storage devices having large numbers of disks, a more cost-effective method of fault protection is using partial redundancy (such as parity). Using RAID 5 (striping with rotated parity), host data blocks are block-interleaved across the disks, and the disk on which the parity block is stored rotates in round-robin fashion for different stripes. A RAID group having N disks will use 1/N of the storage capacity for storing the redundancy (parity) data. If one of the disks is damaged, the parity data is used to reconstruct the data.
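The round-robin placement of parity can be illustrated with a minimal sketch. The rotation order used here (parity starts on the last disk and moves one disk to the left for each successive stripe) and the names `parity_disk` and `stripe_layout` are assumptions made for illustration; actual RAID 5 implementations use various rotation orders.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Index of the disk holding parity for a stripe (assumed rotation order)."""
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe: int, num_disks: int) -> list[str]:
    """Labels of the blocks stored on each disk for one stripe."""
    p = parity_disk(stripe, num_disks)
    block = stripe * (num_disks - 1)  # first data block number in this stripe
    layout = []
    for disk in range(num_disks):
        if disk == p:
            layout.append(f"P{stripe}")
        else:
            layout.append(f"C{block}")
            block += 1
    return layout


for s in range(4):
    print(s, stripe_layout(s, 4))
```

Each stripe holds one parity block and N-1 data blocks, so over N consecutive stripes every disk holds parity exactly once, which is where the 1/N redundancy fraction comes from.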
Consider an example in which data is striped over four disks. Data blocks C0, C1 and C2 are stored on the first, second and third disks and parity data P0 is stored on the fourth disk. If the second disk fails, the parity data P0 and the first and third blocks C0 and C2 may be used to reconstruct the second block C1.
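The reconstruction in this example can be sketched as follows, assuming the parity block is the bytewise exclusive-OR (XOR) of the data blocks, as is conventional for RAID 5. The helper name `xor_blocks` and the block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Illustrative block contents for the first, second and third disks.
c0 = b"\x01\x02\x03\x04"
c1 = b"\x10\x20\x30\x40"
c2 = b"\xff\x00\xff\x00"

# Parity block stored on the fourth disk.
p0 = xor_blocks(c0, c1, c2)

# The second disk fails: rebuild C1 from the surviving blocks and the parity.
rebuilt_c1 = xor_blocks(c0, c2, p0)
assert rebuilt_c1 == c1
```

Because XOR is its own inverse, the same operation serves both to compute the parity and to recover any single missing block.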
Recovering lost data via RAID is much faster than reloading the lost data from backup tapes. In large data storage systems, reloading the lost data from backup tapes can take hours or even days, resulting in very costly downtime.
However, different RAID levels have different performance characteristics and costs. With RAID 1/0 storage, disk space is doubled to store the redundant information. For example, two megabytes of disk space are used to store one megabyte of data. Doubling the disk space doubles the cost of storage.
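The storage cost can be compared with a short calculation. This is a sketch only: the function name `raw_capacity_needed` and the level labels are illustrative, and the RAID 5 figure assumes one parity block per stripe of `group_size` disks.

```python
def raw_capacity_needed(user_mb: float, level: str, group_size: int = 2) -> float:
    """Raw disk space (in MB) consumed to store user_mb of data."""
    if level == "RAID1/0":
        # Every block is stored twice.
        return 2 * user_mb
    if level == "RAID5":
        if group_size < 3:
            raise ValueError("RAID 5 needs at least three disks")
        # 1/N of the raw capacity holds parity, so data occupies (N-1)/N of it.
        return user_mb * group_size / (group_size - 1)
    raise ValueError(f"unknown level: {level}")


print(raw_capacity_needed(1, "RAID1/0"))  # one megabyte of data, mirrored
print(raw_capacity_needed(3, "RAID5", group_size=4))
```

Under these assumptions, three megabytes of data occupy six megabytes of raw space when mirrored but only four megabytes in a four-disk RAID 5 group.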
RAID 5 has a lower storage cost because a smaller fraction of the disk space is used for storing redundant information. However, RAID 5 suffers reduced performance in "degraded" mode, that is, when one of the drives has failed and/or data needs to be repaired. Because data is reconstructed from redundant information, additional I/O operations are performed.
Moreover, RAID 5 can have a higher overhead when writing to disks. For each write operation, parity data is recalculated. Thus, a penalty is incurred on small writes, because disk reads are performed on the data that does not change in order to calculate the new parity data. In contrast, RAID 1/0 does not incur this write penalty. For large writes, however, RAID 5 can provide better performance, because extra writes are made only for the parity data, whereas in RAID 1/0 every block must be written twice.
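The trade-off can be made concrete by counting disk I/O operations for a write confined to one stripe. The sketch below assumes the parity is recomputed by reading the unchanged data blocks of the stripe, as described above; controllers may instead read the old data and old parity, which gives different counts. The function names are hypothetical.

```python
def raid5_write_ios(blocks_written: int, num_disks: int) -> tuple[int, int]:
    """(reads, writes) for a RAID 5 write within one stripe of num_disks disks."""
    data_per_stripe = num_disks - 1
    if not 1 <= blocks_written <= data_per_stripe:
        raise ValueError("write must fit within a single stripe")
    reads = data_per_stripe - blocks_written  # unchanged blocks needed for parity
    writes = blocks_written + 1               # new data plus the new parity block
    return reads, writes


def raid10_write_ios(blocks_written: int) -> tuple[int, int]:
    """(reads, writes) for a RAID 1/0 write: each block is written to both mirrors."""
    return 0, 2 * blocks_written


# Small write: one block in a four-disk group.
print(raid5_write_ios(1, 4), raid10_write_ios(1))
# Large write: a full stripe of three data blocks.
print(raid5_write_ios(3, 4), raid10_write_ios(3))
```

Under these assumptions, the single-block write costs four I/O operations in RAID 5 against two in RAID 1/0, while the full-stripe write costs four in RAID 5 against six in RAID 1/0.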
RAID 1/0 offers potentially higher reliability than RAID 5. In RAID 5, the loss of any two disks will result in the loss of data. In RAID 1/0, higher reliability results from the data being mirrored: even if two disks fail, the chance of data being lost is substantially lower, as the two disks may be in different mirrored pairs.
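The difference can be quantified with a simple model, assuming exactly two disks fail and that every pair of disks is equally likely to be the failed pair. The function names are hypothetical, and the model ignores repair time and correlated failures.

```python
from fractions import Fraction


def raid5_two_failure_loss(num_disks: int) -> Fraction:
    """Probability of data loss when two disks in one RAID 5 group fail."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return Fraction(1)  # any two failures in the group lose data


def raid10_two_failure_loss(num_disks: int) -> Fraction:
    """Probability that two failed disks in a RAID 1/0 array form one mirrored pair."""
    if num_disks < 2 or num_disks % 2:
        raise ValueError("RAID 1/0 needs an even number of disks")
    # After the first failure, only one of the remaining disks is its mirror.
    return Fraction(1, num_disks - 1)


print(raid5_two_failure_loss(8))
print(raid10_two_failure_loss(8))
```

For an eight-disk array under this model, a double failure always loses data in RAID 5 but does so in only one case out of seven in RAID 1/0.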
When initially mapping data to a disk array, it is very desirable to choose the best RAID level. A wrong choice can be costly because poor performance, poor resource utilization, or decreased availability could result. Choosing the wrong RAID level could also result in increased storage costs, due to the relative amount of redundant data kept in different RAID schemes.
Correcting a wrong choice can also be costly: it can involve bringing a system off-line (since most RAID controllers do not allow RAID levels to be changed on the fly), copying data from the array to another storage device, reformatting the array and then reloading the data onto the reformatted array. This process can take hours. In addition, loss of data can potentially occur due to mistakes at any of these stages.
Wrong choices can add up for large enterprise systems, where tens to hundreds of host computers are connected by a storage area network to tens to hundreds of storage devices having tens of thousands of disks. Thus, wrong choices for large enterprise systems can be very costly.
According to one aspect of the present invention, RAID levels are assigned to data prior to loading the data in a data storage device. The RAID levels are determined by applying an algorithm to at least one of a set of device specifications and data workload specifications.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the present invention.