1. Technical Field
This application relates to managing degraded storage elements in data storage systems.
2. Description of Related Art
A traditional storage array (herein also referred to as a “data storage system”, “disk storage array”, “storage array”, “disk array”, or simply “array”) is a collection of hard disk drives operating together logically as a unified storage device. Storage arrays are designed to store large quantities of data. Storage arrays typically include one or more storage array processors (SPs) for handling allocation requests and input/output (I/O) requests. An SP is the controller for, and primary interface to, the storage array.
Storage arrays are typically used to provide storage space for one or more computer file systems, databases, applications, and the like. For this and other reasons, it is common for storage arrays to logically partition a set of disk drives into chunks of storage space, called logical units, or LUs. This enables a unified storage array to provide the storage space as a collection of separate file systems, network drives, and/or LUs.
Performance of a storage array may be characterized by the array's total capacity, response time, and throughput. The capacity of a storage array is the maximum total amount of data that can be stored on the array. The response time of an array is the amount of time that it takes to read data from or write data to the array. The throughput of an array is a measure of the amount of data that can be transferred into or out of (i.e., written to or read from) the array over a given period of time.
The administrator of a storage array may desire to operate the array in a manner that maximizes throughput and minimizes response time. In general, performance of a storage array may be constrained by both physical and temporal constraints. Examples of physical constraints include bus occupancy and availability, excessive disk arm movement, and uneven distribution of load across disks. Examples of temporal constraints include bus bandwidth, bus speed, spindle rotational speed, serial versus parallel access to multiple read/write heads, and the size of data transfer buffers.
One factor that may limit the performance of a storage array is the performance of each individual storage component. For example, the read access time of a disk storage array is constrained by the access time of the disk drive from which the data is being read. Read access time may be affected by physical characteristics of the disk drive, such as the number of revolutions per minute of the spindle: the faster the spin, the less time it takes for the sector being read to come around to the read/write head. The placement of the data on the platter also affects access time, because it takes time for the arm to move to, detect, and properly orient itself over the correct track (or cylinder, for multihead/multiplatter drives). Reducing the read/write arm swing reduces the access time. Finally, the type of drive interface may have a significant impact on overall disk array performance. For example, a multihead drive that supports reads or writes on all heads in parallel will have a much greater throughput than a multihead drive that allows only one head at a time to read or write data.
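The spindle-speed effect can be quantified with a standard rule of thumb: one revolution takes 60/rpm seconds, and on average the requested sector is half a revolution away from the head. The sketch below computes only this average rotational latency; it deliberately ignores seek time (arm movement) and transfer time, which the paragraph above also mentions. The function name is illustrative.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds for a given spindle speed.

    One revolution takes 60,000 / rpm milliseconds; on average the target
    sector is half a revolution away from the read/write head.
    Seek and transfer time are not included.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    revolution_ms = 60_000.0 / rpm
    return revolution_ms / 2.0


for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Under this model a 7,200 rpm drive averages roughly 4.17 ms of rotational latency, while a 15,000 rpm drive averages 2 ms.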
Furthermore, even if a disk storage array uses the fastest disks available, the performance of the array may be unnecessarily limited if only one of those disks may be accessed at a time. In other words, performance of a storage array, whether it is an array of disks, tapes, flash drives, or other storage entities, may also be limited by system constraints, such as the number of data transfer buses available in the system and the density of traffic on each bus.
Large storage arrays today manage many disks that are not identical. Storage arrays use different types of disks and group like kinds of disks into tiers based on the performance characteristics of the disks. A group of fast but small disks may be a fast tier (also referred to as “higher tier” or “high tier”). A group of slow but large disks may be a slow tier (also referred to as “lower tier” or “low tier”). It may be possible to have different tiers with different properties or constructed from a mix of different types of physical disks to achieve a performance or price goal. Storing often referenced, or hot, data on the fast tier and less often referenced, or cold, data on the slow tier may create a more favorable customer cost/performance profile than storing all data on a single kind of disk.
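A minimal sketch of hot/cold placement, assuming data is tracked in hypothetical fixed-size extents with an observed access count and that the fast tier can hold a fixed number of extents. Ranking by access count is only one simple notion of "temperature". Real tiering policies typically also weigh recency, I/O size, and migration cost.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    """Hypothetical fixed-size unit of data with an observed access count."""
    name: str
    accesses: int


def place_by_temperature(extents: List[Extent],
                         fast_tier_slots: int) -> Tuple[List[str], List[str]]:
    """Put the most frequently accessed ("hot") extents on the fast tier.

    Returns (fast, slow) lists of extent names. The fast tier holds at most
    `fast_tier_slots` extents; everything else goes to the slow tier.
    """
    ranked = sorted(extents, key=lambda e: e.accesses, reverse=True)
    fast = [e.name for e in ranked[:fast_tier_slots]]
    slow = [e.name for e in ranked[fast_tier_slots:]]
    return fast, slow


extents = [Extent("a", 5), Extent("b", 900), Extent("c", 40), Extent("d", 1200)]
fast, slow = place_by_temperature(extents, fast_tier_slots=2)
print("fast tier:", fast)  # fast tier: ['d', 'b']
print("slow tier:", slow)  # slow tier: ['c', 'a']
```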
A storage tier may be made up of different types of disks, i.e., disks with different RAID (Redundant Array of Independent or Inexpensive Disks) levels, performance and cost characteristics. Several levels of RAID systems have been defined in the industry. RAID parity schemes may be utilized to provide error detection during the transfer and retrieval of data across a storage system. The first level, RAID-0, combines two or more drives to create a larger virtual disk. In a dual drive RAID-0 system one disk contains the low numbered sectors or blocks and the other disk contains the high numbered sectors or blocks, forming one complete storage space. RAID-0 systems generally interleave the sectors of the virtual disk across the component drives, thereby improving the bandwidth of the combined virtual disk. Interleaving the data in that fashion is referred to as striping. RAID-0 systems provide no redundancy of data, so if a drive fails or data becomes corrupted, no recovery is possible short of backups made prior to the failure.
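The following sketch shows RAID-0 striping as an address mapping from a virtual-disk block number to a (disk, block) pair. The "stripe unit" (the number of consecutive blocks placed on one disk before moving to the next) is an assumed parameter. With a stripe unit of one block, consecutive blocks alternate between drives, which is the interleaving described above.

```python
from typing import Tuple


def raid0_map(lba: int, num_disks: int, stripe_unit: int = 1) -> Tuple[int, int]:
    """Map a virtual-disk block number to (disk index, block on that disk).

    Consecutive groups of `stripe_unit` blocks are dealt out to the member
    disks round-robin, so sequential I/O is spread over every drive.
    """
    if num_disks < 2 or stripe_unit < 1 or lba < 0:
        raise ValueError("need >= 2 disks, stripe_unit >= 1 and lba >= 0")
    unit, offset = divmod(lba, stripe_unit)
    disk = unit % num_disks
    row = unit // num_disks  # stripe number across all member disks
    return disk, row * stripe_unit + offset


# Two disks, one-block stripe unit: blocks alternate between the drives.
print([raid0_map(lba, num_disks=2) for lba in range(4)])
# [(0, 0), (1, 0), (0, 1), (1, 1)]
```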
RAID-1 systems include one or more disks that provide redundancy of the virtual disk. One disk is required to contain the data of the virtual disk, as if it were the only disk of the array. One or more additional disks contain the same data as the first disk, providing a “mirror” of the data of the virtual disk. A RAID-1 system will contain at least two disks, the virtual disk being the size of the smallest of the component disks. A disadvantage of RAID-1 systems is that a write operation must be performed for each mirror disk, reducing the bandwidth of the overall array. In a dual drive RAID-1 system, the first disk and the second disk contain the same sectors or blocks, each disk holding exactly the same data.
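A minimal in-memory sketch of RAID-1 behavior, using a hypothetical `Mirror` class in which each member disk is modeled as a dictionary of blocks. It shows the two properties described above. The virtual disk is limited to the smallest member. Every write costs one physical write per surviving mirror, while a read can be served by any surviving member.

```python
class Mirror:
    """Minimal in-memory RAID-1 sketch: every write goes to every member disk."""

    def __init__(self, disk_sizes):
        if len(disk_sizes) < 2:
            raise ValueError("RAID-1 needs at least two disks")
        # The virtual disk is only as large as the smallest member.
        self.capacity = min(disk_sizes)
        self.disks = [dict() for _ in disk_sizes]  # block number -> data
        self.failed = set()                        # indices of failed members

    def write(self, block, data):
        if not 0 <= block < self.capacity:
            raise IndexError("block out of range")
        # One physical write per surviving mirror: the write cost of RAID-1.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block):
        # Every surviving member holds identical data, so use the first one.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk.get(block)
        raise OSError("all mirrors have failed")


m = Mirror([100, 120])
m.write(5, b"x")
m.failed.add(0)      # lose the first disk
print(m.read(5))     # b'x'
```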
RAID-2 systems provide for error correction through Hamming codes. The component drives each contain a particular bit of a word, or an error correction bit of that word. RAID-2 systems automatically and transparently detect and correct single-bit defects, or single drive failures, while the array is running. Although RAID-2 systems improve the reliability of the array over other RAID types, they are less popular than some other systems due to the expense of the additional drives and redundant onboard hardware error correction.
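To illustrate single-bit correction, the sketch below uses Hamming(7,4), one common Hamming code chosen here for illustration. The text above does not specify which code a RAID-2 system uses. In a RAID-2 layout each of the seven bits would live on its own drive. Parity bits sit at positions 1, 2, and 4, and the syndrome (the XOR of the positions of all set bits) directly names the position of a single flipped bit.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 are [p1, p2, d1, p4, d2, d3, d4]; each parity bit
    covers the positions whose index has that parity bit's value set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos
    if syndrome:  # non-zero syndrome is the 1-based position of the bad bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
codeword = hamming74_encode(word)
codeword[4] ^= 1                     # one "drive" returns a wrong bit
print(hamming74_decode(codeword))    # [1, 0, 1, 1]
```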
RAID-4 systems are similar to RAID-0 systems, in that data is striped over multiple drives. For example, the storage spaces of two disks are added together in interleaved fashion, while a third disk contains the parity of the first two disks. Unlike RAID-0 systems, RAID-4 systems include an additional disk dedicated to parity. For each byte of data at the same position on the striped drives, parity is computed over the bytes of all the drives and stored to the parity disk. The XOR operation is used to compute parity, providing a fast and symmetric operation that can regenerate the data of a single drive, provided that the data of the remaining drives remains intact.
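The XOR parity scheme described above can be sketched in a few lines. Following the two-data-disk example, parity is the byte-wise XOR of the data blocks, and XOR-ing the surviving block with the parity block regenerates the lost one. The helper name `xor_blocks` and the block contents are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (one block per drive)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


disk0 = b"\x0f\xf0\xaa\x01"
disk1 = b"\x33\x55\x01\xfe"
parity = xor_blocks([disk0, disk1])  # written to the parity disk

# If disk1 fails, XOR of the surviving data and the parity regenerates it.
recovered = xor_blocks([disk0, parity])
print(recovered == disk1)  # True
```

Because XOR is its own inverse, the same function both computes parity and performs reconstruction.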
RAID-3 systems are essentially RAID-4 systems with the data striped at byte boundaries, and for that reason RAID-3 systems are generally slower than RAID-4 systems in most applications. RAID-4 and RAID-3 systems are therefore useful for providing virtual disks with redundancy, and additionally for providing large virtual drives, both with only one additional disk drive for the parity information. They have the disadvantage that data throughput is limited by the throughput of the drive containing the parity information, which must be updated on every write operation to the array.
RAID-5 systems are similar to RAID-4 systems, with the difference that the parity information is striped over all the disks along with the data. For example, first, second, and third disks may each contain data and parity in interleaved fashion. Distributing the parity data generally increases the throughput of the array as compared to a RAID-4 system. RAID-5 systems may continue to operate even though one of the disks has failed. RAID-6 systems are like RAID-5 systems, except that dual parity is kept to allow normal operation if up to two drives fail.
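The sketch below shows parity rotation for a RAID-5 group. It assumes one simple layout in which the parity position moves one disk to the left for each successive stripe. Actual controllers use several layout variants, so this rotation rule is illustrative rather than the only possibility. The point is that parity writes are spread over every disk instead of landing on a single dedicated parity disk, as in RAID-4.

```python
def raid5_stripe_layout(stripe: int, num_disks: int):
    """Return (parity_disk, data_disks) for one stripe of a RAID-5 group.

    Assumed layout: parity rotates one disk to the left per stripe, so over
    `num_disks` consecutive stripes every disk holds parity exactly once.
    """
    if num_disks < 3:
        raise ValueError("RAID-5 needs at least three disks")
    parity_disk = (num_disks - 1 - stripe) % num_disks
    data_disks = [d for d in range(num_disks) if d != parity_disk]
    return parity_disk, data_disks


for s in range(4):
    p, data = raid5_stripe_layout(s, 3)
    print(f"stripe {s}: parity on disk {p}, data on disks {data}")
```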
Combinations of RAID systems are also possible. For example, a four-disk RAID 1+0 system provides a striped storage space that is also redundant. The first and second disks are mirrored, as are the third and fourth disks. The combination of the mirrored sets forms a storage space that is twice the size of one individual drive, assuming that all four are of equal size. Many other combinations of RAID systems are possible.
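As a worked summary of the levels described above, the following sketch computes the usable capacity of a RAID group built from equal-size disks. The minimum disk counts are assumptions based on common practice. RAID-2 is omitted because its overhead depends on the error-correcting code chosen.

```python
MIN_DISKS = {"0": 2, "1": 2, "3": 3, "4": 3, "5": 3, "6": 4, "1+0": 4}


def usable_capacity(level: str, num_disks: int, disk_size: float) -> float:
    """Usable capacity of a RAID group built from `num_disks` equal-size disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return num_disks * disk_size          # striping only, no redundancy
    if level == "1":
        return disk_size                      # every disk holds the same data
    if level in ("3", "4", "5"):
        return (num_disks - 1) * disk_size    # one disk's worth of parity
    if level == "6":
        return (num_disks - 2) * disk_size    # two disks' worth of parity
    # RAID 1+0: mirrored pairs, striped together.
    if num_disks % 2:
        raise ValueError("RAID 1+0 needs an even number of disks")
    return (num_disks // 2) * disk_size


# Four equal 2 TB disks in RAID 1+0: twice the size of one drive.
print(usable_capacity("1+0", 4, 2.0))  # 4.0
```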
A storage array may be thought of as a system for managing a large amount of a resource, i.e., a large number of disk drives. Management of the resource may include allocation of a portion of the resource in response to allocation requests. In the storage array example, portions of the storage array may be allocated to, i.e., exclusively used by, entities that request such allocation. One issue that may be considered during allocation of a resource is the selection process—namely, how to determine which unallocated portion of the collection of resources is to be allocated to the requesting entity.
Conventionally, all resources of the same type are treated the same because it was assumed that components within the data storage array performed similarly and that data would be stored and accessed evenly across the array. Initially, this assumption may be valid because any performance differences between resources of the same type and any asymmetries in data usage are unknown. However, as the data storage array fills up and the stored data is accessed, some resources may be more heavily utilized than other resources of the same type, and/or resources of the same type may begin to perform differently. For example, two identical 7,200 rpm disks may initially be assumed to have identical performance and to share data storage and processing loads equally because the client initially stores 10 gigabytes (GB) on each disk. However, at some later point in time, the client may either delete or rarely access the data stored on the second disk while constantly updating the files stored on the first disk. As a result, the first disk may exhibit slower performance. While the client may previously have been able to observe this inefficiency, the client was unable to correct it because the client had no input or control regarding how slices were allocated or re-allocated in a logical volume created on a disk. For example, no mechanism currently exists for allocating slices for a logical volume from different performance tiers or according to other resource constraints specified by the client in a slice allocation request for the logical volume.
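To make the selection problem concrete, the sketch below contrasts a first-fit choice, which treats all disks of the same type as interchangeable, with a load-aware choice that prefers the least busy disk. The `Disk` record, the `recent_iops` load metric, and the load figures are hypothetical, and the load-aware function is only one possible selection policy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Disk:
    """Hypothetical per-disk bookkeeping for a pool of same-type disks."""
    name: str
    free_slices: List[int] = field(default_factory=list)
    recent_iops: float = 0.0  # observed load; higher means busier


def first_fit(disks: List[Disk]) -> Tuple[str, int]:
    """Treat all disks as interchangeable: take the first free slice found."""
    for disk in disks:
        if disk.free_slices:
            return disk.name, disk.free_slices.pop(0)
    raise RuntimeError("no free slices in the pool")


def least_loaded(disks: List[Disk]) -> Tuple[str, int]:
    """Prefer a free slice on the disk with the lowest observed load."""
    candidates = [d for d in disks if d.free_slices]
    if not candidates:
        raise RuntimeError("no free slices in the pool")
    disk = min(candidates, key=lambda d: d.recent_iops)
    return disk.name, disk.free_slices.pop(0)


def make_pool() -> List[Disk]:
    # Two identical 7,200 rpm disks; disk1 holds the constantly updated files.
    return [Disk("disk1", [0, 1, 2], recent_iops=180.0),
            Disk("disk2", [0, 1, 2], recent_iops=15.0)]


print(first_fit(make_pool()))     # ('disk1', 0)
print(least_loaded(make_pool()))  # ('disk2', 0)
```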
Conventional data storage systems may be configured in any of the various RAID configurations described above. Depending on the particular RAID configuration, data on one storage device is typically backed up on one or more other storage devices using conventional schemes such as parity information, data mirroring, and the like.
Storage elements, such as hard disk drives, have a typical lifetime rating, after which they will eventually fail. When a storage device in the data storage system fails, a conventional restore and repair process, such as that shown in FIG. 1, may be employed. For example, when one of a RAID group's hard disk drives fails, data stored on the failed drive is reconstructed using parity information stored on the RAID group's remaining hard disk drives. In this case, data is read in a physically linear fashion, starting at logical block address 0 and continuing to the last logical block on the drive. Data is reconstructed and then written to a “hot spare” configured for that particular RAID group. The hot spare is the same type of hard disk drive and has a capacity equal to that of the largest drive in the RAID group. Thus, drives are mapped on a one-to-one basis and cannot be of different types (e.g., data cannot migrate from SAS RAID-5 to EFD RAID-6). The failed hard disk drive is replaced, and the restored data is copied from the hot spare back onto the newly replaced drive of equivalent type. Then, the hot spare is once again made available as a hot spare for its associated RAID group.
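The conventional rebuild-and-copy-back sequence described above can be sketched for a single-parity group, with each drive modeled as a list of equal-size blocks. This is a simplified model under stated assumptions: real rebuilds run concurrently with host I/O and track progress persistently, and the function names are hypothetical. Blocks are regenerated in order from LBA 0 to the last block, written to the hot spare, and later copied to the replacement drive, after which the spare is freed.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_to_hot_spare(survivors: List[List[bytes]]) -> List[bytes]:
    """Regenerate a failed member of a single-parity group onto a hot spare.

    `survivors` holds the block lists of every healthy member (data and
    parity). Blocks are processed linearly, from LBA 0 to the last block.
    """
    hot_spare = []
    for lba in range(len(survivors[0])):
        hot_spare.append(xor_blocks([member[lba] for member in survivors]))
    return hot_spare


def copy_back(hot_spare: List[bytes], replacement: List[bytes]) -> None:
    """Copy restored data onto the replacement drive, then free the spare."""
    replacement[:] = hot_spare
    hot_spare.clear()  # the spare returns to service for its RAID group


d0 = [b"\x01\x02", b"\x03\x04"]
d1 = [b"\x10\x20", b"\x30\x40"]
d2 = [b"\xaa\xbb", b"\xcc\xdd"]
parity = [xor_blocks(list(row)) for row in zip(d0, d1, d2)]

spare = rebuild_to_hot_spare([d0, d2, parity])  # d1 has failed
new_drive: List[bytes] = []
copy_back(spare, new_drive)
print(new_drive == d1, spare)  # True []
```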
During the rebuild time, the data storage system's performance is degraded. For example, the amount of time it takes for an application to retrieve data during the rebuild process can increase significantly, in part because the data storage system is trying to satisfy at least two tasks: rebuilding the failed drive's data and servicing application data requests (data which may also need to be reconstructed). Furthermore, some system architectures may cause data to become unavailable the moment the storage system becomes degraded (e.g., as required under certain data service level agreements).
The likelihood of irretrievable data loss is also increased during the rebuild time. For example, depending on the RAID type, should a second drive fail, parity data necessary for reconstructing data stored on the first failed drive will be lost, thereby preventing full data recovery and resulting in permanent data loss.
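The risk of a second failure can be estimated under a simplifying assumption. If drives fail independently at a constant rate of 1/MTBF (an exponential lifetime model), the probability that at least one of the N − 1 surviving drives fails during a rebuild window of T hours is 1 − exp(−(N − 1)·T/MTBF). Real drive failures are not strictly independent or exponential, and the MTBF, group size, and rebuild times below are hypothetical. The sketch applies to a single-parity group such as RAID-5, where one additional failure is enough to lose data.

```python
import math


def p_second_failure(num_drives: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """Chance that any surviving member fails before the rebuild finishes.

    Assumes independent drives with a constant failure rate of 1 / MTBF
    (an exponential lifetime model), which is a simplification.
    """
    survivors = num_drives - 1
    return 1.0 - math.exp(-survivors * rebuild_hours / mtbf_hours)


# Hypothetical 8-drive single-parity group with a 1,000,000-hour MTBF.
for hours in (6, 12, 24, 48):
    print(f"{hours:>2} h rebuild: {p_second_failure(8, hours, 1_000_000):.2e}")
```

For small probabilities the result grows almost linearly with the rebuild window, so doubling rebuild time roughly doubles the exposure.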
Storage capacity of various storage elements continues to grow at an ever-increasing rate. For example, 3 terabyte hard disk drives are commonplace today, and larger drives have been announced or are under development. The time it takes the restoration process to complete is directly proportional to the size of the storage element, and, thus, rebuild time increases as well. Consequently, as storage capacity increases, so do the time a RAID group remains in degraded mode and the time data remains at risk.
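The proportionality between capacity and rebuild time can be illustrated with a simple estimate. This is a minimal sketch assuming the whole drive is rebuilt at a constant sustained rate. The 100 MB/s figure is hypothetical, and in practice the rate is lower while the system is also servicing application I/O.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rebuild a full drive at a sustained, constant rebuild rate.

    Uses decimal units (1 TB = 1,000,000 MB), as drive capacities are quoted.
    """
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = capacity_tb * 1_000_000 / rebuild_mb_per_s
    return seconds / 3600


# Hypothetical 100 MB/s rebuild rate; doubling capacity doubles the window.
for tb in (1, 3, 6):
    print(f"{tb} TB: {rebuild_hours(tb, 100):.1f} h")
```

At this assumed rate a 3 TB drive takes about 8.3 hours to rebuild, and a 6 TB drive about twice as long.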