1. Field of Invention
The present invention relates generally to rebuilds in the field of RAID (Redundant Array of Independent Disks).
2. Description of Related Art
The present invention relates generally to RAID. RAID is a storage technology that combines multiple disk drive components into a logical unit, a form of virtualization, primarily in order to reduce errors caused by disk failure, in particular in a network. Data is broken up into blocks stored on several disks in a sequential manner known as data striping. Often a parity block will form a means of checking for errors and reconstructing data in the event of a failure of one of the disk drives, forming parity redundancy.
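By way of illustration only, the parity redundancy described above can be sketched with a byte-wise XOR. The block contents and the helper name xor_blocks are hypothetical and chosen for the example; an actual RAID implementation operates on full stripes in controller firmware.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (illustrative contents).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate the loss of the drive holding the second block: XOR of the
# surviving data blocks and the parity block reconstructs the lost block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, the same routine serves both to compute parity and to reconstruct a missing block.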
Properly configured, RAID produces several benefits. RAID benefits include higher data security via redundancy (save RAID 0 configurations), fault tolerance, improved access to data (data availability), increased integrated capacity for creating large amounts of contiguous disk space, and improved performance. RAID costs include more expensive hardware and specialized maintenance requirements.
There are many types of RAID, especially if one includes hybrid (nested) RAID systems; however, RAID Levels 0-6 successfully define all typical data mapping and protection schemes for disk-based storage systems. Other classification systems for RAID include failure-resistant (systems that protect against data loss due to drive failure), failure-tolerant (systems that protect against loss of data access due to failure of any single component or of multiple components), and disaster-tolerant (systems that consist of two or more independent zones, either of which provides access to stored data).
Some popular RAID Levels include RAID 0, RAID 5 and RAID 6. RAID 0 is block-level striping without parity or mirroring, with zero redundancy. It provides improved performance and additional storage but no fault tolerance. In RAID 0, blocks are written to their respective drives simultaneously on the same sector, which allows smaller sections of the data to be read off each drive in parallel, increasing bandwidth. RAID 0 does not implement error checking, so any error is uncorrectable. RAID 0 is often used in hybrid RAID systems to increase performance. RAID 5 is block-level striping with distributed parity, with the parity distributed along with the data across the drives; the array requires all drives but one to be present in order to operate, and therefore tolerates a single drive failure. In the event of a single drive failure, the array is not destroyed and any subsequent reads of data can be calculated from the distributed parity such that the drive failure is masked from the end user. A single drive failure, however, results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt. RAID 6 is block-level striping with double distributed parity and provides fault tolerance of two drive failures. The array continues to operate with up to two failed drives. The advantage of RAID 6 is that it makes larger RAID groups more feasible, which is important as large-capacity drives lengthen the time needed to rebuild and recover from the failure of a single drive.
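As a minimal sketch of the block-level striping described above, the following maps a logical block number to a drive and stripe for RAID 0, and picks the parity drive of each stripe for RAID 5. The function names and the particular parity rotation are assumptions made for illustration; real controllers use various layouts.

```python
def raid0_location(block, num_drives):
    """Map a logical block to (drive index, stripe index) under RAID 0."""
    return block % num_drives, block // num_drives

def raid5_parity_drive(stripe, num_drives):
    """Return the drive holding parity for a stripe.

    Assumption: parity starts on the last drive and moves one drive
    to the left on each successive stripe, wrapping around.
    """
    return (num_drives - 1 - stripe) % num_drives

# Consecutive logical blocks land on consecutive drives, so a large
# sequential transfer is spread across all drives in parallel.
layout = [raid0_location(block, 4) for block in range(8)]

# Parity rotates across the drives so that no single drive holds all parity.
parity_drives = [raid5_parity_drive(stripe, 4) for stripe in range(5)]
```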
Copyback is the replacement of a functioning array member disk by another member, by copying the disk contents to the replacement disk. Copyback is often used to replace a failing component before it fails and degrades the array, or to restore a particular physical configuration for an array, and is accomplished without reduction of the array.
Secondary memory in traditional spindle-based hard drives is stored on a number of rotating disk platters, with the data read by a magnetic head held by an armature. Modern drives typically have several heads and platters. For a single I/O operation to complete, the armature must move the head to the platter track that holds the data, a process called seeking, which takes a seek time to complete, and must then wait for the desired sectors to rotate under the head; the time taken by this wait is called rotational latency. These times, together with any delays caused by firmware, software or other hardware, make up the drive response time.
IOPS (Input/Output Operations Per Second, pronounced eye-ops) is a common performance measurement used to benchmark computer storage devices like hard disk drives (HDD) and storage area networks (SAN). IOPS numbers published by storage device manufacturers do not guarantee real-world application performance. IOPS can be measured with applications such as Iometer (originally developed by Intel). The specific number of IOPS possible in any system configuration will vary greatly, depending upon several variables, including the balance of read and write operations, the mix of sequential and random access patterns, the number of worker threads and queue depth, and the data block sizes, as well as other factors such as the system configuration, storage drivers, OS background operations and the like.
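As a rough worked example, the seek time and rotational latency described above bound the random IOPS that a single spindle can deliver. The sketch below assumes that average rotational latency is half a revolution; the 8.5 ms average seek time and 7200 RPM figures are illustrative values, not measurements of any particular drive, and the function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: on average, half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution

def estimated_random_iops(avg_seek_ms, rpm, overhead_ms=0.0):
    """Estimate random IOPS for one drive servicing one IO at a time.

    overhead_ms stands in for firmware, software and other hardware delays.
    """
    service_ms = avg_seek_ms + avg_rotational_latency_ms(rpm) + overhead_ms
    return 1000.0 / service_ms

latency_ms = avg_rotational_latency_ms(7200)   # about 4.17 ms
iops = estimated_random_iops(8.5, 7200)        # on the order of 80 IOPS
```

Under these assumptions each random IO costs on the order of 12 to 13 ms, which is why every additional seek imposed on a drive is expensive.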
When one or more hard drives in a RAID array fail, the data they held has to be rebuilt. The IO operations that constitute rebuilding are rebuild IOs, while the IO operations that are for ordinary non-rebuild operations, such as normal operation of the hard drives in the RAID group, are host IOs. In a RAID system, rebuild performance under host IO conditions is severely impacted. This occurs because a rebuild operation requires a read of all remaining disks in the disk group, and each disk has to seek to service the rebuild operation. In addition, each drive model has its own method of optimizing its IO queue, often a proprietary one, and reorders the IOs to minimize drive seeks. As a result, the rebuild IOs are severely impacted and suffer high latency, as they are usually the ones that get reordered most. This directly affects the rebuild performance, and the system can take 8-30 days to rebuild a mere 1 TB of data. Such long rebuild times further expose the RAID group to prolonged periods of degraded host IO performance, and open the group to secondary or tertiary drive failures that can take the whole RAID group offline, with the potential loss of data.
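To put the rebuild times quoted above in perspective, the following sketch converts them into an effective rebuild throughput. It assumes a decimal terabyte (10^12 bytes); the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB_BYTES = 10**12  # assumption: decimal terabyte

def effective_rebuild_rate_mb_s(capacity_bytes, days):
    """Average rebuild throughput in MB/s for a rebuild lasting `days`."""
    return capacity_bytes / (days * SECONDS_PER_DAY) / 1e6

fast = effective_rebuild_rate_mb_s(TB_BYTES, 8)    # 8-day rebuild
slow = effective_rebuild_rate_mb_s(TB_BYTES, 30)   # 30-day rebuild
```

Under this assumption, an 8-day rebuild of 1 TB corresponds to roughly 1.4 MB/s and a 30-day rebuild to roughly 0.4 MB/s.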
All RAID systems typically have many IO queues managed and controlled by a controller, and each drive has its own IO queue as well. The present invention concerns itself with the latter, the queue within an individual drive, which may be 32 or 64 commands deep. Often, rebuild IOs suffer large latency or response times, because these IOs typically do not share the same locality or geographic presence with the rest of the host IO in the drive queue. Locality is defined as a common region or cluster of sectors on the disk that are grouped so that the drive heads do not have to seek very far to get from one LBA (Logical Block Address) to the next. Adverse effects can thus arise during rebuild IOs. Because a rebuild operation requires the complete drive to be read in order to reconstruct data, for most of the rebuild operation the rebuild IOs will not share locality with the host IOs. All systems typically control the rate of rebuild IOs and host IOs to a drive, but once the IOs are handed over to the drive, the drive takes over, and, as explained herein, the drive may skew the rates computed by the RAID controller, resulting in rebuild IO starvation.
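The starvation effect described above can be illustrated with a toy model of a drive queue. Because actual drive queue optimization is proprietary, the sketch assumes a simple shortest-seek-time-first policy as a stand-in; the LBA values, the function name and the queue contents are hypothetical.

```python
def sstf_service_order(queue, head):
    """Order in which a shortest-seek-time-first scheduler serves a queue.

    queue: list of (kind, lba) tuples; head: starting head position (LBA).
    This policy is an assumed stand-in for a drive's proprietary reordering.
    """
    pending = list(queue)
    order = []
    while pending:
        # Always serve the request closest to the current head position.
        nearest = min(pending, key=lambda io: abs(io[1] - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest[1]
    return order

# Host IOs clustered in one region of the disk (shared locality).
host = [("host", lba) for lba in (1000, 1040, 980, 1100, 1020, 960, 1080)]

# A single rebuild IO far from that region, queued fourth.
queue = host[:3] + [("rebuild", 900_000)] + host[3:]

order = sstf_service_order(queue, head=1010)
rebuild_position = [kind for kind, _ in order].index("rebuild")
```

In this model the rebuild IO is served last even though it was queued fourth. If host IOs with the same locality keep arriving, the rebuild IO can be deferred for as long as they do.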
Drives can be rebuilt in a series of one or more concurrent threads or processes, which are implementation dependent and decided by the firmware in a RAID system, based on available system resources and the granularity of the IO size of the disk group, for example the stripe size of a virtual group being rebuilt.
The present invention addresses this concern of high latency of rebuild IOs resulting in prolonged rebuild times with a novel heuristic method of rebuilding while continuing to service host IO operations.