A storage server is a computer system, and a form of storage controller, that stores and retrieves data on behalf of one or more clients on a network. It stores and manages that data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. A storage server may be configured to service file-level requests from clients, as in the case of file servers used in a Network Attached Storage (NAS) environment. Alternatively, a storage server may be configured to service block-level requests from clients, as is done by storage servers used in a Storage Area Network (SAN) environment. Further, some storage servers are capable of servicing both file-level and block-level requests, as is done by certain storage servers made by NetApp®, Inc. of Sunnyvale, Calif.
In conventional network storage systems, the mass storage devices may be organized into one or more groups of drives. Redundant Array of Inexpensive/Independent Disks (RAID) is a technique that uses one or more groups of disk drives to achieve a greater level of performance and reliability. In a RAID organization of drives, data can be divided and distributed across multiple physical disks. This distribution increases input/output throughput, since multiple disks participate simultaneously in reading and writing the data. Data can also be replicated in a RAID organization. Replication ensures that data remains available even if one of the disks fails. Such replication of data is often called data redundancy.
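The distribution of data across disks described above can be illustrated with a minimal sketch. It assumes a simple round-robin layout with one block per stripe unit; the function names `locate_block` and `stripe` are hypothetical and are not taken from any particular RAID implementation.

```python
from typing import List, Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumption: round-robin striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


def stripe(blocks: List[int], num_disks: int) -> List[List[int]]:
    """Distribute a sequence of blocks across num_disks disks."""
    disks: List[List[int]] = [[] for _ in range(num_disks)]
    for logical_block, payload in enumerate(blocks):
        disk, _offset = locate_block(logical_block, num_disks)
        # Appending in logical order places each block at its computed offset.
        disks[disk].append(payload)
    return disks


# Eight consecutive blocks spread over four disks.
layout = stripe(list(range(8)), num_disks=4)
```

Under this layout, consecutive logical blocks land on different disks, so a sequential transfer of several blocks can be serviced by several disks at once.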
When a set of disk drives is configured under a RAID scheme, the set is commonly referred to as a RAID group. Multiple RAID schemes are available, each with its own distinctive features. For example, RAID 0 (striped disks) increases input/output throughput by distributing data across several disks. However, since RAID 0 provides no redundancy, data would be lost if any one of the disks fails. In a RAID 1 (mirrored disks) configuration, a piece of data is duplicated to two or more disks. Thus, data would not be lost as long as at least one disk remains available. Still, the RAID 1 scheme is less efficient in storage usage, since at most half of the available space can be used for data.
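The trade-off between the two schemes can be made concrete with a small sketch. It assumes that a RAID 1 group keeps a full copy of the data on every disk in the group; the function names and the scheme labels `"raid0"` and `"raid1"` are hypothetical.

```python
def usable_fraction(scheme: str, num_disks: int) -> float:
    """Fraction of the raw capacity of a RAID group that can hold data."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if scheme == "raid0":
        return 1.0  # striping only, every block holds unique data
    if scheme == "raid1":
        return 1.0 / num_disks  # assumption: every disk holds a full copy
    raise ValueError("unsupported scheme: " + scheme)


def survives(scheme: str, num_disks: int, failed_disks: int) -> bool:
    """Return True if the data is still available after the given failures."""
    if not 0 <= failed_disks <= num_disks:
        raise ValueError("failed_disks must be between 0 and num_disks")
    if scheme == "raid0":
        return failed_disks == 0  # any single failure loses data
    if scheme == "raid1":
        return failed_disks < num_disks  # one surviving copy is enough
    raise ValueError("unsupported scheme: " + scheme)
```

For a two-disk mirror, `usable_fraction("raid1", 2)` evaluates to 0.5, which is the "half of the available space" case; adding further mirror copies lowers the fraction while raising the number of failures that can be tolerated.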
For a RAID scheme with data redundancy capability, data is not lost as long as enough disks remain available for failure recovery. When a disk failure is detected by a storage device, a RAID storage system can immediately switch to a degraded state. In the degraded state, data remains available and data services can still be maintained. But the performance of the RAID storage system is greatly reduced, since the data of the failed disks must be continually computed from the surviving disks. To restore the RAID storage system to a normal state, an operator could replace the failed disks either by hot-swapping (replacing the disks without powering down the system) or by cold-swapping (replacing the disks after the system is powered off). After the failed disks are replaced, a RAID system is capable of automatically rebuilding the data of the failed disks onto the replacement disks. Data redundancy is reinstated once the data originally stored on the failed disks has been reconstructed or restored on the replacement disks.
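As an illustration of the calculation required in the degraded state, the following sketch assumes a single-parity scheme in which the parity block is the byte-wise XOR of the data blocks in a stripe. The text above does not name a specific redundancy calculation, so this choice, and the names `xor_blocks` and `reconstruct_missing`, are assumptions made for the example.

```python
from functools import reduce
from operator import xor
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of one or more equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct_missing(survivors: List[bytes], parity: bytes) -> bytes:
    """Derive the block of the failed disk from the survivors and parity."""
    return xor_blocks(survivors + [parity])


# One stripe across three data disks, plus its parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] has failed; its block is derived on demand.
rebuilt = reconstruct_missing([data[0], data[2]], parity)
```

Every read that targets the failed disk requires reading the corresponding block from each surviving disk and combining them, which is why a degraded system services requests more slowly than a healthy one.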
When a RAID system operates in a degraded state, the speed of reconstruction becomes crucial, especially since any additional disk failure could cause permanent data loss. Often the RAID system must await the replacement of the failed disks before it can reconstruct data onto the replacement disks. Once reconstruction starts, the RAID system allocates a significant amount of system resources to the reconstruction process. As a result, the reconstruction process further reduces the performance of a RAID system that is already operating in a degraded state. In addition, reconstruction often takes a long time to complete. Thus, even a hot spare disk, which is pre-configured as a replacement disk and so removes the wait for a replacement, would not help much in reducing the reconstruction time itself.
The long recovery process is due to the limited I/O bandwidth provided by the replacement disks and/or the surviving disks. For example, in a RAID 4 configuration, a dedicated disk maintains parity information for all other disks. If this dedicated disk fails, then data reconstruction, which rebuilds the parity and stores it onto the replacement for the dedicated disk, is limited by the write bandwidth of that disk. Similarly, since all of the surviving disks must be read to provide the redundant data, the data reconstruction is also limited by the collective read bandwidth of all the surviving disks.
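The bandwidth limits described above can be turned into a rough lower bound on reconstruction time. The sketch below assumes that every surviving disk is read in full and in parallel, that the replacement disk is written in full, and that no client I/O competes for bandwidth. The function name and the figures in the example are hypothetical and chosen only for illustration.

```python
from typing import List


def rebuild_time_lower_bound(
    disk_capacity_bytes: float,
    write_bandwidth: float,
    read_bandwidths: List[float],
) -> float:
    """Lower bound, in seconds, on the time to rebuild one failed disk.

    Bandwidths are in bytes per second. Assumption: reads from the
    surviving disks proceed in parallel, so the slowest reader and the
    single writer each bound the total time.
    """
    if disk_capacity_bytes <= 0 or write_bandwidth <= 0:
        raise ValueError("capacity and write bandwidth must be positive")
    if not read_bandwidths or min(read_bandwidths) <= 0:
        raise ValueError("need at least one positive read bandwidth")
    write_time = disk_capacity_bytes / write_bandwidth
    read_time = disk_capacity_bytes / min(read_bandwidths)
    return max(write_time, read_time)


# Hypothetical figures: a 10**12-byte disk, a replacement that writes at
# 10**8 bytes/s, and three surviving disks that each read at 2 * 10**8 bytes/s.
seconds = rebuild_time_lower_bound(10**12, 10**8, [2 * 10**8] * 3)
```

With these hypothetical figures the write path is the bottleneck, and adding more surviving disks would not shorten the bound, since the replacement disk still has to absorb its full capacity through a single write stream.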