Redundant Array of Independent Disks (RAID) technology has been widely used in storage systems to achieve high performance and data reliability. By maintaining redundant information within an array of disks, RAID can recover the data when one or more disk failures occur in the array. The process of recovering data from disk failures in a RAID system is called data reconstruction. The data reconstruction process is critical to both the performance and the reliability of RAID systems.
As an example, when a disk fails in the array, the array enters a degraded mode, and user I/O requests that fall on the failed disk have to reconstruct data on the fly, which is expensive and causes significant performance overhead. Moreover, the user I/O processes and the reconstruction process run concurrently and compete with each other for disk bandwidth, which further degrades system performance. In addition, while the RAID system is recovering from one disk failure, a second disk failure may occur, which can exceed the system's failure tolerance and cause permanent data loss. Thus, a prolonged data reconstruction process introduces a long period of system vulnerability and severely degrades system reliability.
FIG. 1 shows how a typical RAID system 10 performs an online reconstruction when a disk fails. The reconstruction process can reconstruct the RAID stripes of the RAID system 10 sequentially from the first to the last RAID stripe. To reconstruct each RAID stripe, the reconstruction process can read out the corresponding data and parity blocks from the surviving disks (5, 15, 20, 25), regenerate the data block of a failed disk 10 through parity computation, and write the data block back to a replacing disk 30. During the online reconstruction, user I/O requests (40, 45) that fall on the failed disk have to reconstruct the data on the fly. For a read request 40, all the other data and parity blocks in the parity group will be read out and the requested data will be reconstructed through parity computation. For a write request 45, all the other data blocks except the parity block will be read out, then the new parity block will be computed and written back to the parity disk. Therefore, user I/O processing in the reconstruction mode is more complicated and has lower performance than in the normal mode. Furthermore, the reconstruction process and the user I/O processes run separately from each other, and user I/O processing will not return to normal mode until the entire failed disk is reconstructed.
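By way of illustration only, the parity computation described above can be sketched in Python as follows. The sketch assumes single-parity XOR redundancy (as in RAID-5), which the description does not specify, and models each stripe as an in-memory list of equal-length byte blocks indexed by disk. All function and parameter names (`xor_blocks`, `reconstruct_stripe`, `rebuild_disk`, `degraded_read`, `degraded_write`, `failed`, `parity_disk`) are hypothetical and do not correspond to any element of FIG. 1.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_stripe(stripe, failed):
    """Regenerate the failed disk's block from every surviving block.

    Assumption: with XOR parity, the missing block equals the XOR of all
    other data and parity blocks in the same parity group.
    """
    survivors = [blk for disk, blk in enumerate(stripe) if disk != failed]
    return xor_blocks(survivors)


def rebuild_disk(stripes, failed):
    """Sequential reconstruction from the first to the last stripe.

    Returns the blocks that would be written to the replacing disk.
    """
    return [reconstruct_stripe(stripe, failed) for stripe in stripes]


def degraded_read(stripe, failed):
    """Serve a read that falls on the failed disk by on-the-fly reconstruction."""
    return reconstruct_stripe(stripe, failed)


def degraded_write(stripe, failed, parity_disk, new_data):
    """Serve a write that falls on the failed data disk.

    Reads the other data blocks (not the old parity), computes the new
    parity from them plus the new data, and stores it on the parity disk.
    Assumes the failed disk is not the parity disk for this stripe.
    """
    others = [blk for disk, blk in enumerate(stripe)
              if disk not in (failed, parity_disk)]
    stripe[parity_disk] = xor_blocks(others + [new_data])


# Usage: four data blocks plus one parity block; disk 1 has failed.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
stripe = data + [xor_blocks(data)]
lost_block = stripe[1]
stripe[1] = None  # the failed disk can no longer be read

recovered = degraded_read(stripe, failed=1)
degraded_write(stripe, failed=1, parity_disk=4, new_data=b"\x33\x44")
```

In this sketch a degraded read touches every surviving disk in the parity group, which is one way to see why user I/O in the reconstruction mode costs more than in the normal mode.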
For data reconstruction, an ideal scenario is offline reconstruction, in which the array stops serving user I/O requests and lets the data reconstruction process run at full speed. However, this scenario is not practical in most production environments, where RAID systems are required to provide uninterrupted data services even while they are recovering from disk failures. In other words, RAID systems in production environments typically undergo online reconstruction, in which the reconstruction process and the user I/O processes run concurrently.
In previous work, several methods have been proposed to optimize the reconstruction process of RAID systems. The Workout method aims to redirect user write data and popular read data to a surrogate RAID, and to reclaim the write data to the original RAID when the reconstruction of the original RAID completes. By doing so, Workout tries to separate the reconstruction process from the user I/O processes and leave the reconstruction process undisturbed. Another previous method is called Victim Disk First (VDF). VDF defines a system DRAM cache policy that gives higher caching priority to data on the failed disk, so that the performance overhead of reconstructing the failed data on the fly can be minimized. A third previous work is called live block recovery. The method of live block recovery aims to recover only live file system data during reconstruction, skipping the unused data blocks. However, this method relies on passing file system information down to the RAID block level, and thus requires significant changes to existing file systems.
Based on the above concerns, the data reconstruction process should be shortened as much as possible.
The present disclosure is susceptible to various modifications and alternative forms, and some representative embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the inventive aspects are not limited to the particular forms illustrated in the drawings. Rather, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the appended claims.