Enterprises commonly maintain multiple copies of important data and expend large amounts of time and money to protect this data against losses due to disasters or catastrophes. In some storage systems, data is stored across numerous disks that are grouped together. These groups can be linked with arrays to form clusters having a large number of individual disks.
When an individual disk fails in a storage system, redundancy is lost for the data stored on the failed disk. That data is vulnerable because only a single copy of it remains. If the data from the failed disk is not rebuilt or copied, then the failure of another disk could result in permanent loss of data.
To prevent permanent loss of data after a disk failure, the data stored on the failed disk is rebuilt or copied to restore data redundancy. This process of reconstructing data to restore redundancy is known as rebuilding or sparing.
Problems can occur during the process of rebuilding or sparing data. If the process favors user input/output (I/O) requests, so that such requests preempt internal data recovery, then data sparing can stall in the presence of a sustained workload. During this time, data is vulnerable to being permanently lost if another disk failure occurs. Conversely, if the process favors data recovery over user requests, then latency on the user requests can be high and cause unacceptable delays in processing requests.
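The trade-off described above can be illustrated with a toy model. The following is a minimal sketch, not a description of any particular storage system: it assumes a single disk that services exactly one operation per tick, a sustained workload of one user request arriving per tick, and a strict-priority scheduler. The function name `simulate`, the policy labels `user_first` and `rebuild_first`, and all parameter values are hypothetical and chosen only for illustration.

```python
from collections import deque


def simulate(policy, ticks=100, rebuild_chunks=50, arrivals_per_tick=1):
    """Toy model: one disk, one operation per tick, strict-priority policy.

    policy is "user_first" (user I/O preempts rebuild) or
    "rebuild_first" (rebuild preempts user I/O). Both labels are
    illustrative assumptions, not terms from the source text.
    """
    queue = deque()             # arrival tick of each pending user request
    remaining = rebuild_chunks  # rebuild work still to be done
    max_wait = 0                # longest wait among served user requests

    for now in range(ticks):
        # Sustained workload: new user requests arrive every tick.
        for _ in range(arrivals_per_tick):
            queue.append(now)

        # Serve a user request if the policy favors users, or if the
        # rebuild has already finished; otherwise do rebuild work.
        serve_user = bool(queue) and (policy == "user_first" or remaining == 0)
        if serve_user:
            max_wait = max(max_wait, now - queue.popleft())
        elif remaining > 0:
            remaining -= 1

    return {"rebuilt": rebuild_chunks - remaining, "max_user_wait": max_wait}


if __name__ == "__main__":
    print("user_first:   ", simulate("user_first"))
    print("rebuild_first:", simulate("rebuild_first"))
```

Under these assumptions, the user-first policy never completes any rebuild work because the disk is always busy with user requests, while the rebuild-first policy finishes the rebuild but makes user requests wait for the entire rebuild duration.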