The present disclosure relates to data storage systems. In a more particular example, the present disclosure relates to methods and systems for rebuilding a failed storage device in a data storage system.
Data storage systems, such as redundant array of independent disks (RAID) systems or newer erasure coding architectures, typically have a storage device rebuild mechanism for rebuilding one or more failed storage devices within the storage system. For instance, in a conventional RAID 5 or RAID 6 storage system where data is striped (i.e., the data is divided into segments and stored consecutively across multiple storage devices within the storage array), rebuilding a failed storage device in the array involves reading the data segments from all of the storage devices in the array that have not failed, reconstructing the lost data of the failed storage device from those segments, and then writing the reconstructed data to the replacement storage device. This process is often time-consuming and computationally intensive, since entire datasets, which could reach hundreds of terabytes (TB), need to be retrieved from the remaining storage devices in the array in order to rebuild the failed storage device. The storage device rebuild process also negatively affects normal host traffic, leading to as much as a 2× performance loss and significant increases in host read/write latency.
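For illustration only, the conventional rebuild described above can be sketched for the single-parity (RAID 5 style) case, in which each stripe holds one parity segment equal to the XOR of its data segments. The following is a minimal sketch under that assumption and is not taken from the present disclosure; the names `xor_blocks`, `make_stripe`, and `rebuild_device` are hypothetical, and the second parity used by RAID 6 is not modeled.

```python
def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_stripe(data_blocks, parity_slot):
    """Hypothetical helper: lay out one stripe, placing the parity
    segment on the device at index `parity_slot` (rotating parity)."""
    stripe = list(data_blocks)
    stripe.insert(parity_slot, xor_blocks(data_blocks))
    return stripe


def rebuild_device(stripes, failed):
    """Reconstruct every segment of the failed device.

    For each stripe, read the segments of all surviving devices and XOR
    them together. This recovers the missing segment whether it held
    data or parity, but it requires reading from every surviving device.
    """
    rebuilt = []
    for stripe in stripes:
        survivors = [seg for dev, seg in enumerate(stripe) if dev != failed]
        rebuilt.append(xor_blocks(survivors))
    return rebuilt


# Assumed toy array: 4 devices, 3 stripes, parity rotating across devices.
stripes = [
    make_stripe([b"AAAA", b"BBBB", b"CCCC"], parity_slot=3),
    make_stripe([b"DDDD", b"EEEE", b"FFFF"], parity_slot=2),
    make_stripe([b"GGGG", b"HHHH", b"IIII"], parity_slot=1),
]
failed = 1  # index of the device assumed to have failed
expected = [stripe[failed] for stripe in stripes]
rebuilt = rebuild_device(stripes, failed)
```

In this sketch every stripe is read from every surviving device, which is why the cost of a conventional rebuild grows with the full capacity of the array rather than with the amount of data actually lost.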