In recent years, the amount of data stored digitally on computer storage devices has increased dramatically. To accommodate increasing data storage needs, larger-capacity storage devices have been developed. Typically, each of these storage devices is a single magnetic disk drive. Unfortunately, multiple concurrent access requests to a single drive can slow reads and writes in a single-drive system. One response to this problem has been to connect a plurality of storage devices to form a storage node. In a storage node, data may be distributed over several storage drives. For example, a read operation for a file distributed over several storage drives may be faster than for a file located on a single drive, because a distributed system permits parallel read requests for smaller portions of the file. Another response has been to connect a plurality of storage nodes to form a storage system of even larger capacity, referred to as a “cluster.”
One problem associated with distributed systems is drive failure and data loss. Although read and write access times tend to decrease as the number of storage devices in a system increases, the likelihood that some storage device fails also increases with the number of devices. Thus, a distributed system is vulnerable to both temporary and permanent unavailability of storage devices.
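The claim that failure risk grows with device count can be made concrete with a simple model. The sketch below assumes, as an idealization not stated in the text, that each device fails independently with the same probability p over a given period; under that assumption the probability that at least one of n devices fails is 1 - (1 - p)^n. The function name `prob_any_failure` is hypothetical.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n devices fails.

    Assumes (hypothetically) that every device fails independently
    with the same probability p over the same time window.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "no device fails", which has probability (1 - p)^n.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Same per-device failure probability, growing device counts.
    for n in (1, 10, 100):
        print(n, round(prob_any_failure(0.02, n), 4))
```

With an assumed per-device probability of 2%, the chance that some device fails is about 18% for 10 devices and about 87% for 100 devices, which illustrates why larger distributed systems must plan for device unavailability.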
When a storage device, for example either a storage drive or a storage node, becomes unavailable, storage systems typically remove the device from the system and fully reconstruct its contents. As storage devices grow larger, the time required to fully reconstruct an unavailable device increases correspondingly. This longer reconstruction degrades system response time and further exacerbates the risk of permanent data loss due to multiple device failures.