Networks and distributed storage allow data and storage space to be shared between devices located anywhere a connection is available. These implementations may range from a single machine offering a shared drive over a home network to an enterprise-class cloud storage array with multiple copies of data distributed throughout the world. Larger implementations may incorporate Network Attached Storage (NAS) devices, Storage Area Network (SAN) devices, and other configurations of storage elements and controllers to provide data and manage its flow. Improvements in distributed storage have given rise to a cycle in which applications demand increasing amounts of data delivered with reduced latency, greater reliability, and greater throughput. Hand-in-hand with this trend, system administrators have taken advantage of falling storage prices to add capacity wherever possible.
However, one drawback to this abundance of cheap storage is the need to maintain regular backup copies of increasing amounts of data. Even though storage devices have become more reliable, they are not infallible. When multiple storage devices are grouped in a RAID array or other grouping, the probability that at least one device in the group fails increases with each storage device added. While many RAID configurations offer redundancy such as parity or mirroring, it is still possible for a catastrophic failure to exceed the ability of the array to recover. Furthermore, RAID and other hardware redundancy safeguards offer no protection from user errors and accidentally deleted files.
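As a rough illustration of why grouping devices raises the chance of a failure somewhere in the group, the following minimal sketch computes the probability that at least one of n devices fails during a given period. It assumes, purely for illustration, that each device fails independently with the same probability p; real device failures may be correlated, and the 2% figure used below is a hypothetical value rather than a measured one. The function name prob_any_failure is likewise introduced only for this example.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n devices fails in a period.

    Assumes (as a simplification) that every device fails independently
    with the same per-period probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # All n devices survive with probability (1 - p) ** n;
    # the complement is the chance that one or more fail.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.02  # hypothetical per-device failure probability for the period
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} devices: {prob_any_failure(p, n):.4f}")
```

Under these assumptions the result grows with every device added. Note that this is the probability of a device failure within the group, not of data loss: parity or mirroring may allow the array to tolerate some failures, as discussed above.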
While a number of backup solutions exist, backup and restore remain extremely time-consuming processes due, in part, to ever-increasing volume sizes. In typical examples, restoring data from a backup takes hours or even days. The simplest solution is to take the volume out of service during data restoration, but in many applications it is unacceptable for a volume to be inaccessible for this long. In further examples, reads of data that has already been restored are allowed, but writes may still be prohibited, or at least not guaranteed, while data restoration is in progress. Thus, while existing techniques for data protection have been generally adequate, the techniques described herein provide more efficient data backup and restoration and, in many examples, allow a system to continue to perform new data transactions while data is being restored.