A distributed storage service may include multiple concurrent processes executing across a distributed hardware infrastructure, such as one or more clusters of computers. Various ones of these processes may execute on different physical and/or logical (e.g., virtual) machines in the cluster(s). For example, processes (e.g., software servers) on different machines may each expose a programmatic interface to clients, which the clients may use to access a storage system implemented across multiple storage resources. The storage service may store multiple replicas of each data item in the system, such that any change to a data item on one server must be propagated to one or more other servers.
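The replica propagation described above can be illustrated with a minimal sketch. It is not drawn from any particular storage service: the `StorageServer` class and the `write_replicated` function are hypothetical names introduced only for illustration, each server is modeled as an in-memory dictionary, and propagation is assumed to be synchronous and failure-free.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical in-memory stand-in for one server process."""
    name: str
    items: dict = field(default_factory=dict)


def write_replicated(primary, replicas, key, value):
    """Apply a change on one server, then propagate it to the other servers."""
    primary.items[key] = value
    for replica in replicas:
        # Assumption: a direct assignment stands in for a network call.
        # A real service would also need to handle failures and retries.
        replica.items[key] = value


# Three servers, each holding a replica of every data item.
servers = [StorageServer(f"server-{i}") for i in range(3)]
write_replicated(servers[0], servers[1:], "item-1", "v1")
```

After the call, every server in `servers` holds the same value for `item-1`, which is the invariant that propagation is meant to maintain.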
Upon the failure of a node or disk drive, the data on the failed device must be restored. In many current storage systems that provide database services, the entire data set must be restored (e.g., from a backup or archive) before the system can resume accepting and processing queries. In some systems that perform incremental backups, restoring the system after a device failure involves performing multiple incremental restore operations (corresponding to multiple incremental backup operations). In other storage systems, restoring the system after a device failure involves tracing through transaction logs to reconstruct the state of the system. For data warehouse systems that include a large number of storage devices, the amount of time that the system must be taken out of service to perform restore operations on one or a small number of devices may represent a significant cost to the system.
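The two restore approaches described above, applying incremental backups in sequence and tracing through a transaction log, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method of any specific system: the function names, the dictionary-based state, the use of `None` to mark a deleted item, and the `(op, key, value)` log record layout are all hypothetical.

```python
def restore_from_incrementals(full_backup, incrementals):
    """Rebuild state from a full backup plus incremental backups, oldest first."""
    state = dict(full_backup)
    for delta in incrementals:
        for key, value in delta.items():
            if value is None:
                # Assumption: None in an incremental marks a deleted item.
                state.pop(key, None)
            else:
                state[key] = value
    return state


def restore_from_log(log):
    """Rebuild state by replaying (op, key, value) log records in order."""
    state = {}
    for op, key, value in log:
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


# The same history expressed both ways.
full = {"a": 1, "b": 2}
incrementals = [{"b": 3}, {"a": None, "c": 4}]
log = [
    ("put", "a", 1),
    ("put", "b", 2),
    ("put", "b", 3),
    ("delete", "a", None),
    ("put", "c", 4),
]

restored_incremental = restore_from_incrementals(full, incrementals)
restored_log = restore_from_log(log)
```

In both cases every step must be applied before the final state is known, which is why the restore time in such systems grows with the number of incremental backups or log records.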
While embodiments are described herein by way of example with reference to several embodiments and illustrative drawings, those skilled in the art will recognize that the embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.