1. Technical Field
This disclosure relates generally to data backup systems, and, more specifically, to a distributed system and method for generating a synthetic full backup.
2. Description of the Related Art
Computer systems and their components are subject to various types of failures that may result in the loss of data. For example, a storage device used in or by the computer system may experience a failure (e.g., mechanical, electrical, magnetic, etc.) that makes any data stored on that storage device unreadable. Erroneous software or hardware operation may corrupt the data stored on a storage device, destroying data on an otherwise properly functioning storage device. Any component in the storage chain between (and including) the storage device and the computer system may also experience failure, such as the storage device itself, the connectors (e.g., cables) between the storage device and other circuitry, or, in some cases, the network between the storage device and the accessing computer system.
To mitigate the risk of losing data, computer systems typically replicate (or make backup copies of) data stored on various storage devices. A variety of techniques are available for backing up data. For example, for a given set of data (such as the data on a particular computer system, e.g., one or more file systems and/or volumes), a full backup backs up the entire data set. As a result, the backup data set is typically stored in a single location, which simplifies restore operations. However, in systems with relatively few changes compared to the overall number of files, performing full backups can make relatively inefficient use of resources (e.g., time and network bandwidth). In addition, the storage cost of maintaining multiple full backups is significant.
In order to reduce overall storage requirements for data backup, incremental backups are sometimes employed. An incremental backup typically backs up only those files that have changed since the last backup (e.g., full or incremental) was taken. In systems with relatively few changes compared to the overall number of files, the time needed to take an incremental backup is typically substantially less than that needed for a full backup. However, performing a series of incremental backups can lead to data being stored in disparate locations, which can cause management of this data to become more complicated over time. Restores also often take longer in such situations, as information must be collected from different locations and from an increasingly long series of incremental backups. Because of this, it is desirable to take full backups periodically. However, since full backups typically consume significant network bandwidth and cut into the time available to perform incremental backups, improvements in backup systems that address the existing problems with incremental and full backups are desirable.
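By way of illustration only, the relationship between full backups, incremental backups, and restores described above may be sketched as follows. This is a minimal in-memory model under stated assumptions, not a description of any particular backup product: the names `Backup`, `full_backup`, `incremental_backup`, and `restore` are hypothetical, files are modeled as a mapping from path to contents, and changes are detected by comparing contents, whereas real systems commonly rely on modification times or similar metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup image: stored file contents plus recorded deletions."""
    files: dict
    deleted: set = field(default_factory=set)


def full_backup(source):
    """Copy every file in the data set."""
    return Backup(files=dict(source))


def incremental_backup(source, previous_state):
    """Copy only files that are new or changed relative to the previous state."""
    changed = {p: d for p, d in source.items() if previous_state.get(p) != d}
    deleted = {p for p in previous_state if p not in source}
    return Backup(files=changed, deleted=deleted)


def restore(chain):
    """Replay a full backup followed by its incrementals, oldest first."""
    state = {}
    for backup in chain:
        for path in backup.deleted:
            state.pop(path, None)
        state.update(backup.files)
    return state


# Assumed example data: three files, then two rounds of small changes.
day0 = {"a.txt": b"1", "b.txt": b"2", "c.txt": b"3"}
day1 = {"a.txt": b"1-edited", "b.txt": b"2"}                   # a changed, c deleted
day2 = {"a.txt": b"1-edited", "b.txt": b"2", "d.txt": b"4"}    # d added

full = full_backup(day0)
inc1 = incremental_backup(day1, day0)
inc2 = incremental_backup(day2, day1)

restored = restore([full, inc1, inc2])
```

In this sketch each incremental backup holds a single file while the full backup holds all three, yet the restore must read every image in the chain, which reflects why restore time tends to grow as the series of incremental backups lengthens.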