Many database systems allow administrators or other authorized users to restore a database in the event of a database crash or other error. For example, a database system may employ a “shadow paging” system, in which a “last known good” version of a database is maintained within the database despite subsequent changes to the database. In the event of a crash, the last known good version is retrieved from the database and brought up to date (i.e., to the time of the crash) using data from a transaction log which is also stored in the database. The foregoing process limits the downtime needed for generating backups and for restoring the database from a stored previous version. However, the process requires that both the last known good version and the transaction log remain retrievable from the media in which the database is stored.
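The recovery step described above, in which a last known good version is brought up to date from a transaction log, can be illustrated with a minimal sketch. The record format (`LogRecord`), the `recover` function, and the use of a dictionary as the database state are assumptions made for illustration only and do not reflect any particular database system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One logged change: set `key` to `value` (hypothetical record format)."""
    sequence: int  # position of the change in the transaction log
    key: str
    value: str


def recover(last_known_good: dict, log: list) -> dict:
    """Return the state at crash time: the snapshot plus all logged changes."""
    state = dict(last_known_good)  # copy, so the snapshot itself is untouched
    # Replay changes in log order so that later writes override earlier ones.
    for record in sorted(log, key=lambda r: r.sequence):
        state[record.key] = record.value
    return state


# Usage: a snapshot and a log that is stored out of order.
snapshot = {"a": "1", "b": "2"}
log = [LogRecord(2, "c", "3"), LogRecord(1, "a", "10")]
recovered = recover(snapshot, log)
```

Both inputs to `recover` are read from the same storage media as the database, which is why this form of recovery does not protect against failure of the media itself.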
In order to provide recovery from media failure or other catastrophic failure, a database system may back up its data to a backup medium which is physically separate from the database system's storage media (e.g., one or more hard disks and/or Random Access Memory). For example, if the database is backed up daily to a separate backup medium, then, in the event of a hardware failure, an administrator may restore the database to a previous day's state by retrieving the previous day's data from the backup medium.
In a traditional “single node” database system, which consists of a single executing process and associated storage media, any full backup thereof represents a single consistent state of the database. A distributed database, on the other hand, consists of two or more nodes, each of which consists of a single executing process and associated storage media. The data stored in the storage media of all the nodes, taken together, represents the full database.
If each node of a distributed database is backed up as described above with respect to a single node database system, the backup of each node will represent a single consistent state of that node. Even if the backups of all the nodes are commenced simultaneously, the backups, taken together, will most likely not correspond to a single consistent state of the full database, due to ongoing database transactions and a lack of synchronization between the nodes. Therefore, in order to ensure that the backups of all the nodes correspond to a single consistent state of the full database, each node of the distributed database must be stopped and, only after all nodes are stopped, each node must be backed up. Each node is restarted only after the backup of all nodes is complete.
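The stop-all procedure described above can be sketched as follows. The `Node` class, its methods, and the `backup_all_stopped` function are hypothetical names introduced only for illustration; an in-memory dictionary stands in for each node's storage media, and a deep copy stands in for writing to a backup medium.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical database node: one process plus its own storage."""
    name: str
    data: dict = field(default_factory=dict)
    running: bool = True

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def write(self, key: str, value: int) -> None:
        # A stopped node accepts no transactions.
        if not self.running:
            raise RuntimeError(f"node {self.name} is stopped")
        self.data[key] = value

    def backup(self) -> dict:
        # Stand-in for copying the node's data to a separate backup medium.
        return copy.deepcopy(self.data)


def backup_all_stopped(nodes: list) -> dict:
    """Stop every node, back up every node, then restart every node."""
    for node in nodes:
        node.stop()  # no backup begins until all nodes are stopped
    try:
        backups = {node.name: node.backup() for node in nodes}
    finally:
        for node in nodes:
            node.start()  # restart only after all backups are complete
    return backups


# Usage: two nodes that together hold the full database.
nodes = [Node("n1", {"x": 1}), Node("n2", {"y": 2})]
backups = backup_all_stopped(nodes)
```

Because no node accepts writes between the first `stop()` and the last `start()`, the per-node backups correspond to one consistent state of the full database, at the cost of the full database being unavailable for that entire interval.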
The full database is unavailable for the entire duration of the stop-and-back-up procedure described above. This downtime is significant and unacceptable in many scenarios. Systems are desired to back up distributed databases in an efficient manner which limits database downtime.