This invention relates generally to computer systems, and more particularly to methods and apparatus for backing up and restoring data.
To ensure data integrity and persistence, computer databases are backed up on a routine basis. Backing up large databases can, however, be time-consuming.
At least one known database system attempts to reduce the resources needed for backing up databases by combining full backups with partial backups. A full backup is performed at a selected interval. Between full backups, one or more partial backups are performed at more frequent intervals. For example, consider a system in which a full backup is performed every seven days, on successive Sundays, and a partial backup is performed each day from Monday through Saturday. Suppose also that, on average, 10% of the records in the database change each day.
Partial backups can be incremental or cumulative. An incremental backup requires a backup of only those records that have changed since the immediately preceding backup, whether full or partial. A cumulative backup requires a backup of all records that have changed since the last full backup. Statistically, each successive cumulative backup between a pair of full backups requires substantially more storage space on whatever backup medium is used (tape, disk, etc.), as well as more time, than the one before it. On the other hand, incremental backups statistically do not require an increasing amount of space and time to perform.
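By way of illustration only, the expected size of each partial backup in the Sunday-to-Saturday example can be estimated with the short Python sketch below. The sketch assumes that each record changes independently with probability 10% on any given day. That independence assumption, and the function names used, are not part of the known system described above; they are introduced here solely to make the arithmetic concrete.

```python
# Illustrative sketch only. Assumption (not from the described system):
# each record changes independently with probability p on any given day.

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]  # partial-backup days after a Sunday full backup


def incremental_fraction(p):
    """Expected fraction of records in one daily incremental backup."""
    return p


def cumulative_fraction(p, days_since_full):
    """Expected fraction of records changed at least once since the last full backup."""
    return 1.0 - (1.0 - p) ** days_since_full


def partial_backup_table(p=0.10):
    """Return (day, incremental fraction, cumulative fraction) for each partial-backup day."""
    return [
        (day, incremental_fraction(p), cumulative_fraction(p, k))
        for k, day in enumerate(DAYS, start=1)
    ]


if __name__ == "__main__":
    for day, inc, cum in partial_backup_table():
        print(f"{day}: incremental {inc:.1%} of records, cumulative {cum:.1%} of records")
```

Under these assumptions the incremental backup stays at about 10% of the records each day, while the cumulative backup grows from 10% on Monday to roughly 47% by Saturday. If the same records tended to change every day, the cumulative figure would grow more slowly; if different records changed each day, it would grow faster, up to 60%.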
Whether incremental or cumulative partial backups are performed, this known system for backing up databases has proven quite successful for small and medium-sized databases. However, there are scaling problems associated with this backup system for large (e.g., terabyte) databases. For example, backing up a large database system requires a spike in resources each time a full backup is performed, i.e., time, space, computing power, and network bandwidth have to be available to perform the full backup. This requirement can limit large system backups to specific times during the week, such as Sundays at 4 a.m., which may be the only time enough resources can be made available without interfering with normal business uses of the database. Also, depending upon when in a backup cycle a restore has to be done, restoring a system from backups may require a tedious process of restoring the last full backup and all incremental backups since the last full backup, or the last full backup and a (possibly very large) cumulative backup. Moreover, if a network computer performs backups for several large databases on separate backup schedules, a large peak resource requirement may have to be provided for those times when full backups have to be performed for more than one of the databases.
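The dependence of the restore process on the point in the backup cycle can be sketched as follows, again purely for illustration. The Python function below lists, in the order they would be applied, the backups needed to restore the weekly example to its most recent backup. The labels such as `full@Sun` and the function name `restore_chain` are hypothetical, and the sketch assumes that the backup scheduled for the day in question has already completed.

```python
# Illustrative sketch only. Labels and names are hypothetical; the schedule
# follows the weekly example above (full backup on Sunday, partial backups
# Monday through Saturday).

from typing import List

WEEK = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def restore_chain(day: str, partial_kind: str) -> List[str]:
    """Backups to apply, in order, to restore to the backup taken on `day`.

    Assumes the backup scheduled for `day` has already completed.
    """
    if partial_kind not in ("incremental", "cumulative"):
        raise ValueError("partial_kind must be 'incremental' or 'cumulative'")

    idx = WEEK.index(day)  # raises ValueError for an unknown day
    chain = ["full@Sun"]   # every restore starts from the last full backup
    if idx == 0:
        return chain

    if partial_kind == "incremental":
        # Every incremental backup since the full backup is needed, oldest first.
        chain += [f"incremental@{d}" for d in WEEK[1 : idx + 1]]
    else:
        # Only the most recent cumulative backup is needed.
        chain.append(f"cumulative@{WEEK[idx]}")
    return chain


if __name__ == "__main__":
    print(restore_chain("Sat", "incremental"))
    print(restore_chain("Sat", "cumulative"))
```

With incremental partial backups, a restore late in the week involves the full backup plus up to six further backups applied in sequence. With cumulative partial backups it involves only two, but the second may be large.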