The invention relates generally to electronic databases. More particularly, the invention relates to backup and restoration of data in such databases.
Database systems often perform backup and restore operations to safeguard critical data stored in databases. Backing up and restoring a database allows for the complete restoration of data over a wide range of potential system problems, including media failure, user errors, or loss of database servers. In addition, backing up and restoring databases is useful for purposes unrelated to system failure, such as moving or copying a database from one server to another. By backing up a database from one computer and restoring it to another, a copy of a database can be made quickly and easily.
Backup operations can be performed, for example, as database backups or transaction log backups. Other types of backup operations include file backups, differential file backups, and differential backups. Backing up a database makes a copy of the database that can be used to restore the database if it is lost. Everything in the database is copied, including any needed portions of the transaction log. The transaction log is a serial record of all the modifications that have occurred in a database and includes information as to which transaction performed each modification. The transaction log is used during restore operations to roll forward completed transactions and to roll back or undo uncompleted transactions.
By contrast to a database backup, backing up a transaction log backs up only the changes that have occurred in the transaction log after a prescribed synchronization point. For database backup operations, this synchronization point might occur after data is copied from the database files, but before copying the portion of the transaction log that is needed to provide a transactionally consistent view of the data that was copied from the database files. For log backup operations, the synchronization point might occur before the log is copied to the backup media, i.e., roughly the start of the log backup operation. Thus, while a database backup records the complete state of the data in the database at the time the backup operation is completed, a transaction log backup records only the state of the transaction log at this synchronization point.
Other backup operations include differential database backups, which only copy those database pages that have been modified after the last full database backup, as well as the portion of the transaction log that is needed to roll it forward and perform undo operations for transaction consistency. Like transaction log backups, differential database backups improve recoverability by reducing the amount of data at risk for loss in the event of failure. Moreover, the amount of time involved in performing a restore operation is reduced relative to full database backups. Unlike transaction log backups, however, differential database backups do not necessarily allow restoration to the exact point of failure. Restoration can only be performed up to the point in time at which the differential database backup was created. Thus, differential database backups are often supplemented by subsequent transaction log backups. Still another type of backup operation is a file or filegroup backup, which allows the recovery of just the portion of a database that was on a disk that failed.
A restore operation involves the application of a backup set to a database. Restoring a database backup returns the database to the state in which it was when the backup was created. Any incomplete transactions in the database backup are rolled back to ensure that the database remains internally consistent. Incomplete transactions include any transactions that were not complete as of the above-described synchronization point. Restoring a transaction log backup reapplies all completed transactions that are in the transaction log to the database. When applying a transaction log backup, the transaction log is traversed, and all transactions in the log are rolled forward. When the end of the transaction log is reached, the database is restored to the state in which it was when the transaction log backup operation began. The restore operation then rolls back all transactions that were incomplete when the backup operation started.
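The roll-forward and rollback behavior described above can be illustrated with a minimal sketch. The `LogRecord` type, the `restore_log` function, and the dictionary standing in for the database are hypothetical names introduced only for illustration. The sketch also assumes that an incomplete transaction never writes a key that a later transaction overwrites, as locking would ordinarily ensure.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record: a write by a transaction, or its commit."""
    txn: str
    op: str            # "write" or "commit"
    key: str = ""
    value: Any = None


def restore_log(db: dict, log: list) -> dict:
    """Roll every logged write forward, then undo incomplete transactions."""
    restored = dict(db)          # leave the caller's copy untouched
    undo = []                    # before-images, in log order
    committed = set()
    for rec in log:
        if rec.op == "write":
            undo.append((rec.txn, rec.key, rec.key in restored, restored.get(rec.key)))
            restored[rec.key] = rec.value
        elif rec.op == "commit":
            committed.add(rec.txn)
    # Roll back, newest first, every write whose transaction never committed.
    for txn, key, existed, old in reversed(undo):
        if txn not in committed:
            if existed:
                restored[key] = old
            else:
                restored.pop(key, None)
    return restored


log = [
    LogRecord("T1", "write", "a", 2),
    LogRecord("T2", "write", "b", 9),   # T2 never commits
    LogRecord("T1", "commit"),
]
result = restore_log({"a": 1}, log)
```

Here the write made by `T1` survives because `T1` committed, while the write made by `T2` is undone.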
Database backups and transaction log backups are advantageously used together to restore a database to the point in time at which a failure occurred. Loss of data due to the failure can be greatly reduced or even eliminated entirely. In certain situations, using both database and transaction log backups is highly desirable. For example, the practice is advisable in any situation in which any loss of changes after the last database backup is unacceptable. The use of transaction log backups is also indicated when the resources involved in performing only database backups are limited. In addition, transaction log backups are advantageous in cases in which it is desirable to return the database to some point in time before failure.
In addition, it is also advisable to use transaction log backups in cases in which changes to the database are frequent. When a large number of changes occur to the database over a relatively short period of time, the last database backup can become outdated quickly. Because transaction log backups typically use fewer resources than database backups, they can be created more frequently than database backups. Thus, the window of time in which a failure can occur after a backup is reduced, also reducing the amount of data that is potentially lost. Further, by applying transaction log backups, the database can be recovered to a specific point in time before a failure. This point in time need not be immediately before the failure.
To restore a database from both a database backup and one or more transaction log backups, the most recent database backup is typically restored. Next, the transaction log backups that were created after the most recent database backup are applied in the same order in which they were created. Although the use of transaction log backups increases recoverability, creating and applying them is also more complex than using database backups alone. Restoring a database using both database and transaction log backups works only if there is an unbroken sequence of transaction log backups after the last database or differential database backup.
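The restore sequence just described can be sketched as a small planning routine. The `Backup` type, its LSN range fields, and `plan_restore` are assumptions made for illustration, and differential backups are left out for brevity. The sketch treats the log chain as unbroken when each log backup begins at or before the point the previous backup reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup descriptor with the log range it covers."""
    name: str
    kind: str        # "database" or "log"
    first_lsn: int
    last_lsn: int


def plan_restore(backups):
    """Return the backups to apply, in order, or raise if the chain is broken."""
    databases = [b for b in backups if b.kind == "database"]
    if not databases:
        raise ValueError("no database backup available")
    base = max(databases, key=lambda b: b.last_lsn)   # most recent database backup
    logs = sorted(
        (b for b in backups if b.kind == "log" and b.last_lsn > base.last_lsn),
        key=lambda b: b.first_lsn,
    )
    plan = [base]
    covered = base.last_lsn
    for log in logs:
        if log.first_lsn > covered:                   # a log backup is missing
            raise ValueError(f"gap in log chain before {log.name}")
        plan.append(log)
        covered = log.last_lsn
    return plan


full = Backup("full", "database", 0, 100)
log1 = Backup("log1", "log", 90, 150)
log2 = Backup("log2", "log", 150, 200)
plan = [b.name for b in plan_restore([log2, full, log1])]
```

If `log1` were missing from the input, `plan_restore` would raise rather than return a sequence that skips part of the log.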
One difficulty encountered in the context of backup and restore operations is the possibility of database corruption in certain situations. Each time a new database is created by recovering the database through certain restore operations, a divergent path known as a recovery fork is created. For example, in the case of a database restore operation followed by one or more log restore operations, all operations except the last one are performed without the option of recovery and thus do not result in the creation of a new database. The last log restore operation, however, is performed with the option of recovery and effectively results in the creation of a new database, and thus of a new recovery fork. If a backup set is applied to a database from a different recovery fork than the fork on which the backup set resides, corruption may result.
FIG. 2 depicts an example backup and restore process in which multiple recovery forks are generated, and in which the database would be corrupted. In FIG. 2, the letters "A" and "B" identify the recovery fork on which the database resides at each point in the illustrated process. First, at a time 202, a full database backup is made. For reference purposes, this backup is labeled "1." At this point, there is only one recovery fork, referred to as fork "A." Subsequently, transaction log backups labeled "2" and "3" are made at times 204 and 206, respectively.
After the transaction log backups are created, a restore operation is performed by applying the full database backup "1" at a time 208. This restore operation is performed without the option of recovery. Next, at a time 210, the transaction log backup "2" is applied to the database with the option of recovery. As discussed above, because this restore operation is performed with the recovery option, a new divergent fork "B" is generated when the transaction log backup is applied. Recovery fork "B" includes the transaction log backups labeled "4" and "5" on FIG. 2, which are created at times 212 and 214, respectively.
Subsequently, restore operations are performed at times 216, 218, and 220. At time 216, the full database backup "1" is applied to the database to restore the database to its state at time 202, when the full database backup "1" was created. The transaction log backups "2" and "3" are then applied, bringing the database to its state at time 206, when the transaction log backup "3" was created. Because the transaction log backup "3" is associated with the recovery fork "A," the database is said to be in the recovery fork "A." All of these restore operations are performed without the option of recovery, and no additional recovery forks are created.
Next, at a time 222, the system attempts to apply the transaction log backup "4" to the database. Because the transaction log backup "4" is associated with the recovery fork "B," it is essentially from a different database than the current database, which is in the recovery fork "A." As a result, applying the transaction log backup "4" would result in database corruption. Specifically, the system might find that the database is not rolled forward or back far enough to apply the log. Alternatively, an operation might be erroneously asserted during the rollforward or rollback process. Another possibility is that the data itself would become invalid.
Unfortunately, some conventional database systems cannot reliably detect this condition, and data can be corrupted as a result. Moreover, this potential for data corruption is not limited to transaction log restore operations. Differential restore operations and file restore operations also raise this possibility. Accordingly, a need continues to exist for a backup and restoration process that avoids the application of backup sets to databases for which they are invalid, thus reducing the likelihood of database corruption.
Various implementations of the present invention prevent certain restore attempts that would corrupt a database. In particular, the systems and methods of the present invention reject attempted restore operations involving a backup set that was generated from a different recovery fork than the fork on which the database currently resides. To prevent such backup sets from being applied, a partial history of the database's recovery path is maintained in both the database and backup sets. When a restore operation is attempted, it is permitted only if the backup set involved is on the database's current fork or one of its descendants.
One particular implementation of the present invention is directed to a computer-implemented method for restoring information to an electronic database from a sequence of at least one backup set. A history of one or more backup operations performed on the electronic database, which is associated with a recovery fork, is maintained. When a restore operation using the backup set is attempted, it is intercepted. It is then determined whether the backup set is associated with the recovery fork or a descendant of the recovery fork. The attempted restore operation is permitted only if the backup set is associated with the recovery fork or a descendant of the recovery fork.
In another implementation, this history is maintained both in the electronic database and in the backup set. Each history is stored as a push-down stack of two recovery fork names each having a globally unique identifier (GUID) and a log sequence number (LSN). For purposes of this disclosure, a log sequence number is a value that is generated when a page is changed and is used to synchronize the page state with its corresponding log entries. Log sequence numbers are unique and monotonically increase within a given GUID, but not necessarily between different GUIDs. An attempted restore operation that uses the backup set is intercepted. The system determines whether a recovery fork name associated with the backup set has a GUID that matches a GUID of a recovery fork name associated with the electronic database. If the GUID of the first entry of the backup set's recovery fork name stack matches the GUID of the recovery fork name associated with the electronic database, the backup set is compatible with the database. If a match is found in a different entry of the backup set's recovery fork name stack, the LSNs of the matching recovery fork names are compared. The attempted restore operation is rejected if the LSN of the recovery fork name associated with the database is greater than the LSN of the recovery fork name associated with the backup set. The attempted restore operation is permitted if a recovery fork name associated with the backup set has a GUID that matches a GUID of a recovery fork name associated with the electronic database and the LSN of the recovery fork name associated with the database is no greater than the LSN of the recovery fork name associated with the backup set.
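The compatibility test described above can be sketched as follows. This is one possible reading, not a definitive implementation: `ForkName` and `restore_permitted` are hypothetical names, the database is represented only by its current recovery fork name, and the LSN stored in an older stack entry is assumed to mark the point at which the backup set's lineage left that fork. The GUID strings and LSN values are illustrative and are chosen to mirror the scenario of FIG. 2.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ForkName:
    """Recovery fork name: a GUID plus an LSN within that fork."""
    guid: str
    lsn: int


def restore_permitted(db_fork: ForkName, backup_stack: Sequence[ForkName]) -> bool:
    """Decide whether a backup set may be applied to the database.

    backup_stack[0] is the backup set's current fork; later entries are
    older forks from which it descends.
    """
    for depth, entry in enumerate(backup_stack):
        if entry.guid != db_fork.guid:
            continue
        if depth == 0:
            return True                      # same fork: compatible
        # Match in an older entry: reject if the database has advanced
        # past the LSN recorded for that fork in the backup set.
        return db_fork.lsn <= entry.lsn
    return False                             # no shared fork at all


# Illustrative values: fork "B" split from fork "A" at LSN 200.
backup_4 = [ForkName("B", 250), ForkName("A", 200)]
backup_3 = [ForkName("A", 300)]

after_log_3 = ForkName("A", 300)   # database restored through backup "3"
after_log_2 = ForkName("A", 200)   # database restored only through backup "2"
```

Under these assumptions, applying backup "4" to a database restored through backup "3" is rejected, while applying it to a database restored only through backup "2" is permitted.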
Yet another implementation of the present invention is directed to a computer-implemented method for creating a backup set based on information from an electronic database. A history of one or more backup operations performed on the electronic database is maintained. This history is copied to the backup set. If the backup set is to be the result of a database backup operation, the history is then updated to reflect a binding identifier of the backup set. If the backup set is to be the result of a backup operation other than a file backup operation, the history is updated. The backup operation is then performed.
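The ordering of steps in this backup method can be sketched as follows. All names here (`History`, `Database`, `BackupSet`, `create_backup`) are hypothetical. The text does not say what the non-file history update records, so the sketch assumes it records the LSN at which the backup was taken.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class History:
    """Hypothetical history record kept in the database."""
    fork_stack: list = field(default_factory=list)   # (guid, lsn) pairs
    binding_id: Optional[str] = None
    last_backup_lsn: int = 0                          # assumed content of the update


@dataclass
class Database:
    history: History
    current_lsn: int
    pages: dict = field(default_factory=dict)


@dataclass
class BackupSet:
    kind: str                  # "database", "log", "differential", or "file"
    history: History           # snapshot of the database history
    binding_id: str
    payload: dict = field(default_factory=dict)


def create_backup(db: Database, kind: str) -> BackupSet:
    # 1. Copy the database's history into the new backup set.
    backup = BackupSet(kind, copy.deepcopy(db.history), str(uuid.uuid4()))
    # 2. A database backup binds the history to this backup set.
    if kind == "database":
        db.history.binding_id = backup.binding_id
    # 3. Every backup other than a file backup updates the history.
    if kind != "file":
        db.history.last_backup_lsn = db.current_lsn
    # 4. Perform the backup itself (here, simply copy the pages).
    backup.payload = dict(db.pages)
    return backup


db = Database(History(fork_stack=[("A", 0)]), current_lsn=120, pages={"p1": "x"})
full = create_backup(db, "database")
file_backup = create_backup(db, "file")
```

Because the history is copied before it is updated, the snapshot in each backup set reflects the database's state before that backup was taken.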
Another implementation involves a computer-readable medium storing a data structure that has a data field containing data identifying a recovery fork with which an electronic database is associated. Another data field contains data representing a sequence number within the recovery fork with which the electronic database is associated. Still another implementation is directed to a computer-readable medium storing a data structure that includes a stack of recovery fork identifiers. Each fork identifier contains the two above-described data fields.
Still other implementations include computer-readable media and apparatuses for performing the above-described methods. The above summary of the present invention is not intended to describe every implementation of the present invention. The figures and the detailed description that follow more particularly exemplify these implementations.