In the computer industry, it has long been recognized that data stored within a computer system's mass storage subsystem, such as a hard disk drive, should be “backed up,” meaning that a copy of the data is made on a regular basis so that the data is not lost should the computer system malfunction, “crash,” or otherwise become inoperative or unavailable. Early in the field of database systems, data files were stored on magnetic hard disks, which provide relatively fast random access, and were then regularly backed up to magnetic tapes, a medium that provides slower sequential access but can store data densely and inexpensively. Because these magnetic tapes are removable, archived data can be moved to another location for safekeeping or later loaded onto a different computer system. The process of backing up a database is sometimes referred to as dumping the database, or a database dump, and the process of recovering the database from a backup copy as loading the database, or a database load. A database backup may create a copy of the entire database, which is referred to as performing a full database backup. Alternatively, it may copy only the pages changed or modified since the last backup, which is referred to as performing a cumulative backup.
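The difference between a full and a cumulative backup can be illustrated with a minimal sketch. It assumes a hypothetical page model in which each page carries the logical timestamp of its last modification; the names `Page`, `full_backup`, `cumulative_backup`, and `load` are illustrative only and do not come from any particular database product.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Page:
    data: str
    modified_ts: int  # hypothetical: logical timestamp of the last change to this page


def full_backup(db: Dict[int, Page]) -> Dict[int, str]:
    """Copy every page in the database."""
    return {pid: page.data for pid, page in db.items()}


def cumulative_backup(db: Dict[int, Page], since_ts: int) -> Dict[int, str]:
    """Copy only the pages modified after the previous backup's timestamp."""
    return {pid: page.data for pid, page in db.items() if page.modified_ts > since_ts}


def load(full: Dict[int, str], *cumulatives: Dict[int, str]) -> Dict[int, str]:
    """Start from the full backup, then overlay each cumulative backup in order."""
    restored = dict(full)
    for cumulative in cumulatives:
        restored.update(cumulative)
    return restored


db = {1: Page("a", 1), 2: Page("b", 2), 3: Page("c", 3)}
full = full_backup(db)
last_backup_ts = 3
db[2] = Page("b2", 4)  # only page 2 changes after the full backup
cumulative = cumulative_backup(db, last_backup_ts)
print(cumulative)                # {2: 'b2'}
print(load(full, cumulative))    # {1: 'a', 2: 'b2', 3: 'c'}
```

The cumulative backup here contains one page instead of three, which is the point of the technique: it is smaller and faster to take, at the cost of needing the earlier backup as well when the database is loaded. This simple page model does not track deleted pages.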
Traditionally, such backups have been performed on a regular schedule, with separate archives created, for example, monthly, weekly, daily, or even hourly. However, the timing of these backups has had to be coordinated with the actual usage of the computer system being backed up, because traditional backup methods required that a backup not be performed while the computer system was also being used for data processing.
When computer database programs were designed to operate primarily in batch processing mode on large mainframe computers, such backups were readily scheduled and easily performed, since users did not interact continuously with the computer system. However, with the development of time-sharing systems and other “transactional” database systems, including those found on personal computers, users now expect to interact with computer systems “on-line” and in real time, which creates procedural difficulties for backing up data. Some of these difficulties arise because, in order to back up a database or other information stored in a computer system, particularly when the data is of a transactional nature, the data should be prevented from changing at some point in time just before the backup is performed. This maintains the integrity of the database and eliminates the possibility of losing data that changes during the backup.
A simple solution to this problem is to prevent access to the database while it is being backed up. However, this technique is disadvantageous to users because it effectively takes the database “off line”: users cannot access the database for the duration of the backup, which is both inconvenient and inefficient. Such a technique may also create data integrity problems when, for example, a transaction is in progress but has not yet been committed to the database when a backup begins.
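The “prevent access during backup” approach can be sketched as a single lock shared by writers and the backup. This is a minimal sketch assuming a hypothetical in-memory `LockedDatabase` with an `on_page` hook used only to simulate a concurrent user; it is not modeled on any specific product.

```python
import threading
from typing import Callable, Dict, Optional


class LockedDatabase:
    """Hypothetical in-memory database in which backups and writes share one lock."""

    def __init__(self) -> None:
        self._pages: Dict[int, str] = {}
        self._lock = threading.Lock()

    def write(self, page_id: int, data: str) -> None:
        # A writer blocks here for as long as a backup holds the lock.
        with self._lock:
            self._pages[page_id] = data

    def read(self, page_id: int) -> str:
        with self._lock:
            return self._pages[page_id]

    def backup(self, on_page: Optional[Callable[[int], None]] = None) -> Dict[int, str]:
        # Holding the lock for the whole copy yields a consistent image,
        # but effectively takes the database off line until the copy ends.
        with self._lock:
            archive: Dict[int, str] = {}
            for page_id in sorted(self._pages):
                archive[page_id] = self._pages[page_id]
                if on_page is not None:
                    on_page(page_id)  # hypothetical hook to simulate concurrent users
            return archive


db = LockedDatabase()
db.write(1, "a")
db.write(2, "b")
writers = []


def concurrent_user(page_id: int) -> None:
    # While page 1 is being copied, another thread tries to modify page 2.
    if page_id == 1:
        t = threading.Thread(target=db.write, args=(2, "b-new"))
        t.start()
        writers.append(t)


archive = db.backup(concurrent_user)
writers[0].join()  # the write only completes after the backup releases the lock
print(archive)     # {1: 'a', 2: 'b'}
print(db.read(2))  # b-new
```

The archive never contains a half-applied change, but every writer waits for the entire copy to finish. The lock by itself also does nothing about a transaction that is in progress but not yet committed when the backup starts.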
Current database backup techniques may be fully online, meaning that a user may still have full access to the database while a backup is being performed.
In at least some cases, part of the process of performing a database dump (full or cumulative) is checkpointing the database: a marker that records a timestamp is placed in the database, and any changes made to the database after that point are not intended to be recorded by the initial database dump. Next, a backup server may copy pages from the database to the archive. While pages are being copied, concurrent processes, including users modifying pages of the database, may continue to change the database. When these pages are later copied from the archive back to the database during a load, each page has an image at least as recent as the database at the checkpoint, but changes made after the checkpoint may be (and are likely to be) missing. In some cases, these changes are restored during the load by also copying the transaction log of the database to the archive and then using the changes recorded in the transaction log to restore the database to the state it was in at the dump instant.
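The checkpoint, page-copy, and log-replay sequence can be sketched as follows. This is a minimal sketch under several assumptions: the `Database`, `Page`, `dump`, and `load` names and the `between_pages` hook are hypothetical, every change gets a monotonically increasing timestamp that is stored both in the page and in the log, and the load skips a log record when the archived page already reflects it (a timestamp comparison similar to the log-sequence-number check used by many recovery schemes, which the text above does not specify).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

LogRecord = Tuple[int, int, str]  # (timestamp, page_id, new_data)


@dataclass
class Page:
    data: str
    ts: int  # timestamp of the last change applied to this page


class Database:
    def __init__(self) -> None:
        self.pages: Dict[int, Page] = {}
        self.log: List[LogRecord] = []
        self.clock = 0

    def write(self, page_id: int, data: str) -> None:
        # Each change receives a new timestamp and is recorded in the transaction log.
        self.clock += 1
        self.log.append((self.clock, page_id, data))
        self.pages[page_id] = Page(data, self.clock)


def dump(db: Database, between_pages: Optional[Callable[[int], None]] = None
         ) -> Tuple[Dict[int, Page], List[LogRecord]]:
    checkpoint_ts = db.clock  # the checkpoint: the page copy starts from here
    archived_pages: Dict[int, Page] = {}
    for pid in sorted(db.pages):
        page = db.pages[pid]
        archived_pages[pid] = Page(page.data, page.ts)
        if between_pages is not None:
            between_pages(pid)  # concurrent activity may change pages here
    dump_instant = db.clock
    # Archive the log records covering the copy window.
    archived_log = [r for r in db.log if checkpoint_ts < r[0] <= dump_instant]
    return archived_pages, archived_log


def load(archived_pages: Dict[int, Page], archived_log: List[LogRecord]) -> Dict[int, Page]:
    restored = {pid: Page(p.data, p.ts) for pid, p in archived_pages.items()}
    for ts, pid, data in archived_log:
        page = restored.get(pid)
        # Redo a change only if the archived page image predates it.
        if page is None or page.ts < ts:
            restored[pid] = Page(data, ts)
    return restored


db = Database()
for pid, data in [(1, "a"), (2, "b"), (3, "c")]:
    db.write(pid, data)


def concurrent_activity(pid: int) -> None:
    if pid == 1:
        db.write(1, "a2")  # page 1 was already copied, so the archive misses this
        db.write(3, "c2")  # page 3 is copied later, so the archive already has this


pages, log = dump(db, concurrent_activity)
restored = load(pages, log)
print({pid: p.data for pid, p in restored.items()})  # {1: 'a2', 2: 'b', 3: 'c2'}
```

Page 1 is archived with its checkpoint-time image and is brought forward by replaying the log, while the log record for page 3 is skipped because the archived image is already current. The number of archived log records grows with the amount of concurrent activity during the copy.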
The longer it takes to copy pages from the database to the archive, and the more concurrent activity there is during that time, the more recovery the load must perform. For very active, very large databases (VLDBs), the time needed to recover a database is considerable, so the time taken to copy pages from the archive to the database becomes only a fraction of the total restore time.