1. Field of the Invention
The invention relates to database access techniques, and in particular, to a method for providing read-only access to content of a database by suspending a log-based update mode in a manner that enables resumption.
2. Description of the Related Art
Availability of data in databases is of primary concern for database managers and users. The ability to access data for numerous purposes is becoming increasingly important, and there are increasing numbers of access techniques for retrieving useful information from databases.
Providing many users with access to a large database, requires equipment adapted to handle a volume of data input and output (I/O). The rate at which this I/O is provided and the number of concurrent access requests are major factors affecting a response time of the database system. Response time is another factor considered important by database managers and users.
A third concern of many database managers is disaster recovery. Many large database systems have disaster recovery systems in place to handle a primary database failure, and to provide a backup in case data is corrupted by human error, etc. Many of these disaster recovery systems require one or more backup copies of the database to be saved. FIG. 1 schematically illustrates a primary database server 10a having a buffer pool 12a, and disk storage 14a for holding the primary data; a secondary database server 10b having similar components, for storing the backup data; and a database management workstation 16, all interconnected by a packet network 18. The primary database server 10a may be a very busy entity having hundreds of users concurrently updating and retrieving data.
Generally the procedures for copying data from the primary disk storage 14a to the backup disk storage 14b is relatively expensive in terms of I/O, and in some cases, it requires exclusionary access to the primary database server 10a, resulting in a temporarily unavailable primary database. While known mirroring techniques greatly improve this process, duplication of the primary data is still a hardware-intensive process. Given the desire to minimize interruption of availability of the primary database server 10a, and occupation of the I/O hardware, it is therefore known to copy the data once, at a time=t, and then use logs of changes made to the primary database after time=t to update the backup database thereafter. The backup data is maintained concurrent with the primary data to within a guarded temporal lag. After the log of transactions effected at the primary database is created, it is forwarded to the backup database server 10b, in a procedure known as log shipping.
Log shipping is well known in the art. In accordance with common practice, every time the primary database is changed, the transaction is recorded in a log, which is usually stored for audit purposes, etc. Copies are made of this log, and shipped to the backup database server 10b. After a certain time delay, the logs are “replayed” at the backup database, unless processing is interrupted, as illustrated in FIG. 2.
As previously noted, each log record defines an action that has been applied to a database object at the primary database server 10a. A log record is retrieved (often in parallel with other log records), and is read (step 50). It is subsequently determined (in step 52) if the transaction of which the log record is a part is open. If the transaction is not open, a procedure for opening the transaction is executed (step 54). All pages that are included in the database object identified in the log record are retrieved from the default page locations to the buffer pool (unless they are currently instantiated at the buffer pool), in step 56. In step 58, the page(s) are retrieved and updated, so that the previous changes to the primary data are then made to the backup data. Once the log record replay is complete, it is determined, in step 60, whether the transaction is now ready to be committed, and if it is the transaction is completed (step 62). Actions necessary to record the effect of the log record are then effected, in step 64. In some embodiments a log sequence number (LSN) may be updated on the retrieved page(s) so that the page indicates a last completed transaction applied thereto, a completion log may be updated and the status of the page may otherwise be updated. If another record exists in the update log, this procedure is repeated by returning to step 50.
At any time during this replay process a stop command may be issued (step 66). The stop command may be issued in response to an error condition, a failure, a user input from the management workstation 16, or a programmed response. For example, if the log is empty, or a database manager issues a stop command, the process will halt. If the log is empty, the stop command will likely be issued after step 56, or after step 50, depending on program implementation.
These log record updates may be made in parallel, using known concurrent processing techniques. Consequently many (active) transactions may be ongoing at any given time. When a stop command is issued, in step 66, all active transactions are halted. Each of the active transactions is rolled-back (step 68), in a manner well known in the art. Basically, rolling back a transaction involves undoing changes to the pages that the transaction's log records affect.
As each transaction is either rolled-back or committed, the database after the rollback procedure of step 68, is in a consistent state. In step 70, abort logs and compensation logs are appended to the transaction log, overwriting any log records following a current log record. As will be appreciated by those skilled in the art, compensation log records are written to indicate the actions taken to undo the changes to the pages effected by the log records of active transactions. Abort records are written to indicate which log records have been stopped.
The backup database is now taken out of update mode, and can be accessed. It will be appreciated by those skilled in the art that the data in a consistent state can be used for many purposes. More particularly the backup data may be used for any procedure that requires a substantial amount of I/O resources, and therefore cannot be performed at the primary database server 10a without incurring unacceptable performance penalties. For example, generating month-end forms often requires collation of a substantial amount of data, as does creating charts and performing data manipulation operations to generate desired output. Month end forms are much more expeditiously generated using a copy of the database than by retrieving the data piecemeal, and reassembling the data at a workstation. Copies of databases in the consistent state may also be used for myriad database analysis procedures, for model and system testing and design, etc.
Once the work has been completed, or desired parts of the backup data has been copied to a third location where the work will be completed, the backup database is returned to service as the backup to the primary data. In accordance with known techniques, the current backup database is erased, and a new copy of the primary data is taken at time=t, and copied to the disk storage 14b (step 72). The records of transactions applied to the primary data after time=t are then forwarded to the backup database. The backup database may therefore be returned to the log-based update mode (step 74), and thus receive the records of transactions, and continues replaying the data as before.
As will be appreciated by those skilled in the art, recopying the primary data to the backup system is a time consuming operation that requires considerable I/O, resources (and network bandwidth when the secondary is not directly connected to the primary). Nonetheless, current highly robust methods for updating data do not terminate in a manner that permits the recommencement of the update process because versions of changed and undone pages that are needed for recovery, and the log records themselves, cannot be recovered. Accordingly there exists a need for a method and system for suspending a log-based update of a database, that permits resumption of the update process.