Updates to databases are often made by changing data stored in dynamic memory and then writing the changed data to disk at a later time. However, every database system is subject to system or hardware failure. Such failures can corrupt or destroy changes made to data in dynamic memory before the changed data has been written to disk, leaving the database in an inconsistent state. Even after changed data has been written to disk, media failures can corrupt or destroy portions of a database containing the changed data.
To address the risk of losing changed data not yet written to disk, some contemporary databases maintain a recovery log containing a record of all changes made against the database. The recovery log typically consists of one or more files stored on disk which contain sufficient information about the changes so that in the event of a failure, the changes that were lost during the failure may be made against the database again. Hence, the recovery log provides a recovery mechanism for restoring the consistency of a database in the event of a failure.
Consider the simple database arrangement 100 depicted in FIG. 1. A first client application 101 and a second client application 102 submit changes to a database system 103. The database system 103 includes a database server 104 and non-volatile storage 105. The database server 104 processes database changes submitted by the first and second client applications 101, 102 and accesses non-volatile storage 105 which stores the database files. The database system 103 also includes a recovery log 106 which resides on non-volatile storage 105 and contains sufficient information about all of the changes submitted to the database system 103 by the client applications 101 and 102, so that in the event of a failure, the changes may be resubmitted from recovery log 106.
Because a failure may occur at any time, it is not known which changes have actually been written to non-volatile storage 105. Therefore, during recovery, each data block on the non-volatile storage 105 that is referenced in the recovery log 106 must be checked to determine whether it reflects the changes recorded in the recovery log 106. According to one approach, this determination is performed by reading a version identifier that indicates the stored version of each referenced data block and comparing that version identifier to the corresponding version identifier stored in the recovery log 106. If the version identifier associated with the change contained in the recovery log 106 is newer than the version identifier associated with the data block stored on non-volatile storage 105, then the change was never applied to the data block stored in the non-volatile storage 105 and must be reapplied. On the other hand, if the version identifier associated with the data block stored on non-volatile storage 105 is at least as recent as the version identifier associated with the change contained in the recovery log 106, then the change does not need to be reapplied.
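The version comparison described above can be illustrated with a minimal sketch. It assumes that version identifiers are integers that increase with each change and that a log record carries the full new contents of the block; the names LogRecord, DataBlock, and recover are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    block_id: int    # which data block the change applies to
    version: int     # version the block has once this change is applied
    new_value: str   # assumed: the record holds the block's full new contents


@dataclass
class DataBlock:
    version: int     # version identifier stored with the block on disk
    value: str


def recover(log, blocks):
    """Replay the log in chronological order and return how many changes were reapplied."""
    reapplied = 0
    for rec in log:
        block = blocks[rec.block_id]   # stands in for reading the block from disk
        if rec.version > block.version:
            # The logged change is newer than the stored block: it was lost, so reapply it.
            block.value = rec.new_value
            block.version = rec.version
            reapplied += 1
        # Otherwise the stored block is at least as recent and the change is skipped.
    return reapplied


# Block 7 reached disk at version 1 before the failure; block 3 was never written.
log = [LogRecord(7, 1, "a"), LogRecord(3, 1, "x"), LogRecord(7, 2, "b")]
blocks = {3: DataBlock(0, ""), 7: DataBlock(1, "a")}
reapplied = recover(log, blocks)
```

In this example the first record is skipped because block 7 already carries version 1, while the remaining two records are reapplied. Note that every record still requires the referenced block to be read before the comparison can be made.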
For changes affecting many data blocks, this process becomes quite time consuming. Moreover, the changes are stored in the recovery log 106 in chronological order. Accessing the data blocks in the database files based upon the chronological order of the changes in the recovery log 106 results in random disk I/O, which is relatively inefficient due to the amount of seek time consumed during the reads. This is because the order in which changes are recorded in the recovery log 106 generally differs from the order in which the affected data blocks are laid out on non-volatile storage 105.
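The cost of following the log's chronological order can be illustrated with a toy model. The sketch below assumes that a block's number is a stand-in for its position on disk and that seek cost is proportional to the distance between consecutively accessed blocks; both are simplifications, and the block numbers are invented for illustration. The sorted pass is shown only as a baseline for comparison and is not a technique described here.

```python
def seek_distance(block_ids):
    """Total distance travelled when the blocks are visited in the given order."""
    return sum(abs(b - a) for a, b in zip(block_ids, block_ids[1:]))


# Hypothetical block numbers, in the order their changes appear in the log.
log_order = [90, 5, 60, 12, 88, 3]

chronological = seek_distance(log_order)        # visit blocks in log order
sequential = seek_distance(sorted(log_order))   # baseline: a single ordered pass

print(chronological, sequential)  # prints "349 87"
```

Under these assumptions, visiting the same six blocks in log order covers roughly four times the distance of a single ordered pass, which is the seek overhead the preceding paragraph refers to.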
In view of the need to reapply changes to a database after a system or hardware failure, and the limitations associated with existing approaches, a method and apparatus for reapplying changes to a database that reduces database recovery time is highly desirable.