Media recovery is often an essential component of a database system, minimizing potential downtime and providing high database availability. Backups are generally scheduled periodically, with change records logged for any database changes that occur between the backups. Besides the traditional application of restoring a failed or corrupted primary database, media recovery of database backups may also be applied to a separate database, allowing the primary database to be replicated into standby, failover, and test databases. The performance of the media recovery may thus have a direct impact on query latency and database availability.
Safeguards should be provided so that the media recovery process itself is protected from failure. For example, an unexpected crash or failure may occur during the application of change records in the media recovery process. Unless there is a prior known consistent state of the database, the media recovery process will need to restart from the beginning with the backup files. This restarting may be a very expensive operation, particularly for databases that have a large number of change records to process, as is the case for multi-node or multi-instance databases.
Periodic checkpointing may be used to safeguard the media recovery process, allowing the media recovery process to resume from the last checkpoint rather than from the backup files after a failure occurs. To minimize the amount of work that needs to be repeated, more frequent checkpoints are required. However, more frequent checkpointing incurs significant I/O and processing overhead, slowing down the media recovery process and negatively impacting database performance.
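The trade-off described above can be illustrated with a minimal sketch. It is not taken from any particular database system: the names `RecoveryState`, `apply_records`, `checkpoint_every`, and `crash_at` are hypothetical, the "database" is an in-memory dictionary, and a checkpoint is simply a copy of that dictionary together with the count of change records already applied.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    # Stand-in for database blocks (hypothetical; a real system uses data files).
    data: dict = field(default_factory=dict)
    # Number of change records already applied to `data`.
    applied: int = 0


def apply_records(backup, records, checkpoint_every, saved=None, crash_at=None):
    """Apply change records, checkpointing every `checkpoint_every` records.

    Starts from `saved` (a prior checkpoint) if given, otherwise from `backup`.
    Returns (state, last_checkpoint, records_applied_in_this_run); `state` is
    None if a simulated crash at record index `crash_at` interrupted the run.
    """
    if saved is not None:
        state = RecoveryState(dict(saved.data), saved.applied)
    else:
        state = RecoveryState(dict(backup), 0)
    checkpoint = saved
    work = 0
    for i in range(state.applied, len(records)):
        if crash_at is not None and i == crash_at:
            # Simulated failure: in-memory state is lost, the checkpoint survives.
            return None, checkpoint, work
        key, value = records[i]
        state.data[key] = value
        state.applied = i + 1
        work += 1
        if state.applied % checkpoint_every == 0:
            # Persist a consistent copy; this copy is the checkpointing overhead.
            checkpoint = RecoveryState(dict(state.data), state.applied)
    return state, checkpoint, work
```

In this sketch, if a run over 100 records with `checkpoint_every=10` is interrupted at record index 95, a second call with `saved` set to the returned checkpoint reapplies only the records after the last checkpoint, whereas a run with no checkpoint must start again from `backup`. A smaller `checkpoint_every` reduces the repeated work but increases the number of copies made.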
This overhead is especially acute when a standby database is applying redo at a high rate. For example, if the standby database has been offline for a period of time after a failure and is then brought back online, it will receive and process a large batch of redo records from the primary database. The checkpointing process may consume large amounts of resources to keep up with the redo, which may starve other important processes, such as those serving read-only queries on the standby database.
Accordingly, to spread the checkpointing load over time, incremental checkpoints can be used to continuously write dirty buffers. However, it is difficult to reliably determine an optimal resource allocation for the periodic, incremental, or periodic and incremental checkpoints. While a simple approach may adjust the checkpointing rate inversely with the apply rate, this has the undesirable effect of delaying checkpoint creation when it may be needed the most.
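A small numerical sketch may make the drawback of the inverse approach concrete. It assumes, purely for illustration, that the checkpointing rate is strictly inversely proportional to the apply rate with a hypothetical constant `k`; the function names and the value of `k` are assumptions and do not come from any real system.

```python
def checkpoint_rate(apply_rate, k=100.0):
    """Hypothetical policy: checkpoints per second, inversely
    proportional to the redo apply rate (records per second)."""
    return k / apply_rate


def redo_at_risk(apply_rate, k=100.0):
    """Redo records applied between two consecutive checkpoints,
    i.e. the work that must be repeated after a failure."""
    interval_s = 1.0 / checkpoint_rate(apply_rate, k)
    return apply_rate * interval_s
```

Under this assumed policy, the redo applied between checkpoints equals the apply rate squared divided by `k`, so a tenfold increase in the apply rate yields a hundredfold increase in the work exposed to a failure. That is the sense in which checkpoint creation is delayed exactly when it is needed most.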
Based on the foregoing, there is a need for a method to provide efficient and high performance checkpointing for databases during media recovery.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.