A cluster database system comprises multiple nodes, each of which executes one or more database server instances that share the same storage where database files reside. Each instance reads and modifies data blocks in the instance's own memory cache and synchronizes reads from and writes to the shared storage with other instances through a synchronization mechanism. Changes to data blocks are made within transactions, which generate redo records (collectively “redo”) for those changes. A single redo record may indicate one or more changes. Each instance causes its redo records to be written to one or more durable log files that are separate from the log files of every other instance. A data block is written from volatile memory to the database files on the shared storage only after the redo up to and including the last change to the data block has been written to durable storage; this guarantees that the database does not contain changes that are not reflected in the redo.
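The write ordering described above can be illustrated with a minimal sketch. It assumes a single instance with an in-memory redo buffer; the names `Instance`, `modify`, `flush_redo`, and `write_block` are hypothetical and chosen only for illustration, and plain Python lists and dictionaries stand in for the durable log file and the shared storage.

```python
class Instance:
    """Hypothetical single database instance (illustration only)."""

    def __init__(self):
        self.redo_buffer = []  # redo records not yet durable: (seq, block_id, value)
        self.redo_log = []     # stands in for this instance's durable log file
        self.cache = {}        # block_id -> (value, seq of last change)
        self.storage = {}      # stands in for the shared storage
        self.flushed_seq = 0   # highest sequence number known to be durable
        self.next_seq = 0

    def modify(self, block_id, value):
        # Every change generates a redo record before the cached block is updated.
        self.next_seq += 1
        self.redo_buffer.append((self.next_seq, block_id, value))
        self.cache[block_id] = (value, self.next_seq)
        return self.next_seq

    def flush_redo(self, up_to_seq):
        # Move buffered redo, in order, to the durable log up to the given point.
        while self.redo_buffer and self.redo_buffer[0][0] <= up_to_seq:
            record = self.redo_buffer.pop(0)
            self.redo_log.append(record)
            self.flushed_seq = record[0]

    def write_block(self, block_id):
        # The block may reach storage only after redo up to and including
        # its last change is durable.
        value, last_seq = self.cache[block_id]
        if last_seq > self.flushed_seq:
            self.flush_redo(last_seq)
        self.storage[block_id] = value
```

In this sketch, writing a block forces out only the redo up to that block's last change; later redo for other blocks may remain buffered.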
Redo for changes made to a data block follows the order in which the changes are made, even when the changes are made on different instances in a cluster. The order may be guaranteed by assigning a global (i.e., for all the instances in the cluster) sequence number or timestamp to each change, which is also recorded in the corresponding redo record. During recovery, redo log files from different instances are merged based on the respective sequence numbers or timestamps of the redo records to create an ordered redo stream for each data block.
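The recovery-time merge described above can be sketched as a k-way merge. The sketch assumes that each per-instance log is already sorted by a cluster-wide unique sequence number; the `RedoRecord` fields and the `merge_redo` function are hypothetical names used only for illustration.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    seq: int       # global sequence number (assumed unique across the cluster)
    instance: str  # instance that generated the record
    block_id: int  # data block the change applies to
    change: str    # opaque description of the change


def merge_redo(logs):
    """Merge per-instance logs (each sorted by seq) into one ordered
    redo stream per data block."""
    streams = defaultdict(list)
    for record in heapq.merge(*logs, key=lambda r: r.seq):
        streams[record.block_id].append(record)
    return dict(streams)


log_a = [RedoRecord(1, "A", 7, "set x=1"), RedoRecord(4, "A", 7, "set x=3")]
log_b = [RedoRecord(2, "B", 7, "set x=2"), RedoRecord(3, "B", 9, "set y=5")]
ordered = merge_redo([log_a, log_b])
```

Here the stream for block 7 interleaves records from both instances in sequence-number order, which is the order in which the changes would be reapplied.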
One approach to enforcing the order of redo for a data block is to delay the inter-instance transfer of the data block until all redo for that data block has been written to persistent storage. This approach guarantees that if the source instance crashes after the data block is sent, no redo is lost that precedes the redo for changes made on the destination instance.
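The delayed-transfer approach can be sketched as follows. The sketch is a simplification that assumes a sequential log, so persisting the redo for a block means persisting every buffered record up to the last one for that block; the `Node` class and its methods are hypothetical names used only for illustration.

```python
class Node:
    """Hypothetical cluster instance (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.pending_redo = []  # (block_id, change) records not yet persisted
        self.durable_redo = []  # stands in for the persistent log file
        self.cache = {}         # block_id -> list of changes applied so far

    def modify(self, block_id, change):
        self.pending_redo.append((block_id, change))
        self.cache.setdefault(block_id, []).append(change)

    def flush_redo_for(self, block_id):
        # Persist the log prefix ending at the last pending record for the block.
        last = max(
            (i for i, (b, _) in enumerate(self.pending_redo) if b == block_id),
            default=-1,
        )
        self.durable_redo.extend(self.pending_redo[: last + 1])
        del self.pending_redo[: last + 1]

    def transfer(self, block_id, dest):
        # The block is sent only after all of its redo is persistent, so a
        # crash of this node after the send cannot lose earlier redo.
        self.flush_redo_for(block_id)
        dest.cache[block_id] = list(self.cache.pop(block_id))
```

The cost of this design is that the transfer waits on a log write, which is the delay the paragraph above refers to.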
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.