A database management system (DBMS) may have multiple server instances for the same database. For example, sharding, replication, and horizontal scaling are topologies that may use multiple server instances for a database.
Typically, each server instance occupies a separate host computer, such as a physical or virtual machine. Server instances may exchange data content and control information over a computer network. For example, server instances may collaborate to answer a federated query, to synchronize replication, and to rebalance data storage demand.
Assuming that failure rates of server instances are additive with horizontal scaling, it follows that the mean time between failures (MTBF) within a cluster or other federation of server instances will decrease as the cluster grows (gains server instances). As such, a robust cluster should tolerate crashed server instances and allow them to be rehabilitated and returned to service by rejoining the cluster.
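The effect of additive failure rates can be illustrated with a minimal sketch. It assumes, beyond what is stated above, that every server instance fails independently at the same constant rate; the function name `cluster_mtbf` and the sample figures are hypothetical.

```python
def cluster_mtbf(instance_mtbf_hours: float, instance_count: int) -> float:
    """Return the mean time between failures of a cluster, in hours.

    Assumes each instance fails independently at the same constant rate,
    so the cluster failure rate is the sum of the instance failure rates.
    """
    if instance_mtbf_hours <= 0:
        raise ValueError("instance_mtbf_hours must be positive")
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    failure_rate_per_instance = 1.0 / instance_mtbf_hours
    # Additive failure rates: n instances fail n times as often as one.
    cluster_failure_rate = failure_rate_per_instance * instance_count
    return 1.0 / cluster_failure_rate


if __name__ == "__main__":
    # Hypothetical per-instance MTBF of 10,000 hours.
    for count in (1, 10, 100):
        print(count, cluster_mtbf(10_000.0, count))
```

Under these assumptions the cluster MTBF is the instance MTBF divided by the number of instances, which is why larger clusters see failures more often.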
Rehabilitation of a server instance may entail recovery (logical repair of corrupt files), replay of redo logs to apply committed transactions that were in flight during the crash, and restart of the server instance.
A redo log may consist of multiple files that are pre-allocated and rotated, and that store digests of changes made to a database more or less as soon as the changes occur. Redo log files are filled with redo records.
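Rotation across a fixed set of pre-allocated files can be sketched as follows. This is an illustration only: the class `RotatingRedoLog`, its parameters, and the in-memory lists standing in for files are hypothetical, and the sketch omits the archiving or checkpointing that a real DBMS would typically require before a file is reused.

```python
from typing import List


class RotatingRedoLog:
    """Fixed set of pre-allocated log files that are filled in rotation."""

    def __init__(self, file_count: int, records_per_file: int) -> None:
        if file_count < 2:
            raise ValueError("rotation needs at least two files")
        if records_per_file < 1:
            raise ValueError("records_per_file must be at least 1")
        # Each inner list stands in for one pre-allocated redo log file.
        self.files: List[List[bytes]] = [[] for _ in range(file_count)]
        self.records_per_file = records_per_file
        self.current = 0  # index of the file currently being filled

    def append(self, record: bytes) -> int:
        """Store a redo record and return the index of the file that holds it."""
        if len(self.files[self.current]) >= self.records_per_file:
            # Current file is full: switch to the next file and reuse it.
            self.current = (self.current + 1) % len(self.files)
            self.files[self.current].clear()
        self.files[self.current].append(record)
        return self.current


if __name__ == "__main__":
    log = RotatingRedoLog(file_count=2, records_per_file=2)
    for i in range(5):
        print(i, log.append(bytes([i])))
```

Because the files are reused, the total space occupied by the redo log stays bounded regardless of how many changes are recorded.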
A redo record, also called a redo entry, is made up of a group of change vectors, each of which is a description of a change made to a data block in the database. For example, changing a salary value in an employee table may generate a redo record containing change vectors that describe changes to a data block for a table. A redo record represents a database write, regardless of whether the enclosing transaction of the write has or has not been committed.
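The structure of a redo record, and the replay of committed changes mentioned earlier, can be sketched as follows. All names here (`ChangeVector`, `RedoRecord`, `replay`, and their fields) are hypothetical and do not reflect any particular DBMS format. As a simplification, the sketch skips records of uncommitted transactions during replay, whereas a real system may instead apply all redo and then roll back uncommitted work.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class ChangeVector:
    """Describes one change to one data block."""
    block_id: int      # which data block is changed
    offset: int        # byte position of the change inside the block
    new_bytes: bytes   # bytes written at that position


@dataclass(frozen=True)
class RedoRecord:
    """A group of change vectors produced by one database write."""
    sequence: int                            # hypothetical ordering number
    transaction_id: int                      # enclosing transaction
    change_vectors: Tuple[ChangeVector, ...]


def apply_record(blocks: Dict[int, bytearray], record: RedoRecord) -> None:
    """Apply every change vector of a record to the in-memory blocks."""
    for vector in record.change_vectors:
        block = blocks[vector.block_id]
        end = vector.offset + len(vector.new_bytes)
        if vector.offset < 0 or end > len(block):
            raise ValueError("change vector falls outside its data block")
        block[vector.offset:end] = vector.new_bytes


def replay(blocks: Dict[int, bytearray],
           records: Iterable[RedoRecord],
           committed: Set[int]) -> None:
    """Replay, in sequence order, the records of committed transactions."""
    for record in sorted(records, key=lambda r: r.sequence):
        if record.transaction_id in committed:
            apply_record(blocks, record)
```

In this sketch a record is written to the log whether or not its transaction later commits, which matches the description above; the commit status is only consulted at replay time.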
A typical high-availability database configuration consists of one primary (production) database and one or more standby databases. In operation, this configuration typically uses redo log replication.
The primary database may use either synchronous or asynchronous transport mode for redo log replication. The prior industry solution uses synchronous transport mode to achieve zero data loss, also known as no data loss (NDL).
Synchronous redo transport mode transmits redo data to the standby databases at more or less the same time as the same redo is persisted to the online redo logs of the primary database. Synchronous redo transport mode is required to guarantee zero data loss when the primary database suffers a crash from which it cannot recover.
However, using synchronous redo shipping during normal activity at the primary database can impact performance of the primary database, because the process that writes redo to the online redo logs at the primary database also ships the same redo to the standby database. For example, transaction commit latency may increase and overall transaction throughput may decrease.
Asynchronous redo transport mode transmits redo data asynchronously, after it has been persisted to the online redo logs at the primary database. As such, a transaction may commit at the primary database while the redo generated by that transaction is not yet available at the standby database.
Asynchronous redo transport mode does not impact the performance of the primary database. However, it does not guarantee zero data loss in case of a disaster at the primary database.
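The tradeoff between the two transport modes can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of any product: the classes `Primary` and `Standby`, the method `ship_pending`, and the function `redo_lost_on_crash` are hypothetical, redo is modeled as plain strings, and network transfer is modeled as a direct method call.

```python
from collections import deque
from typing import Deque, List


class Standby:
    """Standby database that records the redo it has received."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def receive(self, redo: str) -> None:
        self.received.append(redo)


class Primary:
    """Primary database that ships redo synchronously or asynchronously."""

    def __init__(self, standby: Standby, synchronous: bool) -> None:
        self.standby = standby
        self.synchronous = synchronous
        self.online_redo_log: List[str] = []
        self.pending: Deque[str] = deque()  # redo not yet shipped (async only)

    def commit(self, redo: str) -> None:
        self.online_redo_log.append(redo)  # persist redo locally
        if self.synchronous:
            # The commit does not return until the standby has the redo,
            # which is the source of the added commit latency.
            self.standby.receive(redo)
        else:
            # The commit returns immediately; shipping happens later.
            self.pending.append(redo)

    def ship_pending(self, max_records: int) -> None:
        """Background shipping step used in asynchronous mode."""
        for _ in range(min(max_records, len(self.pending))):
            self.standby.receive(self.pending.popleft())


def redo_lost_on_crash(primary: Primary) -> List[str]:
    """Redo committed at the primary but absent from the standby."""
    received = set(primary.standby.received)
    return [redo for redo in primary.online_redo_log if redo not in received]


if __name__ == "__main__":
    for synchronous in (True, False):
        primary = Primary(Standby(), synchronous=synchronous)
        for redo in ("t1", "t2", "t3"):
            primary.commit(redo)
        primary.ship_pending(max_records=1)
        print(synchronous, redo_lost_on_crash(primary))
```

In the synchronous run, nothing committed is missing from the standby if the primary is lost. In the asynchronous run, any redo still waiting in the pending queue at the time of the crash is lost.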