Database systems may be built either as a stand-alone system that uses the hardware resources of only one computer system (typically consisting of a CPU, communication means and a disk system), or as a fault-tolerant system in which fault tolerance is achieved through computer system redundancy.
To achieve redundancy, a copy of the data managed by the database system needs to be maintained on at least one backup computer system.
Data redundancy in a fault-tolerant database system is typically achieved by data replication, in which all transactions that are executed in the primary database are also executed in the secondary database(s). Replication may be synchronous or asynchronous. In synchronous replication, each transaction is replicated to the secondary database as it happens in the primary database. In practice, the primary server must wait for an acknowledgement from the secondary database before it can report successful transaction execution to the client application. This replication method is also known as 2-safe replication. In asynchronous replication, the primary server may send data to the secondary server and continue its operation without waiting for an acknowledgement that the secondary server has received the transaction(s). This replication method is also known as 1-safe replication.
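The difference between the two modes lies in whether the primary waits before reporting the commit. The following Python sketch models this within a single process, assuming a hypothetical in-memory `Secondary` class that is fed through a queue. It is not a network protocol and it omits failure handling: a real 2-safe primary would need a timeout rather than an unbounded wait.

```python
import queue
import threading


class Secondary:
    """Hypothetical secondary server that applies replicated transactions."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.committed = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            txn, ack = self.inbox.get()
            self.committed.append(txn)  # apply the transaction locally
            if ack is not None:
                ack.set()               # acknowledge receipt to the primary


class Primary:
    """Hypothetical primary server using 2-safe or 1-safe replication."""

    def __init__(self, secondary, two_safe):
        self.secondary = secondary
        self.two_safe = two_safe
        self.committed = []

    def commit(self, txn):
        self.committed.append(txn)      # execute in the primary database
        if self.two_safe:
            ack = threading.Event()
            self.secondary.inbox.put((txn, ack))
            ack.wait()                  # 2-safe: block until the secondary acks
        else:
            self.secondary.inbox.put((txn, None))  # 1-safe: send and continue
        return "committed"              # only now is success reported
```

With `two_safe=True`, the transaction is guaranteed to be present on the secondary by the time `commit` returns. With `two_safe=False`, it may still be sitting in the queue.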
Another essential property of a data management system is data persistence. It is typically achieved by writing committed transactions to a transaction log that resides in non-volatile memory, such as a disk drive. Because of the performance characteristics of a disk drive, writing one byte of data to the disk costs about as much as writing a larger chunk, e.g. 8 kilobytes. In other words, the number of write operations has a greater impact on performance than the volume of data written. To maximize data durability, every transaction must be written to the disk as it occurs. However, as explained above, such disk use may be suboptimal in terms of performance. To maximize the speed of transaction log writes, the number of disk write operations should be minimized, and each write operation should write a larger chunk of transaction data to the disk. This can be achieved by buffering the write operations in volatile memory and triggering the actual disk write after enough data has accumulated or enough time has elapsed since the previous disk write. The downside of transaction log buffering is that committed transactions that have not yet been written to the disk are lost in exceptional situations, for example if the database server process fails abruptly or the server hardware loses power.
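The size-or-time flush rule can be sketched as follows. `BufferedLog`, `max_bytes` and `max_delay` are hypothetical names. The 8192-byte threshold mirrors the 8 kilobyte example above. In this sketch, the time trigger is only checked when a record is appended; a real implementation would normally also flush from a timer.

```python
import time


class BufferedLog:
    """Hypothetical log writer that batches commit records in volatile memory."""

    def __init__(self, sink, max_bytes=8192, max_delay=0.01, clock=time.monotonic):
        self.sink = sink              # callable performing one physical disk write
        self.max_bytes = max_bytes    # flush once this much data has accumulated
        self.max_delay = max_delay    # ...or once this many seconds have elapsed
        self.clock = clock
        self.buffer = bytearray()     # volatile: lost if the process dies
        self.last_flush = clock()

    def append(self, record: bytes) -> None:
        self.buffer += record
        if (len(self.buffer) >= self.max_bytes
                or self.clock() - self.last_flush >= self.max_delay):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sink(bytes(self.buffer))  # one write covers many commits
            self.buffer.clear()
        self.last_flush = self.clock()


# Usage: a frozen clock isolates the size trigger.
writes = []
log = BufferedLog(writes.append, max_bytes=8192, clock=lambda: 0.0)
for _ in range(1000):
    log.append(b"x" * 100)
print(len(writes))  # 12 physical writes instead of 1000
```

With 100-byte records, a flush happens every 82 commits, so 1000 commits cost 12 physical writes. The remaining 1600 bytes stay in volatile memory until the next flush, and that is precisely the data at risk if the server crashes.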
The database servers of a fault-tolerant database system may be in different states depending on the status (availability state) of replication in the system. In this document, the following exemplary states are used:
PRIMARY ACTIVE: the server availability state in which the primary server is connected to the secondary server and is able to send transactions to it. All transactions are committed in both the primary and the secondary server.
SECONDARY ACTIVE: the server availability state in which the secondary server is connected to the primary server and is able to receive transactions from it. All transactions are committed in both the primary and the secondary server.
PRIMARY ALONE: the server availability state in which the primary server queues transactions for later sending to the secondary server. The server is able to execute transactions, but it constitutes a single point of failure.
STANDALONE: the server availability state in which the server is currently not part of a fault-tolerant database system. The server is able to execute transactions, but it constitutes a single point of failure.
Of these states, PRIMARY ACTIVE, PRIMARY ALONE and STANDALONE are the states in which a DBMS may accept write transactions, so they are the states relevant in this context. In the PRIMARY ACTIVE state, the server is not a single point of failure, i.e. there is at least one other, secondary server (in the SECONDARY ACTIVE state) available to take over immediately if the primary fails. If the state is PRIMARY ALONE or STANDALONE, the server is a single point of failure. This means that if the server becomes unavailable, there is no other server that can immediately continue the service from the point where the failed server stopped.
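The two properties discussed above can be encoded per state. This is a hypothetical Python model in which the property names `accepts_writes` and `single_point_of_failure` are illustrative. SECONDARY ACTIVE is only classified as not accepting writes, because the text does not discuss it as a single point of failure.

```python
from enum import Enum


class AvailabilityState(Enum):
    PRIMARY_ACTIVE = "PRIMARY ACTIVE"
    SECONDARY_ACTIVE = "SECONDARY ACTIVE"
    PRIMARY_ALONE = "PRIMARY ALONE"
    STANDALONE = "STANDALONE"

    @property
    def accepts_writes(self) -> bool:
        # Only the secondary cannot accept write transactions.
        return self is not AvailabilityState.SECONDARY_ACTIVE

    @property
    def single_point_of_failure(self) -> bool:
        # Meaningful for write-accepting states: no standby can take over.
        return self in (AvailabilityState.PRIMARY_ALONE,
                        AvailabilityState.STANDALONE)


for state in AvailabilityState:
    print(state.value, state.accepts_writes, state.single_point_of_failure)
```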
A typical prior-art fault-tolerant data management system consists of a primary database and at least one secondary database to which transactions are replicated using 1-safe or 2-safe replication. If data loss in server failure situations is not acceptable, 2-safe replication must be used. To ensure persistence, the servers may use either the unbuffered or the buffered transaction log write mode. However, the buffered transaction logging mode does not guarantee data persistence if the other database server of the system fails for any reason, because committed transactions may then exist only in the volatile log buffer of the remaining server. To guarantee data persistence in all possible situations, unbuffered logging must be used. This, however, has an adverse effect on performance in “normal use”, where both servers are functioning properly. The performance difference between buffered and unbuffered disk writes may be several hundred per cent.
FIG. 1 depicts an exemplary fault-tolerant database system known from the prior art. A client application 100 is connected to a primary DBMS 102 via communication means 101, such as a network connection. The primary DBMS 102 has access to a persistent storage 103 and is connected to a secondary DBMS 105 via communication means 104, most typically a network connection. The primary DBMS 102 is arranged to replicate transactions to the secondary DBMS 105 synchronously and/or asynchronously. The secondary DBMS 105 has access to its own persistent storage 106.
The flow diagram of FIG. 2 illustrates the basic operations involved in executing a transaction in a fault-tolerant manner. In step 201, the primary DBMS 102 accepts a transaction from the client application 100. It then executes 202 the transaction in the primary DBMS, replicates 203 the transaction to the secondary DBMS and writes 204 the transaction data to its persistent storage 103. It should be noted that steps 201, 202, 203 and 204 may also occur in a different order than shown in the drawing. A transaction may also be accepted from the client application in multiple steps 201, and each execution step 202 may involve data replication 203 and data write operations 204 to the storage device.
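Steps 201 to 204 can be sketched as a single method. The sketch assumes a hypothetical `PrimaryDBMS` whose `replicate` and `persist` callbacks stand in for the communication means 104 and the persistent storage 103, and it models a transaction as a list of key/value updates. As noted above, a real system may perform these steps in a different order or interleave them.

```python
class PrimaryDBMS:
    """Hypothetical primary DBMS illustrating steps 201-204 of FIG. 2."""

    def __init__(self, replicate, persist):
        self.replicate = replicate  # stands in for sending to secondary DBMS 105
        self.persist = persist      # stands in for writing to persistent storage 103
        self.state = {}

    def execute(self, txn):
        # Step 201: accept the transaction from the client application.
        ops = list(txn)
        # Step 202: execute it in the primary DBMS (here: key/value updates).
        for key, value in ops:
            self.state[key] = value
        # Step 203: replicate the transaction to the secondary DBMS.
        self.replicate(ops)
        # Step 204: write the transaction data to persistent storage.
        self.persist(ops)


sent, stored = [], []
db = PrimaryDBMS(sent.append, stored.append)
db.execute([("a", 1), ("b", 2)])
```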
FIG. 3 is a flow diagram depicting the prior-art method of writing transaction data to a storage device without buffering. The client application 100 requests 301 that a transaction be committed by the DBMS 102 or 105. To commit the transaction in the local database, the DBMS requests 302 the operating system to write the transaction data to the storage device 103 or 106. The DBMS waits 303 until the operating system has successfully written the data to the storage device. Upon receiving 304 the success indication from the operating system, the DBMS reports to the client application that the transaction was successfully committed.
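On a system with POSIX-style file APIs, the unbuffered method maps onto `os.write` followed by `os.fsync`, which returns only after the operating system reports that the data has been flushed to the storage device. The function name `commit_unbuffered` and the one-record-per-commit layout are assumptions made for illustration.

```python
import os


def commit_unbuffered(fd: int, record: bytes) -> str:
    """Write one transaction's log record and return only once it is durable."""
    # Step 302: ask the operating system to write the transaction data.
    written = 0
    while written < len(record):  # os.write may perform a partial write
        written += os.write(fd, record[written:])
    # Step 303: wait until the data has actually reached the storage device.
    os.fsync(fd)
    # Step 304: only now report success to the client application.
    return "committed"
```

Every commit pays for one synchronous flush, which is the per-operation cost that the buffered method tries to amortize.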
FIG. 4 illustrates a buffered transaction log write method known from the prior art. In step 401, the client application 100 requests a transaction commit. Upon this request, the DBMS 102 or 105 writes 402 the transaction data to a volatile memory buffer and immediately reports 403 success back to the client application.
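For contrast, here is a sketch of the buffered method, again using hypothetical names. `commit` returns before anything reaches the file, so the committed records exist only in process memory until `flush` is called, for example by a timer or a size threshold that this sketch does not include.

```python
import os


class BufferedCommitLog:
    """Hypothetical log writer following the buffered method of FIG. 4."""

    def __init__(self, fd: int):
        self.fd = fd
        self.buffer = []  # volatile memory: lost if the process dies

    def commit(self, record: bytes) -> str:
        # Step 402: place the transaction data in the volatile memory buffer.
        self.buffer.append(record)
        # Step 403: report success immediately, before any disk write.
        return "committed"

    def flush(self) -> None:
        # Performed later as a single write covering all buffered commits.
        data = b"".join(self.buffer)
        written = 0
        while written < len(data):
            written += os.write(self.fd, data[written:])
        os.fsync(self.fd)
        self.buffer.clear()
```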
The prior-art methods described above provide either guaranteed persistence of data in all situations, through unbuffered write operations, or high performance, through buffered write operations. However, they do not allow the optimal method to be selected based on the current availability state of the servers.