Modern computer systems typically consist of a CPU to process data, a networking interface to communicate with other computer systems, and one or more durable storage units. Such a system may stop processing due to a power failure, a program error, or a hardware fault. Such failures are often called process failures. The durable storage units keep the data intact while the fault is repaired.
A set of these computer systems can be networked to form a cluster, in which each computer system is a node. Although the network is generally reliable, occasional faults may disrupt communication between certain nodes or sets of nodes. Such a disruption in communication is often called a network partition.
Each of these nodes runs a transactional storage system that both reads and writes data (a database management system). Some of this data is concurrently accessed by applications operating on different nodes. To guarantee data consistency, transactional database replication techniques are used to manage and regulate access to that data. However, such conventional replication techniques are associated with a number of problems.
For instance, when the membership of a replicating group changes, conventional replication systems typically require an administrator to stop all applications, stop the databases running at each of the nodes in the replicating group, go to each node and notify it of the change in membership (usually by editing a table), restart the databases on each of these machines, and then restart the applications. Having to stop the databases running in the system can be undesirable.
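The conventional procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not any particular product's administration interface: the `Node` and `Cluster` classes and the `change_membership` function are hypothetical names introduced here, and the "databases" are simple in-memory flags.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one machine in the replicating group."""
    name: str
    db_running: bool = True
    membership: set = field(default_factory=set)  # per-node membership table


@dataclass
class Cluster:
    """Hypothetical stand-in for the replicating group and its applications."""
    nodes: list
    apps_running: bool = True


def change_membership(cluster, new_members):
    """Apply a membership change the conventional, stop-everything way.

    Returns the ordered list of administrative steps that were performed.
    """
    steps = []

    # 1. Stop all applications.
    cluster.apps_running = False
    steps.append("stop applications")

    # 2. Stop the database on every node in the group.
    for node in cluster.nodes:
        node.db_running = False
        steps.append(f"stop database on {node.name}")

    # 3. Visit each node and edit its membership table while it is down.
    for node in cluster.nodes:
        node.membership = set(new_members)
        steps.append(f"edit membership on {node.name}")

    # 4. Restart the database on every node.
    for node in cluster.nodes:
        node.db_running = True
        steps.append(f"restart database on {node.name}")

    # 5. Restart the applications.
    cluster.apps_running = True
    steps.append("restart applications")
    return steps


if __name__ == "__main__":
    group = Cluster(nodes=[Node("a", membership={"a", "b"}),
                           Node("b", membership={"a", "b"})])
    for step in change_membership(group, {"a", "b", "c"}):
        print(step)
```

In this sketch, no database in the group is available between step 2 and step 4, and the number of manual steps grows with the number of nodes, which is the cost the paragraph above describes.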
What is needed, therefore, are transactional database replication techniques that do not require the databases to be stopped during replication.