A typical database cluster includes a plurality of database servers, which are often distributed geographically. Within the cluster, the database servers communicate with each other for data replication and data synchronization purposes. The term “data replication” typically refers to electronic copying of data from one computer or server to other computers or servers. Data replication and data synchronization enable users to access the same level of information and to access data relevant to their tasks without interfering with the tasks of other users.
A data replication process can be complex, for example, depending on the size and number of the distributed database servers. The process can also be demanding in terms of both time and computing resources.
In particular, several problems may be encountered when the size of a transaction grows beyond a certain limit. These problems differ in part depending on the type of database cluster, for example, a master-slave database cluster versus a multi-master database cluster, or an asynchronous database cluster versus a synchronous database cluster. The problems are most pronounced in a synchronous multi-master database cluster that employs optimistic concurrency control, namely a certification-based replication system.
Some example problems that may be faced by a typical certification-based replication system are provided below. Firstly, the sheer size of a transaction means that the replication system has to transfer more data. In this regard, communicating a large transaction to slave nodes over transport media requires a larger memory allocation and takes a longer time. Secondly, the metadata for the transaction, such as row identifiers, is also large. During certification, the replication system has to manage a larger certification index, which, in turn, translates into a larger memory allocation for the certification and a longer certification time. Thirdly, the transaction-processing time in a master node is longer. As a result, the transaction locks a large number of rows for a longer time. This increases exposure to multi-master conflicts, which arise when another transaction being processed in another node writes to one or more of the same rows at the same time. Fourthly, applying the transaction in the slave nodes takes a longer time. This can cause a bottleneck in the slave nodes. Moreover, the replication system may have to wait until the large transaction has been applied completely in the slave nodes.
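The second and third problems can be illustrated with a minimal sketch of a certification test. The sketch is an assumption made for illustration only and does not reproduce the implementation of any particular replication system: the names WriteSet, Certifier, snapshot_seqno, and row_keys are hypothetical, and the certification index is modeled as a simple map from a row identifier to the sequence number of the last certified transaction that wrote that row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    """Hypothetical write set: the rows modified by one transaction."""
    origin: str            # node that executed the transaction
    snapshot_seqno: int    # last global sequence number the transaction had seen
    row_keys: frozenset    # identifiers of the rows written


class Certifier:
    """Illustrative certification index (assumed structure, not a real API)."""

    def __init__(self):
        self.index = {}    # row identifier -> seqno of last certified writer
        self.seqno = 0     # global sequence number of the last certified transaction

    def certify(self, ws):
        # The write set fails if any of its rows was written by a
        # transaction that it had not yet seen when it executed.
        for key in ws.row_keys:
            if self.index.get(key, 0) > ws.snapshot_seqno:
                return False
        # Otherwise it is ordered next, and every row is recorded in the index.
        self.seqno += 1
        for key in ws.row_keys:
            self.index[key] = self.seqno
        return True


certifier = Certifier()

# A large transaction from one node touches 10,000 rows.
large = WriteSet("node-a", 0, frozenset(range(10_000)))
# A small concurrent transaction from another node touches one of those rows.
small = WriteSet("node-b", 0, frozenset({42}))

large_ok = certifier.certify(large)    # certified; the index grows by 10,000 entries
small_ok = certifier.certify(small)    # fails; row 42 was written concurrently
index_size = len(certifier.index)
```

In this sketch, both the certification time and the size of the index grow with the number of rows in the write set, and every additional row written by the large transaction is one more row on which a concurrent transaction from another node can fail certification.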