A typical database cluster includes a plurality of database servers that communicate with each other for data replication and data synchronization purposes. A conflict of synchronization may arise when two or more database servers execute different transactions at a resource identified by the same key substantially concurrently. The phrase “substantially concurrently” may be defined as a situation where a first transaction is executed at a first database server and a second transaction is executed at a second database server, and the timing of the two executions is such that the first database server does not have knowledge of the second transaction and/or the second database server does not have knowledge of the first transaction.
A conventional technique of detecting conflicts of synchronization is provided in “The Database State Machine and Group Communication Issues”, a thesis by Fernando Pedone (1999) at Ecole Polytechnique Federale de Lausanne. The thesis is incorporated herein by reference in its entirety.
The conventional technique involves checking whether one or more keys associated with the first transaction match one or more keys associated with the second transaction. If one or more matching keys are found to be associated with both the first transaction and the second transaction, the first transaction and the second transaction are said to have a conflict of synchronization. In such a situation, at least one of the first and second transactions is rolled back.
In order to illustrate the conventional technique, let us consider a few examples. In a first example, let us consider that the first transaction modifies resources identified by keys ‘a’, ‘b’ and ‘c’, and is represented as ‘T1(a, b, c)’. Let us also consider that the second transaction modifies resources identified by keys ‘d’ and ‘e’, and is represented as ‘T2(d, e)’. In the first example, there are no matching keys between the first transaction and the second transaction. Therefore, the first transaction and the second transaction do not conflict with each other.
In a second example, let us consider that the first transaction modifies the resources identified by the keys ‘b’ and ‘c’, and is represented as ‘T1(b, c)’. Let us also consider that the second transaction modifies the resource identified by the key ‘b’, and is represented as ‘T2(b)’. In the second example, the key ‘b’ is associated with both the first transaction and the second transaction. Therefore, the first transaction conflicts with the second transaction.
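The key-matching check in the two examples above may be sketched as follows. This is a minimal illustration only, assuming that each transaction is reduced to the set of keys it references; the function name `conflicts` and the variable names are hypothetical and do not come from the cited thesis.

```python
def conflicts(keys_t1, keys_t2):
    """Return True if the two transactions share at least one key."""
    # Assumption: a transaction is modelled only by the set of keys it references.
    return not set(keys_t1).isdisjoint(keys_t2)


# First example: T1(a, b, c) and T2(d, e) have no key in common.
first_example = conflicts({"a", "b", "c"}, {"d", "e"})

# Second example: T1(b, c) and T2(b) both reference the key 'b'.
second_example = conflicts({"b", "c"}, {"b"})

print(first_example, second_example)
```

Under this sketch, the first pair of transactions is reported as non-conflicting and the second pair as conflicting, in line with the two examples.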
The conventional technique works well as long as these keys identify resources that are actually being modified by the first and second transactions. However, relational databases introduce “relations” between resources. A resource modified by a transaction may depend on or be a dependency of another resource, which might be modified by another transaction.
In order to detect conflicts arising from concurrent modification of related resources by unrelated transactions, each transaction must reference keys identifying a modified resource and other resources on which the modified resource depends. For example, if a resource ‘R1’ depends on a resource ‘Rp’, then a transaction that modifies the resource ‘R1’ should reference keys associated with both the resources ‘R1’ and ‘Rp’. Accordingly, the transaction may be represented as ‘T(keys(R1), keys(Rp))’. This may enable detection of a conflict with another transaction that modifies the resource ‘Rp’ substantially concurrently.
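The extended key set described above may be sketched as follows. This is an illustration under stated assumptions: the mappings `keys_of` and `depends_on`, the key names ‘k1’ and ‘kp’, and the function `referenced_keys` are all hypothetical, and only direct dependencies are followed.

```python
def referenced_keys(resource, keys_of, depends_on):
    """Keys of the modified resource plus keys of the resources it directly depends on."""
    keys = set(keys_of[resource])
    for parent in depends_on.get(resource, ()):
        keys |= set(keys_of[parent])
    return keys


# Hypothetical data: resource R1 depends on resource Rp.
keys_of = {"R1": {"k1"}, "Rp": {"kp"}}
depends_on = {"R1": ["Rp"]}

# A transaction modifying R1 is represented as T(keys(R1), keys(Rp)).
t_child = referenced_keys("R1", keys_of, depends_on)

# A transaction modifying Rp references only keys(Rp).
t_parent = referenced_keys("Rp", keys_of, depends_on)

# The shared keys of Rp make the two transactions conflict.
conflict = not t_child.isdisjoint(t_parent)
print(sorted(t_child), sorted(t_parent), conflict)
```

Because the transaction that modifies ‘R1’ also references the keys of ‘Rp’, the key-matching check in this sketch reports a conflict with a transaction that modifies ‘Rp’ substantially concurrently.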
However, this approach often leads to excessive false positives. For example, consider two transactions that modify two unrelated resources ‘R1’ and ‘R2’, both of which depend on a common unmodified resource ‘Rp’. The two transactions may be represented as ‘T(keys(R1), keys(Rp))’ and ‘T(keys(R2), keys(Rp))’. As the keys ‘keys(Rp)’ are associated with both the transactions, a “false positive” conflict of synchronization is detected, even though neither transaction modifies the resource ‘Rp’.
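The false positive may be reproduced with a short sketch. The key names ‘k1’, ‘k2’ and ‘kp’ and all variable names below are hypothetical, chosen only to stand in for ‘keys(R1)’, ‘keys(R2)’ and ‘keys(Rp)’.

```python
# Keys referenced by each transaction: the modified resource plus its dependency Rp.
t_first = {"k1", "kp"}   # T(keys(R1), keys(Rp))
t_second = {"k2", "kp"}  # T(keys(R2), keys(Rp))

# Keys of the resources each transaction actually modifies.
modified_first = {"k1"}
modified_second = {"k2"}

# The key-matching check reports a conflict because 'kp' is shared.
shared = t_first & t_second
reported_conflict = bool(shared)

# None of the shared keys belongs to a resource modified by either transaction.
shared_and_modified = shared & (modified_first | modified_second)

print(reported_conflict, sorted(shared), sorted(shared_and_modified))
```

In this sketch the only shared key belongs to the unmodified resource ‘Rp’, so the reported conflict does not correspond to any concurrent modification of the same resource.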
Therefore, there exists a need for a method for use in a database cluster that is capable of significantly reducing occurrences of false positives during synchronization of data within the database cluster.