Generally, a database is an organized collection of data objects (e.g., data tables, data records, files, etc.) recorded in a storage medium in a systematic way for access by a computing system. Each data object can include one or more data records typically organized as a set of keyed data elements or values to facilitate retrieval and sorting. A database can generally be described structurally by a schema, which specifies the types and relationships of data objects in the database. A program, such as a database management system (DBMS), can query the database to access specific data objects.
Replication allows a database on one storage medium to be copied to a second storage medium, effectively creating a separate instance of the original database. The copy of the database on the second storage medium is termed a “replicated database”. The original database and its copy (collectively termed “replicas”) share all or part of a common schema, including common data objects, key-value pairs, and other identifiers and structures, which couples the related replicas together. The shared schema allows identical queries to be used with both replicas. Furthermore, each replica can be modified locally and then reconciled with other related replicas through a process termed “synchronization”. With synchronization, changes to data objects in one replica are recorded and propagated as a synchronization update to other related replicas, where the same changes are executed on the corresponding data objects.
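By way of illustration only, the record-and-propagate behavior described above can be sketched as follows. The class `Replica`, its methods `modify`, `make_update`, and `apply_update`, and the key `price:widget` are hypothetical names chosen for this sketch; a real replicated database would persist its change log and transmit updates over a communications link rather than hold them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica: a keyed store plus a log of local changes."""
    name: str
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # changes not yet propagated

    def modify(self, key, value):
        # Apply the change locally and record it for later synchronization.
        self.data[key] = value
        self.pending.append((key, value))

    def make_update(self):
        # Package the recorded changes as a synchronization update and clear the log.
        update, self.pending = self.pending, []
        return update

    def apply_update(self, update):
        # Execute the same changes on the corresponding local data objects.
        for key, value in update:
            self.data[key] = value


# The replica starts as a copy of the original database.
original = Replica("original", {"price:widget": 10})
replica = Replica("replica", dict(original.data))

# A local change at the original is propagated to the replica.
original.modify("price:widget", 12)
replica.apply_update(original.make_update())
```

This sketch deliberately applies incoming changes unconditionally, so it only behaves correctly when the receiving replica has no unpropagated change to the same data object.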
However, a conflict can occur when corresponding data objects in separate replicas are modified concurrently. Generally, a conflict represents a modification of a data object at one replica A, followed by a modification of the corresponding data object at another replica B, before information about the change at A has been propagated through a communications link to B. For example, the price of a product might be changed to $X in one replica and the price of the same product might be changed to $Y in another replica at about the same time (e.g., before either change has been synchronized with the other replica), thereby presenting a conflict between the two replicas as to the price of the product. To bring the replicas back into a consistent state, the conflict can be detected and then resolved.
Different strategies exist to detect conflicts. For example, the time at which a data object in a replica was modified can be tracked and propagated to other replicas during synchronization. Upon receipt of a synchronization update, each receiving replica can detect a conflict by comparing the change time reported for the data object at the remote replica with the change time of the corresponding data object (whether local or at another remote replica).
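The change-time comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive detection scheme: the names `Versioned`, `is_conflict`, and `last_sync` are hypothetical, and the sketch assumes that each replica also knows the time of its last completed synchronization and that change times from different replicas are comparable.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    """Hypothetical data object value tagged with its last change time."""
    value: object
    changed_at: float  # assumed to be comparable across replicas


def is_conflict(local: Versioned, remote: Versioned, last_sync: float) -> bool:
    # Assumed rule: a conflict exists when both copies were changed after the
    # last synchronization and the resulting values differ.
    return (
        local.changed_at > last_sync
        and remote.changed_at > last_sync
        and local.value != remote.value
    )


last_sync = 100.0
price_at_a = Versioned(value="X", changed_at=105.0)  # changed at replica A
price_at_b = Versioned(value="Y", changed_at=107.0)  # changed at replica B

# Replica B receives A's update and compares change times.
conflict = is_conflict(price_at_b, price_at_a, last_sync)

# A copy that has not changed since the last synchronization does not conflict.
no_conflict = is_conflict(Versioned("old", 90.0), price_at_a, last_sync)
```

Under these assumptions, `conflict` is `True` for the price example and `no_conflict` is `False`, since the unchanged copy can simply accept the incoming value.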
After a conflict is detected, the distributed database system can work to resolve the conflict. Different strategies also exist to resolve such conflicts. For example, conflicts can be resolved algorithmically, through user interaction, etc. However, due to the distributed nature of a replicated database system and the asynchronous nature of synchronization, competing conflicts in corresponding data objects can be detected concurrently at different replicas. Detection of competing conflicts can therefore trigger competing conflict resolutions that propagate throughout the distributed database, thereby introducing new conflicts. As such, existing database servers tend to stop database processing during conflict resolution, so as to avoid such competing conflict resolutions. Moreover, competing conflict resolutions are particularly challenging for “non-idempotent” conflicts (i.e., where different replicas would resolve the conflict differently or where a resolution would ripple significant changes throughout the distributed database). Accordingly, a problem arises of how to handle conflict resolutions at different replicas in a distributed database without stopping other database processing.