The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Data service providers often utilize distributed systems to provide data services, such as online shopping websites, social networking platforms, information services, and so forth. Commonly, these distributed systems comprise two or more geographically-separated data centers, each providing the same set of services through common application(s) and data. Distribution of the provision of data services amongst two or more geographically-separated data centers offers many advantages, including without limitation increased capacity, faster response times, and redundancy.
Each data center in such a distributed system is communicatively coupled to a common wide area network, such as the Internet. Different clients, such as personal and mobile computing devices operated by end users, are connected to different data centers. Clients are routed to different data centers as a result of any of a variety of addressing and load-balancing mechanisms. The data center to which a client connects may depend on factors such as geography, network topology, time of day, and/or server capacity. As a result of common access mechanisms for the services provided by each of the data centers, such as the use of common uniform resource locators to reference common applications at each of the data centers, the existence of multiple data centers is typically transparent to most end users. Since each data center offers the same set of services, end users are generally unaware of which data center they are connected to, or even of the fact that multiple data centers provide the services.
There is typically a high latency between some or all of the data centers operated by a data service provider. On account of the high latency and/or other factors, it is often not practical to keep the data at each data center perfectly synchronized. For example, conventional locking mechanisms would require the exchange of several messages between some or all of the data centers before any database operation could be performed. Each message could spend on the order of hundreds or thousands of milliseconds in transit between data centers. Meanwhile, in some embodiments, a single web page generated by a data center could require hundreds of database operations. Thus, keeping the data at each of the data centers synchronized through conventional locking mechanisms can greatly slow the provision of the web page.
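The cost described above can be made concrete with a rough calculation. The following is a minimal sketch only: the three constants are hypothetical values chosen for illustration, not measurements, and the function name page_lock_overhead_ms is likewise an assumption. It assumes that the database operations for a page run one after another and that every lock message crosses between data centers.

```python
# Hypothetical figures chosen only for illustration; none come from a measured system.
MESSAGES_PER_LOCKED_OPERATION = 3  # assumed: lock request, lock grant, lock release
MESSAGE_LATENCY_MS = 100           # assumed: low end of "hundreds of milliseconds"
OPERATIONS_PER_PAGE = 200          # assumed: "hundreds of database operations"


def page_lock_overhead_ms(messages_per_op: int, latency_ms: int, ops: int) -> int:
    """Return the total time spent on lock messages for one page, assuming the
    operations run sequentially and each message travels between data centers."""
    return messages_per_op * latency_ms * ops


overhead_ms = page_lock_overhead_ms(
    MESSAGES_PER_LOCKED_OPERATION, MESSAGE_LATENCY_MS, OPERATIONS_PER_PAGE
)
print(f"Lock messaging overhead per page: {overhead_ms / 1000:.0f} seconds")
```

Under these assumed figures, lock messaging alone would add about a minute to the generation of a single web page, which illustrates why conventional locking across data centers is often impractical.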
Instead, data at data centers is often kept loosely synchronized through the use of bi-directional replication tools. Each data center performs operations on its copy of the data without regard to the status of the data at other data centers. At some time subsequent to a particular data center performing an operation on its own copy of the data, the operation is “replicated” at each of the other data centers, so that the data at each of the other data centers becomes consistent with that of the particular data center.
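Loose synchronization of this kind can be sketched as follows. This is an illustrative model under stated assumptions, not the behavior of any particular replication tool: the DataCenter class, its outbox list, and the replicate_to method are hypothetical names, and the sketch covers exactly two data centers.

```python
class DataCenter:
    """Hypothetical stand-in for one data center's copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = []  # local operations not yet replicated to the peer

    def write(self, key, value):
        # Apply the operation locally at once, without consulting the peer.
        self.data[key] = value
        self.outbox.append((key, value))

    def replicate_to(self, peer):
        # At some later time, replay the pending local operations at the peer.
        # The replayed operations are not queued again at the peer, so they
        # are not echoed back. Clearing the outbox here assumes a single peer.
        for key, value in self.outbox:
            peer.data[key] = value
        self.outbox.clear()


first = DataCenter("first")
second = DataCenter("second")

# Each data center operates on its own copy independently.
first.write("profile:alice", "email=alice@example.com")
second.write("cart:bob", "book")

# Replication runs in both directions some time afterwards.
first.replicate_to(second)
second.replicate_to(first)
print(first.data == second.data)
```

Because the two writes in this example touch different keys, the copies converge after replication in both directions.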
It is of course possible that, between the time that a first data center performs a first operation and the time that the first operation is replicated at a second data center, the second data center will have performed a second operation on the data that conflicts with the first operation. While such conflicts can be “resolved” through the use of conflict detection and resolution mechanisms, the use of conflict resolution mechanisms can result in issues such as corrupted orders or lost data, and is therefore undesirable. However, until recently, in most embodiments, either the occurrences of such conflicts were rare enough that the use of conflict resolution mechanisms presented a “good enough” solution, or an active/passive system was implemented in which replication is performed unidirectionally from a single active data center to one or more passive data centers.
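One way such a conflict might be detected is sketched below. This is a hypothetical illustration, not a description of any specific conflict detection mechanism: it assumes each record carries a version counter, and the names Record, local_write, and apply_replicated are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    version: int


def local_write(store, key, value):
    """Apply a write locally and return the version the write was based on."""
    base_version = store[key].version
    store[key] = Record(value, base_version + 1)
    return base_version


def apply_replicated(store, key, value, base_version):
    """Apply a replicated write; return False when a conflict is detected."""
    if store[key].version != base_version:
        # The local copy changed after the version the remote write was based on.
        return False
    store[key] = Record(value, base_version + 1)
    return True


# Both data centers start with the same copy of the record.
first = {"order:42": Record("quantity=1", 1)}
second = {"order:42": Record("quantity=1", 1)}

# Each data center changes the record before the other's change arrives.
base_first = local_write(first, "order:42", "quantity=2")
base_second = local_write(second, "order:42", "quantity=5")

# Replication in both directions now detects a conflict on each side.
applied_at_second = apply_replicated(second, "order:42", "quantity=2", base_first)
applied_at_first = apply_replicated(first, "order:42", "quantity=5", base_second)
print(applied_at_second, applied_at_first)  # prints "False False"
```

Detection alone does not settle which value should survive. Any rule that keeps one of the two quantities discards the other, which is the kind of lost data noted above.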
Recently, occurrences of conflicts in replicated data have been increasing, particularly as a result of an increasing amount of shared data and collaboration by users or entities operating at geographically diverse locations or at the same geographic location, but on different networks. An example of this problem involves multiple users of an online shopping site presenting the same account credentials within the same time frame but at different locations. For example, family members may share an account for a variety of purposes. Because the family members are at different locations, the family members may be directed to different data centers. If one family member changes a shopping cart or profile data while interacting with a first data center, and another family member makes changes to the same shopping cart or profile data while interacting with a second data center, the account data may become corrupted. This and similar problems can occur even if the same user accesses the account data with two different devices that are in close proximity to each other, but connected to different networks. For example, a user's smart phone may be directed to a first data center because it is connected to a cell phone network, while the user's tablet is directed to a second data center because it is connected to a WiFi network. These and other problems complicate the operation of distributed systems of data centers.