1. Technical Field
The present teaching relates to methods, systems, and programming for resolving data inconsistency. Particularly, the present teaching is directed to methods, systems, and programming for resolving data inconsistency in a distributed system having a plurality of replica instances.
2. Discussion of Technical Background
Distributed computing/storage is a field of computer science that studies distributed systems, which include multiple autonomous computers or parallel virtual machines that communicate through a computer network, such as one or more computer clusters each having multiple nodes. Distributed systems and applications may be deployed under various paradigms, including grid computing, utility computing, edge computing, and cloud computing, by which users may access server resources through the Internet using a computer, netbook, tablet, smart phone, or other device.
Replication is one of the oldest and most important topics in the overall area of distributed systems. Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault tolerance, or accessibility. A distributed system in which data is replicated can provide better fault tolerance as well as improved response time. One of the major concerns of replication in distributed systems is data inconsistency. For example, in massively replicated distributed systems with billions of records, each with many entries, and hundreds of geographically dispersed replica instances (each within its own administrative zone), various systemic causes can lead to data divergence, where replicas of some of the records contain entries that have divergent values. Processes that lead to divergence may not always be avoidable or predictable. The inconsistency issues include, for example, records that are missing from some replicas where they should be represented, keys that are missing from some records but present in replicas of those records, and keys whose values differ between a record and some of its replicas.
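The three kinds of inconsistency listed above can be illustrated with a minimal sketch that compares two replica instances. It is an illustration only, not the method of the present teaching: the replica layout (a mapping from record identifier to a record of key/value entries) and the names `find_divergence`, `replica_a`, and `replica_b` are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

# Assumed layout (hypothetical): a replica maps record ids to records,
# and each record maps keys to values.
Record = Dict[str, Any]
Replica = Dict[str, Record]


def find_divergence(a: Replica, b: Replica) -> List[Tuple[str, str, str]]:
    """Return (kind, record_id, key) tuples describing how a and b differ."""
    issues: List[Tuple[str, str, str]] = []
    for record_id in sorted(a.keys() | b.keys()):
        # Case 1: the record is absent from one of the replicas.
        if record_id not in a or record_id not in b:
            issues.append(("missing_record", record_id, ""))
            continue
        rec_a, rec_b = a[record_id], b[record_id]
        for key in sorted(rec_a.keys() | rec_b.keys()):
            # Case 2: the key is absent from one copy of the record.
            if key not in rec_a or key not in rec_b:
                issues.append(("missing_key", record_id, key))
            # Case 3: both copies hold the key, but with different values.
            elif rec_a[key] != rec_b[key]:
                issues.append(("different_value", record_id, key))
    return issues


# Hypothetical sample data covering all three cases.
replica_a: Replica = {
    "r1": {"name": "x", "age": 1},
    "r2": {"k": 1},
    "r3": {"k": 1},
}
replica_b: Replica = {
    "r1": {"name": "x", "age": 2},
    "r2": {"k": 1, "j": 2},
}

divergence = find_divergence(replica_a, replica_b)
```

With the sample data, `divergence` reports a differing value for key `age` in record `r1`, a missing key `j` in record `r2`, and a missing record `r3`. A pairwise comparison of this kind only detects divergence; deciding which value should prevail is a separate question that the sketch does not address.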
Some known solutions for resolving data inconsistency are based on an anti-entropy model that delivers inconsistencies to users, who must then resolve the conflicts. The anti-entropy model, however, is not suitable for massively replicated distributed systems because of its inefficiency. Therefore, there is a need for a solution that ensures that a distributed data replication system can itself heal data inconsistency and can decrease the occurrence of divergent values within an acceptable half-life.