The present invention relates generally to the field of error detection and fault recovery, and more particularly to data processing system error or fault handling.
Replication is an approach to providing high availability and scalability of data. In a replicated service, redundancy is created by copying (sometimes also called replicating) data across multiple servers. That is, each server in a plurality of servers holds a copy of the replicated data (sometimes also called a replica). This allows the replicated service to access the replicated data even if a subset of the plurality of servers experiences a failure, so the replicated service remains operational.
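By way of illustration only, the replication described above can be sketched in a few lines of Python. The names `Server`, `ReplicatedService`, `write`, and `read` are hypothetical and chosen for this sketch; they are not taken from any particular system. The sketch assumes a simple scheme in which every write is copied to every live server.

```python
class Server:
    """One server holding a full copy (replica) of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedService:
    """Hypothetical service that copies each write to every live server."""

    def __init__(self, servers):
        self.servers = servers

    def write(self, key, value):
        # Replicate the value onto every server that is currently up.
        for server in self.servers:
            if server.alive:
                server.data[key] = value

    def read(self, key):
        # Any surviving replica that holds the key can answer the read.
        for server in self.servers:
            if server.alive and key in server.data:
                return server.data[key]
        raise RuntimeError("no live replica holds the key")


servers = [Server("s1"), Server("s2"), Server("s3")]
service = ReplicatedService(servers)
service.write("x", 42)

# A subset of the servers fails; the remaining replica still serves the data.
servers[0].alive = False
servers[1].alive = False
result = service.read("x")
```

In this sketch the read still succeeds after two of the three servers fail, because the third server holds its own replica.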
However, some methods of replicating data can leave the replicas holding slightly different versions of the replicated data. This condition is sometimes called an inconsistency, a split-brain, and/or a divergence. Some instances of a split-brain are caused by timing, for example, when updates reach different replicas at different times or in different orders. Additionally, the replicas may be unaware that a split-brain has occurred. This can result in cycles in which a plurality of replicas propose competing commands (sometimes also called dueling).
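One way timing might produce such a divergence can be sketched as follows. This is an illustrative assumption, not a description of any specific replication method: two replicas receive the same two commands in different orders, and each applies the commands in the order received. The helper `apply_in_order` and the command values are hypothetical.

```python
def apply_in_order(commands):
    """Apply (key, value) commands in the order received; the last write wins."""
    state = {}
    for key, value in commands:
        state[key] = value
    return state


# Two competing commands that target the same key.
cmd_a = ("x", "from-client-A")
cmd_b = ("x", "from-client-B")

# Because of timing, each replica sees the commands in a different order.
replica_1 = apply_in_order([cmd_a, cmd_b])
replica_2 = apply_in_order([cmd_b, cmd_a])

# Neither replica can tell from its own state alone that the other differs.
diverged = replica_1 != replica_2
```

Here `replica_1` ends with the value from client B and `replica_2` with the value from client A, so the two replicas have diverged even though both applied every command.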