This invention relates to a computer system configured to store data in a plurality of servers and replicate the data to ensure availability.
In a distributed data management system, e.g., a distributed in-memory key value store (KVS), in order to prevent data from being lost when a failure occurs in a server, the data is stored in a plurality of servers in a distributed manner to ensure availability.
There is known a method of using a distributed consensus algorithm (for example, U.S. Pat. No. 5,261,085 B2) to guarantee consistency of data in replication for storing data in a plurality of servers. In a Paxos algorithm (hereinafter referred to as “PAXOS”) disclosed in U.S. Pat. No. 5,261,085 B2, original data is stored in a master computer as a master, and replicated data is handled as a slave and is stored in a plurality of slave computers.
In PAXOS, the following expression is satisfied in order to guarantee the consistency of the replicated data.

(number n of processes) = 2f + 1

where f represents the number of pieces of data to be replicated, and the number n of processes represents the number of computers storing the data. According to the above-mentioned expression, communications need to be conducted between the (master and slave) computers at least twice, and a number e of allowable failures is smaller than n/2. The number e of allowable failures represents the number of processes (computers) that can maintain a minimum number of times of communications (latency) even when a failure occurs. Further, the latency is set as a minimum number δ of times of communications exhibited after a client requests the master computer to update (or refer to) the data before the consensus is reached on the slave computer (consistency of the data is guaranteed).
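The relationship between f, n, and e stated above can be illustrated with a short arithmetic sketch. The function names below are hypothetical and are introduced only for illustration; the sketch assumes that e is an integer and simply evaluates n = 2f + 1 and the largest integer e satisfying e < n/2.

```python
def processes_required(f: int) -> int:
    """Return the number n of processes given f, using n = 2f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def max_allowable_failures(n: int) -> int:
    """Return the largest integer e that satisfies e < n/2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # For integer n, the largest integer strictly below n/2 is (n - 1) // 2.
    return (n - 1) // 2


if __name__ == "__main__":
    for f in range(1, 4):
        n = processes_required(f)
        print(f"f={f}: n={n}, e<={max_allowable_failures(n)}")
```

Under these assumptions, whenever n = 2f + 1 the largest allowable e equals f, so five computers correspond to at most two allowable failures.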
In PAXOS, a failure or a delay in some of the slave computers storing the replicated data can be concealed, but a failure or a delay in the master computer cannot be concealed. Hence, there is a problem in that, in a distributed data management system that demands a low latency at all times, the latency increases due to an increase in the number of times of communications when a failure occurs.
In view of the foregoing, there is proposed a technology that eliminates the master-and-slave relationship. In this technology, data is transmitted to the respective computers, the respective computers transmit and receive the received data to/from one another, and the degree to which the values of the data exchanged among the respective computers are identical is determined, to thereby maintain the latency while guaranteeing the consistency (for example: Francisco Brasileiro, Fabiola Greve, Achour Mostefaoui, and Michel Raynal, 2001, Consensus In One Communication Step, “Parallel Computing Technologies”, pp. 42-50, Springer Berlin Heidelberg; and Michael Ben-Or, 1983, Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols (Extended Abstract), PODC '83 Proceedings of the second annual ACM symposium on Principles of distributed computing: pp. 27-30).
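The determination of the degree of identity among exchanged values can be sketched as follows. This is a minimal illustration and is not the method of the cited documents as such: the function name classify_received, the thresholds n - f and n - 2f, the condition n > 3f, and the three outcome labels are all assumptions chosen for the example, loosely modeled on one-step consensus schemes in which f here denotes an assumed number of computers that may fail.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def classify_received(
    values: Sequence[Hashable], n: int, f: int
) -> Tuple[str, Optional[Hashable]]:
    """Classify the values one computer received from its peers.

    Assumed rule (hypothetical thresholds):
      - all of the first n - f values identical -> ("decide", value)
      - at least n - 2f of them identical       -> ("adopt", value)
      - otherwise                               -> ("fallback", None)
    """
    if n <= 3 * f:
        raise ValueError("this sketch assumes n > 3f")
    if len(values) < n - f:
        raise ValueError("need at least n - f received values")

    considered = list(values[: n - f])
    value, count = Counter(considered).most_common(1)[0]

    if count == n - f:
        # Every considered value is identical: consensus in one exchange.
        return ("decide", value)
    if count >= n - 2 * f:
        # Enough agreement to prefer this value in a later agreement step.
        return ("adopt", value)
    # Too little agreement: defer entirely to another agreement procedure.
    return ("fallback", None)


if __name__ == "__main__":
    print(classify_received(["x", "x", "x"], n=4, f=1))
    print(classify_received(["x", "x", "y"], n=4, f=1))
    print(classify_received(["x", "y", "z"], n=4, f=1))
```

In this sketch, only the "decide" outcome completes within a single exchange; the other two outcomes stand in for whatever additional agreement procedure a concrete system would run when the received values differ.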