Conventionally, in an information processing system that replicates and stores data across a plurality of nodes, such as a NoSQL system typified by a distributed key-value store (KVS), multipath replication is one method for updating replicas when a data write occurs.
In this case, a node is an information processing apparatus provided with a central processing unit (CPU), a memory, a disk device and the like. Multiple nodes are interconnected over a network. A replica is a copy of data. The information processing system functions as a distributed storage system. Each node in the information processing system functions as a storage system for distributing and storing data.
FIGS. 12A and 12B are diagrams for explaining the multipath replication. As illustrated in FIG. 12A, an information processing system 90 includes a data center X and a data center Y. The data center X includes nodes from a node 91 that stores data of a first replica to a node 94 that stores data of a fourth replica. The data center Y includes a node 95 that stores data of a fifth replica. The node that stores the first replica is simply called a “first replica” in FIGS. 12A and 12B. Nodes for storing other replicas are represented in the same way.
When a client 81 writes data to the node 91 that stores the first replica, the node 91 sends an update request to the node 92 that stores the second replica, to the node 93 that stores the third replica, and to the node 94 that stores the fourth replica. The node 92, the node 93, and the node 94 then send update requests to the node 95.
Specifically, the information processing system 90 sends the update requests in parallel along three paths: first→second→fifth, first→third→fifth, and first→fourth→fifth, that is, in a multipath manner.
When the update requests from all the paths have reached the terminal node 95, updated requests are sent back along the three paths as illustrated in FIG. 12B. That is, the updated requests are sent along three paths: fifth→second→first, fifth→third→first, and fifth→fourth→first. Upon receiving the updated request, each node updates its replica with the new data included in the update request.
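The two phases described above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the `Node` class, the `replicate` function, the `delayed` parameter, and the string node names are hypothetical and do not come from the cited system, and network transport is replaced by ordinary function calls.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    committed: str = "old"          # data currently visible to readers
    pending: Optional[str] = None   # new data received but not yet applied

# Paths of FIG. 12A: the first replica fans out to three nodes,
# all of which forward to the terminal (fifth) replica.
PATHS = [
    ["first", "second", "fifth"],
    ["first", "third", "fifth"],
    ["first", "fourth", "fifth"],
]

def replicate(nodes, paths, value, delayed=()):
    """Simulate one write. `delayed` holds indices of paths whose update
    request has not yet arrived (hypothetical knob for illustration)."""
    log = []
    arrived = 0
    # Forward phase: update requests travel along every path.
    for i, path in enumerate(paths):
        if i in delayed:
            continue  # simplification: nothing on this path has moved yet
        for name in path:
            nodes[name].pending = value
        log.append("update: " + " -> ".join(path))
        arrived += 1
    # The terminal node waits until all paths have delivered the request.
    if arrived < len(paths):
        return log
    # Backward phase: updated requests travel back; each node applies
    # the new data when the updated request reaches it.
    for path in paths:
        back = list(reversed(path))
        for name in back:
            node = nodes[name]
            if node.pending is not None:
                node.committed = node.pending
                node.pending = None
        log.append("updated: " + " -> ".join(back))
    return log
```

Calling `replicate(nodes, PATHS, "new")` on a dictionary of five `Node` objects leaves every replica holding the new data, whereas passing `delayed={2}` leaves every replica on the old data because the terminal node never starts the backward phase.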
A feature of the multipath replication is a mechanism for maintaining consistency of data. Data consistency in this case means that the same data is seen regardless of which replica is accessed. A problem in data consistency is exemplified in FIG. 13.
FIG. 13 illustrates a case in which the node 92 receives a read request for the data from a client 82 after update requests have reached the terminal node through the two paths first→second→fifth and first→third→fifth, but before the update request on the remaining path first→fourth→fifth has reached the terminal node. In this case, the node 92 could return to the client 82 either the new data included in the update request or the old data that has not yet been updated. Data consistency, however, requires that any other node accessed by the client 82 at that moment return the same data.
For example, if the node 95 that stores the fifth replica returns the new data, the node 92 that stores the second replica also returns the new data in FIG. 13. If the node 95 that stores the fifth replica returns the old data, the node 92 that stores the second replica also returns the old data. This type of data consistency is called strong consistency.
The multipath replication achieves data consistency by using a version function (see, for example, Jeff Terrace and Michael J. Freedman, “Object Storage on CRAQ: High-throughput chain replication for read-mostly workloads”, Proc. USENIX Annual Technical Conference (USENIX'09), San Diego, Calif., June 2009). FIG. 13 illustrates a case of maintaining data consistency using the version function. A node that holds more than one candidate for the data to be returned to the client 82, such as the node 92 that stores the second replica, sends a version request to the node 95 that is the terminal node, and the data to be sent to the client 82 is determined from the reply.
The terminal node 95 that receives the version request determines whether the reception of the update requests from all the paths and the updating of the replica have been completed. For example, as illustrated in FIG. 13, the terminal node 95 determines that the replica has not been updated yet, since the update request from the path first→fourth→fifth has not been received. As a result, the terminal node 95 replies to the node 92 that stores the second replica, instructing it to return the old data. Since data updating is conducted first by the terminal node 95, data consistency may be maintained by using the version of the data stored by the terminal node 95 as the base.
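The version-request exchange described above can be sketched as follows. This is a minimal model under stated assumptions, not the implementation of the cited work: the `Tail` and `Replica` classes, their method names, the integer version numbers, and the path labels are all hypothetical, and the network round trip is reduced to a direct method call.

```python
class Tail:
    """Terminal node: adopts a new version only after every path
    has delivered the update request."""

    def __init__(self, paths):
        self.expected = set(paths)   # labels of all incoming paths
        self.received = set()        # paths that have delivered the update
        self.version = 0             # version currently treated as the base

    def on_update(self, path, version):
        self.received.add(path)
        if self.received == self.expected:
            self.version = version   # the terminal node updates first
            self.received.clear()

    def on_version_request(self):
        # Reply with the version that readers must return.
        return self.version


class Replica:
    """Non-terminal node that may hold both old and new data."""

    def __init__(self, tail):
        self.tail = tail
        self.values = {0: "old"}     # version -> data

    def on_update(self, version, value):
        self.values[version] = value  # keep the old data as a candidate too

    def read(self):
        # Ask the terminal node which candidate to return.
        version = self.tail.on_version_request()
        return self.values[version]
```

With three paths registered, a read at the replica keeps returning the old data after two paths have delivered the update, and switches to the new data only once the third path has delivered it as well, which mirrors the situation of FIG. 13.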
Japanese Laid-open Patent Publication No. 2000-242620 discloses a technique in a system having a plurality of replicating nodes in which a node mediating requests from clients among the replicating nodes replies with a scale for indicating the degree of newness of the data that is received from the replicating nodes.
However, as illustrated in FIG. 13, there is a problem in that when the node 92 sends a version request to the terminal node 95 located in the different data center Y, the node 92 is unable to reply to the data read request until the reply to the version request arrives, and thus the response time is delayed.
Replicas are generally placed in different data centers located far away from each other, in consideration of data distribution and to enable disaster recovery in case of natural disasters. For example, it is common for the data center X to be a backbone data center of a storage system in Tokyo, while the data center Y is a remote data center for disaster recovery in San Francisco. In addition, increasing the number of replicas may improve reading performance, since data read requests are distributed among many nodes.
The system illustrated in FIG. 13 is configured under the above conditions, and in such a system version requests are preferably not sent to a remote data center.
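The effect on response time can be illustrated with a simple additive model. This is only a sketch: the function name and all millisecond figures are hypothetical placeholders chosen for illustration, not measurements, and the model assumes, as in CRAQ, that a version request is needed only while the node holds both old and new data.

```python
def read_response_ms(local_ms, version_rtt_ms, has_pending_update):
    """Response time of one read at a non-terminal replica.

    local_ms: time to serve the read from local storage
    version_rtt_ms: round-trip time of a version request to the terminal node
    has_pending_update: True while the node holds both old and new data
    """
    if has_pending_update:
        # The reply to the client waits for the version reply.
        return local_ms + version_rtt_ms
    return local_ms


# Hypothetical values, for illustration only.
LOCAL_MS = 1.0            # assumed local read cost
SAME_DC_RTT_MS = 0.5      # assumed round trip inside one data center
REMOTE_DC_RTT_MS = 100.0  # assumed round trip between distant data centers

near = read_response_ms(LOCAL_MS, SAME_DC_RTT_MS, True)
far = read_response_ms(LOCAL_MS, REMOTE_DC_RTT_MS, True)
```

Under these assumed figures, the round trip to the remote terminal node dominates the response time of any read that arrives while an update is in progress.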