A distributed storage system (a storage system, an information processing system) is known which includes a plurality of nodes (storage devices, information processing devices) and stores data so that the data is distributed across the plurality of nodes.
In the distributed storage system, for example, when a failure occurs in one of the plurality of nodes, it becomes difficult for a client that uses the distributed storage system to access the failed node.
In addition, when the failed node is made redundant with other nodes, the client can access a redundant node instead of the failed node. However, the distributed storage system including the redundant nodes remains in an unreliable state with reduced redundancy until the failed node is replaced and recovery processing, which restores the multiplexed state of the data that existed before the failure occurred in the node, is performed.
Therefore, in the distributed storage system, it is preferable to detect a node failure quickly by monitoring the states of a plurality of nodes.
In the distributed storage system, however, there is a case where a plurality of nodes are split from each other due to a node failure or a failure of the link between nodes, and the nodes that are split from each other may make different determinations regarding the node failure. This state is called a split brain state. As an example of the split brain state, a case can be mentioned in which one node and another node have difficulty accessing each other due to a failure of the link therebetween, and each of the two nodes determines that its partner node has failed.
For example, when one node and another node store redundant copies of the same data, if the nodes fall into the split brain state, both nodes may update the stored redundant data separately or may perform recovery processing on each partner node. This may destroy the consistency of the redundant data.
The following methods are known as examples of methods for preventing the distributed storage system from falling into the split brain state.
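The loss of consistency described above can be illustrated with a minimal sketch. The `Node` class, its `probe` and `write` methods, and the `simulate_split_brain` function are hypothetical names introduced only for this illustration and do not correspond to any particular system. The sketch assumes two nodes that hold redundant copies of the same data and that can judge the partner only by whether the link responds.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical node holding one redundant copy of the data."""
    name: str
    replica: Dict[str, int] = field(default_factory=dict)
    believes_partner_failed: bool = False

    def probe(self, link_up: bool) -> None:
        # Assumption: a node sees only whether the partner answers, so a
        # failed link looks the same as a failed partner node.
        self.believes_partner_failed = not link_up

    def write(self, key: str, value: int, partner: "Node", link_up: bool) -> None:
        # The local copy is always updated; the partner's copy is updated
        # only while the link between the nodes works.
        self.replica[key] = value
        if link_up:
            partner.replica[key] = value


def simulate_split_brain():
    a, b = Node("A"), Node("B")
    a.write("x", 1, b, link_up=True)   # normal operation: both copies agree

    link_up = False                    # the link fails; both nodes are alive
    a.probe(link_up)
    b.probe(link_up)

    # Each node believes the partner has failed and keeps accepting updates.
    a.write("x", 2, b, link_up)
    b.write("x", 3, a, link_up)
    return a, b
```

In this sketch both nodes end up believing that the partner has failed, and the two copies of `x` hold different values, which corresponds to the destroyed consistency of the redundant data.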
(1) Each of a plurality of nodes notifies a predetermined node (control node) of the plurality of nodes of the configuration information and survival report of the node. The control node monitors the plurality of nodes based on the information obtained from each of the plurality of nodes. When a failed node is detected from the monitoring result, the control node performs recovery processing and notifies the administrator or the like of the failure of the node.
(2) Each of a plurality of nodes exchanges its survival report with the other nodes (information exchange phase), and the nodes select, by making an agreement with one another, which node is to perform monitoring and failed node detection. The agreed node (determined node) monitors the state of each of the plurality of nodes. When a failed node is detected from the monitoring result, the agreed node (determined node) performs recovery processing and notifies the administrator or the like of the failure of the node.
(3) Each of a plurality of nodes sends a survival report to a predetermined node. Because the predetermined node does not itself detect a failed node immediately, the administrator or the like manually takes action, such as detection of a failed node and recovery, with reference to the predetermined node.
The control node detects a failed node in the method of (1), and the determined node that has been agreed upon detects a failed node in the method of (2). In addition, in the method of (3), the administrator or the like detects a failed node. Therefore, according to the above-described methods of (1) to (3), because a specific node or the administrator makes the determination, rather than each of the plurality of nodes making its own determination, it is possible to prevent falling into the split brain state.
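As an illustration of the method of (1), the following is a minimal sketch of a control node that collects survival reports and treats a node as failed when no report has arrived within a timeout. The `ControlNode` class, the `timeout_s` parameter, and the timeout-based criterion are assumptions made for this example; the description above does not specify how the control node decides that a node has failed.

```python
from typing import Dict, List


class ControlNode:
    """Hypothetical control node that monitors nodes by survival reports."""

    def __init__(self, timeout_s: float) -> None:
        # Assumption: a node is regarded as failed when its latest
        # survival report is older than timeout_s seconds.
        self.timeout_s = timeout_s
        self.last_report: Dict[str, float] = {}

    def receive_report(self, node_id: str, now: float) -> None:
        # Record the time at which the latest survival report arrived.
        self.last_report[node_id] = now

    def detect_failed(self, now: float) -> List[str]:
        # Only this single node makes the failure determination, so two
        # ordinary nodes never reach conflicting conclusions.
        return sorted(
            node_id
            for node_id, reported_at in self.last_report.items()
            if now - reported_at > self.timeout_s
        )
```

For example, with `timeout_s=5.0`, if nodes `n1` and `n2` both report at time 0.0 and only `n1` reports again at time 4.0, then `detect_failed(6.0)` lists `n2` alone. Because every determination passes through this one object, the sketch also makes visible why the control node is a point whose own failure stops monitoring altogether.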
In addition, as a related technique, a technique is known in which a computer divides storage nodes into two or more groups based on attributes collected from a plurality of storage nodes in order to prevent the loss of data in a distributed storage system (for example, refer to International Publication Pamphlet No. WO 2008/114441). In this technique, the computer assigns distributed data and redundant distributed data to the groups so that distributed data obtained by distributing data and redundant distributed data obtained by distributing redundant data of the same content as the data are not both present in any one of the generated groups.
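The placement constraint in this related technique, namely that a piece of distributed data and its redundant counterpart do not share a group, can be sketched as follows. The function `place_replicas` and the round-robin rule are assumptions chosen for illustration only and are not taken from the cited publication, which may assign data to groups in a different way.

```python
from typing import Dict, List, Tuple


def place_replicas(block_ids: List[str], groups: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each block to a (primary group, redundant group) pair."""
    if len(groups) < 2:
        # With a single group the data and its redundant copy would
        # necessarily share that group.
        raise ValueError("at least two groups are required")

    placement: Dict[str, Tuple[str, str]] = {}
    for i, block in enumerate(block_ids):
        primary = groups[i % len(groups)]
        # Illustrative rule: the redundant copy goes to the next group,
        # which always differs from the primary when there are >= 2 groups.
        redundant = groups[(i + 1) % len(groups)]
        placement[block] = (primary, redundant)
    return placement
```

With this rule, the loss of all nodes in any single group leaves at least one copy of every block in another group.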
In addition, as another related technique, a technique is known in which a management server configures the same data pool in all storage devices, which store data, and stores different pieces of data so as to be distributed in a plurality of different storage devices within the pool as much as possible (for example, refer to Japanese National Publication of International Patent Application No. 2011-505617).
Further, as still another related technique, a technique is known in which a network monitoring device divides a plurality of nodes in units of a group and obtains a logical line state from one node of the divided group to monitor the logical line (for example, refer to Japanese Laid-open Patent Publication No. 2010-258614).
In addition, as still another related technique, a technique is known in which a network management system includes a group management apparatus that monitors nodes in a group for each group formed based on the apparatus information of each node and information, such as the number of hops (for example, refer to Japanese Laid-open Patent Publication No. 2011-055231).
In the method of (1), since pieces of information on a plurality of nodes are collected at one point (the control node), the control node becomes a single point of failure (SPOF). Accordingly, when the control node fails, there is a problem in that the use of the distributed storage system by the client is limited until the control node is restored.
In the method of (2), since a complicated procedure is performed to reach an agreement among a plurality of nodes, more time may be needed until the agreement is reached than in the method of (1). In addition, in the method of (3), since the determination is made manually by the administrator or the like, a long time may elapse from the occurrence of a node failure until the node failure is detected and recovery processing is performed, compared with the methods of (1) and (2) described above. That is, in the methods of (2) and (3) described above, there is a problem in that the start of recovery processing on a failed node is delayed and accordingly the period for which the use of the distributed storage system by the client is limited becomes long.
In addition, in all of the related techniques described above, the management apparatus manages a plurality of nodes as in the method of (1), and the above-described problems are not taken into consideration.
Thus, in the above-described techniques of determining each state of a plurality of storage devices in a storage system including a plurality of storage devices, there is a problem in that the availability of the storage system is reduced.
The information processing system as a storage system (distributed storage system) has been described so far, but the present invention is not limited thereto. The above-described problems may also occur similarly when each of a plurality of information processing devices included in an information processing system stores different data from the other information processing devices instead of distributed data.