1. Field of the Invention
The present invention relates to a cluster system including an information processing apparatus with a redundancy configuration of operating and stand-by nodes, a control method, and a computer-readable medium and, more particularly, to a control method of the redundancy configuration in the cluster system which performs distributed management of data using a plurality of nodes.
2. Description of the Related Art
A known type of cluster system, which distributes loads across a plurality of nodes, includes a stand-by node as an auxiliary unit in addition to an operating node, and performs fail-over processing to the stand-by node when the operating node goes down so that system operation continues. For example, in Japanese Patent Laid-Open No. 2006-235837, a load distribution apparatus management server manages connection information required to access an operating node, and each client identifies and accesses the operating node based on the connection information. The load distribution apparatus management server monitors the status of the operating node. When the resources of the operating node run short, the server switches from the operating node to a stand-by node and updates the connection information, thereby implementing the fail-over function.
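The connection-information approach described above can be illustrated with a minimal sketch. It is not taken from the cited publication; the class names, the `free_resources` measure, and the `threshold` value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_resources: int  # hypothetical measure of remaining resources


@dataclass
class ManagementServer:
    """Holds the connection information that clients use to find the operating node."""
    operating: Node
    standby: Node
    threshold: int = 10  # hypothetical lower limit for free resources

    def connection_info(self) -> str:
        # Clients ask the server which node to access.
        return self.operating.name

    def monitor(self) -> bool:
        # When the operating node runs short of resources, swap roles so that
        # the connection information now points at the former stand-by node.
        if self.operating.free_resources < self.threshold:
            self.operating, self.standby = self.standby, self.operating
            return True
        return False


server = ManagementServer(Node("node-a", 50), Node("node-b", 50))
server.operating.free_resources = 5   # simulate a resource shortage
switched = server.monitor()
print(switched, server.connection_info())
```

In this sketch the clients never address a node directly; they only follow the connection information, so a single update at the server redirects all of them.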
On the other hand, a method of implementing the fail-over function in a node that holds static data (to be referred to as a database hereinafter) is known. For example, in Japanese Patent Laid-Open No. 05-191409, a backup of a database of an operating node is held by a stand-by node, and a change of the database of the operating node is transmitted to the stand-by node as needed. Then, when a failure such as power-OFF has occurred in the operating node, the operating node is recovered using the backup of the stand-by node, thus implementing the fail-over function.
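As a rough sketch of this backup-based approach, and not the implementation of the cited publication, the following assumes an in-memory dictionary as the database and hypothetical class and method names. Each change on the operating node is forwarded to the stand-by node, and the backup is used for recovery after a failure.

```python
class StandbyNode:
    """Holds a backup copy of the operating node's database."""

    def __init__(self):
        self.backup = {}

    def apply_change(self, key, value):
        # Receive a change transmitted from the operating node.
        self.backup[key] = value


class OperatingNode:
    def __init__(self, standby):
        self.database = {}
        self.standby = standby

    def update(self, key, value):
        # Apply the change locally, then transmit it to the stand-by node.
        self.database[key] = value
        self.standby.apply_change(key, value)

    def fail(self):
        # Simulate a failure such as power-OFF that loses the local database.
        self.database = {}

    def recover(self):
        # Restore the database from the backup held by the stand-by node.
        self.database = dict(self.standby.backup)


standby = StandbyNode()
operating = OperatingNode(standby)
operating.update("user:1", "alice")
operating.fail()
operating.recover()
print(operating.database)
```

The sketch also makes the dependency visible: recovery is only as complete as the last change that reached the stand-by node.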
In configurations that implement the fail-over function, a known problem is that, depending on the system circumstances, both the operating node and the stand-by node may operate as operating nodes. To address this problem, for example, in Japanese Patent Laid-Open No. 2004-171370, a given node confirms whether or not the self-node is isolated by attempting communication with another node in addition to the mutual communication between the operating and stand-by nodes, and a connection is enabled only when it is judged that the self-node is not isolated.
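The isolation check can be sketched as follows. This is only an illustration under stated assumptions, not the method of the cited publication: the function name `connection_enabled` and the `can_reach` probe are hypothetical, and the probe is passed in as a parameter so that the example runs without a real network.

```python
from typing import Callable, Iterable


def connection_enabled(
    partner: str,
    other_nodes: Iterable[str],
    can_reach: Callable[[str], bool],
) -> bool:
    """Return True only when the self-node is judged not to be isolated."""
    # Mutual communication with the partner (operating or stand-by) node works.
    if can_reach(partner):
        return True
    # The partner is silent: try another node to tell a partner failure
    # apart from isolation of the self-node.
    return any(can_reach(node) for node in other_nodes)


# Hypothetical reachability table standing in for real network probes.
reachable = {"node-c"}
print(connection_enabled("node-b", ["node-c"], lambda n: n in reachable))
print(connection_enabled("node-b", ["node-c"], lambda n: False))
```

The first call judges the self-node to be connected because a third node answers; the second judges it to be isolated because nothing answers.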
However, the related art cannot easily implement a fail-over configuration in the following two situations. In the first situation, in an environment made up of a plurality of operating nodes, a wide variety of mutual communications with a stand-by node must be covered, which makes management difficult. Because a network becomes more complicated as the number of nodes increases, it is difficult to judge whether the self-node is isolated based solely on a communication with another node when, for example, the other node is itself isolated. For this reason, both the operating and stand-by nodes are likely to operate at the same time. In particular, when each node manages data, identical data is updated from both the operating and stand-by nodes and may become inconsistent. In the second situation, when the communication link between the stand-by node that holds the backup and the operating node is disconnected, the backup data can no longer be updated. If fail-over processing is executed in this state, data is lost and data consistency cannot be ensured.