1. Field
The embodiments discussed herein are directed to a computer-readable recording medium storing a software update command program, a software update command method, and an information processing device, and more particularly to such a recording medium, method, and device with which software update can be performed during operations.
2. Description of Related Art
In a cluster system, a plurality of computers are used for decentralized processing. The computers configuring the cluster system are referred to as nodes. The cluster system needs to recognize at all times whether the nodes are operating normally in order to determine which of the nodes is to be assigned processing tasks.
To recognize the state of service operations of the nodes in the cluster system, one method has each node provide heartbeats to a monitoring node at regular intervals. A heartbeat is information indicating that the node providing it is operating normally. The monitoring node determines that only the nodes providing heartbeats at regular intervals are operating normally.
When detecting a node that is not providing heartbeats at regular intervals, the monitoring node determines that the node is suffering from a failure, and removes the node from a list of nodes to perform processing tasks. Once the cause of the failure is eliminated, the removed node is made available again for the service operations. When the service operations are resumed in this way, the monitoring node adds the node back to the list of nodes to perform the processing tasks.
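The heartbeat monitoring described above can be illustrated with a minimal sketch. The class name `HeartbeatMonitor`, the `grace` factor, and the node names are assumptions made for illustration only; they do not come from any of the cited publications.

```python
class HeartbeatMonitor:
    """Hypothetical monitoring node that tracks which nodes may receive tasks."""

    def __init__(self, interval, grace=1.5):
        # A node is treated as failed when no heartbeat has arrived within
        # interval * grace seconds (the grace factor is an assumption).
        self.timeout = interval * grace
        self.last_seen = {}   # node name -> time of most recent heartbeat
        self.active = set()   # list of nodes to perform processing tasks

    def heartbeat(self, node, now):
        # A heartbeat marks the node as operating normally and adds it
        # (or adds it back) to the list of nodes to perform processing tasks.
        self.last_seen[node] = now
        self.active.add(node)

    def sweep(self, now):
        # Remove every node whose last heartbeat is older than the timeout.
        failed = {n for n in self.active if now - self.last_seen[n] > self.timeout}
        self.active -= failed
        return failed


monitor = HeartbeatMonitor(interval=1.0)
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=1.0)   # node-b misses this heartbeat

failed = monitor.sweep(now=2.0)        # node-b is removed from the list
active_after_failure = set(monitor.active)

monitor.heartbeat("node-b", now=3.0)   # node-b resumes and is added back
```

Timestamps are passed in explicitly so that the behavior can be followed without real waiting; a deployed monitor would presumably read a clock instead.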
Note that information provided at regular intervals, such as heartbeats, reflects the current state more closely with a shorter interval, but a shorter interval increases the communications load. In view of this tradeoff, a technology has been studied for changing the interval of periodic transmission based on the state of communications (Japanese Laid-open Patent Publication No. 2004-364168).
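As a rough illustration of the tradeoff between freshness and communications load, the sketch below lengthens the transmission interval when the link is busy and shortens it when the link is idle. The function `next_interval`, its thresholds, and the doubling and halving policy are hypothetical; they are not taken from the cited publication, whose actual method is not described here.

```python
def next_interval(current, load, low=0.3, high=0.7, min_s=0.5, max_s=8.0):
    """Return the next heartbeat interval in seconds.

    `load` is an assumed measure of communications load between 0.0 and 1.0.
    All thresholds and limits are illustrative values.
    """
    if load > high:
        # Busy link: send less often, up to the maximum interval.
        return min(current * 2, max_s)
    if load < low:
        # Idle link: send more often, down to the minimum interval.
        return max(current / 2, min_s)
    # Moderate load: keep the current interval.
    return current


busy = next_interval(1.0, load=0.9)
idle = next_interval(1.0, load=0.1)
steady = next_interval(1.0, load=0.5)
capped = next_interval(8.0, load=0.9)
```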
During the service operations in the cluster system, a need may arise to update the software of any of the nodes. When such a need arises, the node requiring the software update is generally stopped, and the software update is then started. As a result, the node temporarily stops providing service during the software update. The issue here is that, when the provision of service is stopped in this way, the monitoring node detects the stop as a failure and executes an error task in response. However, the software update is not an error, and once the software update is completed, the service operations are expected to start normally. Executing the error task in such a case is therefore needless, and reduces operation efficiency.
To overcome such an issue, systems have been proposed for performing a software update without stopping the provision of service. For example, a cluster system can use an agent for provision of service, and a cluster control section for controlling the agent while communicating with other computers in the cluster system. A cluster system of such a configuration performs a software upgrade in the cluster control section while the agent continues to provide the service (Japanese Laid-open Patent Publication No. 2005-085114).
Also proposed is a method for collectively installing a cluster system configured by a plurality of nodes, in which a master node keeping information about system installation issues an activation command to slave nodes while performing the system installation on itself. After the slave nodes are activated, the system installation is repeatedly performed on the slave nodes in order from high-order to low-order nodes, thereby installing the cluster system in its entirety (Japanese Laid-open Patent Publication No. 2002-304299).