The present invention relates to a cluster system in which a plurality of computers connected to a network operate correlatively, a computer for use in a cluster system, and a program for causing a computer to execute.
Conventionally, cluster systems have been developed that aim to improve processing capacity and the like by connecting a plurality of computers (in general, multi-process computers) via a network and causing these computers to operate correlatively. As to the cluster system, see, for example, Japanese Patent Application No. 160657/1995, Japanese Patent Application No. 66022/1999, and Japanese Patent Application No. 2151821200.0. As to data transfer in a multi-process computer, see Japanese Patent Application No. 154272/1989 and Japanese Patent Application No. 73518/1993.
FIG. 14 is a block diagram illustrating a conventional, general cluster system. In FIG. 14, a plurality of computers 1 and 2 (in the cluster system, each computer is referred to as a node) are connected via a network 3.
Nodes 1 and 2 comprise a plurality of central processing units (CPUs) 4 to 6 and 13 to 15, node controllers 8 and 17, main memories 9 and 18, input/output (I/O) controllers 11 and 20, which are input/output devices, and network adapters 12 and 21 respectively.
The CPUs 4 to 6 and the node controller 8 are connected to a CPU bus 7, and the CPUs 13 to 15 and the node controller 17 are connected to a CPU bus 16. Also, the node controller 8, the I/O controller 11 and the network adapter 12 are connected to an I/O adapter bus 10 and the node controller 17, the I/O controller 20 and the network adapter 21 are connected to an I/O adapter bus 19.
The network adapters 12 and 21 are connected to the network 3 via inter-node connection buses 22 and 23 respectively.
The node controllers 8 and 17 control the CPU buses 7 and 16 and the I/O adapter buses 10 and 19 respectively, and provide the interconnection to the main memories 9 and 18. In addition to the I/O controllers 11 and 20 and the network adapters 12 and 21, a secondary memory device such as a magnetic disk (not shown) is connected to each of the I/O adapter buses 10 and 19. Said secondary memory device is connected to the I/O adapter bus 10 via a PCI (Peripheral Component Interconnect) board and a SCSI (Small Computer System Interface).
The pair of computers 1 and 2 are connected to each other through the network adapters 12 and 21.
In the cluster system configured as above, data is transmitted and received in units of packets between the network adapters 12 and 21, which are network devices. The longest packet length is determined by how the network connection device is implemented and by the type of network that makes up the system. When the network connection device is a commonly used Ethernet control device, the longest packet length is said to be 1500 bytes. Splitting the transfer data into packets of the maximum transfer length for transmission is referred to as a fragment.
For example, when data is transmitted between nodes using TCP (Transmission Control Protocol) and the data length exceeds the size that the network connection device can transmit at one time, the data is split into packets of the longest length that the network connection device can transmit, and the packets are transmitted. TCP has no means of raising an interrupt by software; instead, an interrupt is generated by hardware each time a packet is finished. The network connection device of the receiving side node therefore generates an interrupt to a CPU within its own node every time a packet is received, and an interrupt process is performed to determine whether or not that packet is the final packet of the transfer data. When the amount of data transferred between the nodes is large, the data is split into many packets, and the problem has existed that many of the interrupts generated on each packet reception result in useless interrupt processes.
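The scale of this per-packet interrupt overhead can be illustrated with a minimal sketch. It is only a simulation under stated assumptions: the names `fragment` and `receive_with_interrupts` are hypothetical, the full 1500 bytes is treated as payload (header sizes are ignored), and an "interrupt" is modeled as a simple counter rather than a real hardware event.

```python
MAX_PACKET_LEN = 1500  # longest packet length cited in the text for Ethernet (bytes)


def fragment(data: bytes, max_len: int = MAX_PACKET_LEN) -> list:
    """Split transfer data into packets no longer than max_len (hypothetical helper)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [data[i:i + max_len] for i in range(0, len(data), max_len)]


def receive_with_interrupts(packets: list) -> tuple:
    """Simulate a receiver that handles one interrupt per received packet.

    Returns (reassembled data, total interrupts, interrupts that only
    established "this is not the final packet").
    """
    buffer = bytearray()
    interrupts = 0
    non_final_interrupts = 0
    for index, packet in enumerate(packets):
        interrupts += 1              # assumed: one interrupt per packet
        buffer += packet
        if index != len(packets) - 1:
            non_final_interrupts += 1  # the check found nothing to act on yet
    return bytes(buffer), interrupts, non_final_interrupts


if __name__ == "__main__":
    data = bytes(1_000_000)          # 1 MB of transfer data
    packets = fragment(data)
    _, total, wasted = receive_with_interrupts(packets)
    print(len(packets), total, wasted)
```

Under these assumptions, every interrupt except the last only confirms that the transfer has not finished, which corresponds to the useless interrupt processes described above.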
Also, because the data transmission/reception between the network connection devices passes through the I/O adapter buses 10 and 19, to which the secondary memory device and the I/O controllers 11 and 20 are connected, the problem has existed that the overhead of the data transmission itself is also large. Hence, as shown in the conventional example of FIG. 15, a cluster system is being developed in which a plurality of nodes 31 and 32 are connected not via the I/O adapter buses 10 and 19 but via node controllers 33 and 34. In FIG. 15, parts identical to those of FIG. 14 are given identical reference codes.
In the cluster system shown in FIG. 15, in order to do away with the above-mentioned interrupt overhead, a method (flag method) is employed in many cases in which the transmitting side node transmits special data referred to as a flag together with the data to be transmitted, thereby indicating completion of the data transmission. In this flag method, the receiving side node reads (polls) the flag repeatedly while receiving the data and, upon reading a flag that indicates TRUE (= completion of data transmission), concludes that the data reception is complete and starts a process that uses the received data. This method resolves the above-mentioned interrupt overhead problem and allows completion of the data reception to be known immediately, so that a high-speed response can be given to an application that is waiting for the data on the receiving side.
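The flag method described above can be sketched roughly as follows. This is an illustrative model only, not the mechanism of any cited system: two threads in one process stand in for the transmitting and receiving side nodes, and the names `Mailbox`, `send`, `receive_by_polling`, and `demo` are hypothetical.

```python
import threading


class Mailbox:
    """Hypothetical shared receive area: a data buffer plus a completion flag."""

    def __init__(self):
        self.data = bytearray()
        self.flag = False  # True means "data transmission complete"


def send(mailbox: Mailbox, payload: bytes, chunk: int = 1500) -> None:
    """Transmitting side: write all of the data first, then set the flag last."""
    for i in range(0, len(payload), chunk):
        mailbox.data += payload[i:i + chunk]
    mailbox.flag = True


def receive_by_polling(mailbox: Mailbox) -> tuple:
    """Receiving side: read (poll) the flag repeatedly until it indicates True."""
    polls = 0
    while not mailbox.flag:  # busy-wait; the polling thread does no other work
        polls += 1
    return bytes(mailbox.data), polls


def demo(payload: bytes) -> tuple:
    mailbox = Mailbox()
    sender = threading.Thread(target=send, args=(mailbox, payload))
    sender.start()
    received, polls = receive_by_polling(mailbox)
    sender.join()
    return received, polls


if __name__ == "__main__":
    received, polls = demo(bytes(range(256)) * 1000)
    print(len(received), "bytes received after", polls, "polls")
```

In this sketch no per-packet interrupt is needed, and the receiver learns of completion as soon as the flag is set. The number of polls varies from run to run; each one is a read of the flag performed solely to wait.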
When the above-mentioned flag method is employed, however, an application in the receiving side node that is constantly waiting for data must keep polling the flag. One specific application thus occupies a CPU of the receiving side node for polling, and the problem exists that CPU time is wasted.
In addition to the above-mentioned two problems, many nodes in the cluster system communicate data with one another, so there is a risk that a malfunction in one node will have a harmful influence on the other nodes. To construct a cluster system with high availability, a mechanism (security mechanism) is needed for guarding against interrupts from a node in which a malfunction has occurred, as in the case of the interrupt method mentioned previously.
Accordingly, a scheme has been required that resolves both the interrupt overhead problem caused by the above-mentioned fragment and the polling problem of the flag method, while allowing the security mechanism to be maintained.