This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2001-292426, filed Sep. 25, 2001, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a cluster system comprising a plurality of computers connected through a network, and more particularly, to a cluster system having a virtual RAID (Redundant Array of Inexpensive Disks) to permit use of each disk drive connected to each computer as a component of the virtual RAID, a computer for the cluster system, and a parity calculation method in the cluster system.
2. Description of the Related Art
A cluster system comprising a plurality of computers connected through a network is well known as a system that increases the availability of the system as a whole by continuing operation on another computer when one computer fails.
A cluster manager operates in such a cluster system. A cluster manager has the following two functions to continue operation using another computer when a failure occurs in one computer.
The first function is to ensure the state consistency among the computers constituting a cluster system. The state consistency means that all computers of a cluster system can refer to and change the state held by a cluster manager, or all computers can always refer to the latest state.
The second function is to detect a failure in any computer of a cluster system. A cluster manager detects a failure and isolates a failed computer from a cluster system.
In this way, operation of a cluster system can be continued under the control of a cluster manager even if one computer should fail. However, in a certain type of cluster system, operation may not be continued from the state at the time of the occurrence of failure unless the data stored in the failed computer is referred to. In such a case, operation is suspended or returned to the state before the occurrence of failure.
It is therefore an object of the present invention to make it possible to use each disk drive connected to each computer of a cluster system as an element of a virtual RAID to continue the system operation when one of the computers constituting a cluster system fails, even if the data stored in that computer is necessary to resume operation from the time of the occurrence of the failure.
According to an embodiment of the present invention, a cluster system comprises a plurality of computers connected through at least one network, and a plurality of disk drives connected to the computers. The cluster system comprises a cluster manager and a control unit.
The cluster manager performs exclusive control of the whole cluster system, and converts a global command which is necessary to handle each disk drive as a component of the virtual RAID, into at least one local command. A global command is equivalent to a read/write command to the RAID. A local command is equivalent to a read/write command to a disk drive, or a parity calculation command.
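The conversion of a global command into local commands can be illustrated with a minimal sketch. The block layout (one parity block per stripe, rotated across the computers), the read-modify-write sequence used for a write, and the names LocalCommand, locate and convert_global are all assumptions introduced for illustration only; the description above does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalCommand:
    kind: str      # "read", "write" or "parity" (hypothetical encoding)
    computer: int  # computer whose disk drive or parity unit handles it
    stripe: int    # stripe number on that computer's disk drive


def locate(logical_block: int, n_computers: int) -> tuple[int, int, int]:
    """Map a logical block to (stripe, data computer, parity computer).

    Assumed layout: each stripe holds n_computers - 1 data blocks and one
    parity block, and the parity block rotates from stripe to stripe.
    """
    stripe, offset = divmod(logical_block, n_computers - 1)
    parity_computer = stripe % n_computers
    # Skip over the computer that holds parity for this stripe.
    data_computer = offset if offset < parity_computer else offset + 1
    return stripe, data_computer, parity_computer


def convert_global(kind: str, logical_block: int, n_computers: int) -> list[LocalCommand]:
    """Expand one global read/write command into local commands."""
    stripe, data_c, parity_c = locate(logical_block, n_computers)
    if kind == "read":
        return [LocalCommand("read", data_c, stripe)]
    if kind == "write":
        return [
            LocalCommand("read", data_c, stripe),      # old data
            LocalCommand("read", parity_c, stripe),    # old parity
            LocalCommand("parity", parity_c, stripe),  # compute new parity
            LocalCommand("write", data_c, stripe),     # new data
            LocalCommand("write", parity_c, stripe),   # new parity
        ]
    raise ValueError(f"unknown global command: {kind}")
```

In this sketch a global read expands to a single local read, whereas a global write expands to several local commands addressed to two different computers.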
The control unit comprises a command converter, a disk control means, a parity calculation means and a command transfer means. The control unit operates in each computer, independently of a cluster manager.
The command converter communicates with the cluster manager when the global command is generated in the computer to which it belongs, and makes the cluster manager convert the global command into at least one local command.
The disk control means receives, from the command transfer means, a local command that is a read/write command to the disk drive, and reads from or writes to the disk drive according to this command.
The parity calculation means receives, from the command transfer means, a local command that is a parity calculation command, and calculates parity according to this command.
The command transfer means receives a local command from the command converter and transfers it, based on the command contents, to one of the following: the corresponding other computer, the disk control means of the computer to which the command transfer means belongs, or the parity calculation means of that computer.
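The routing decision made by the command transfer means can be sketched as follows. This is an illustrative sketch only: the LocalCommand fields, the string labels for the three destinations, and the function name route are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalCommand:
    kind: str    # "read", "write" or "parity" (hypothetical encoding)
    target: int  # identifier of the computer that must execute the command


def route(cmd: LocalCommand, own_id: int) -> str:
    """Decide where a local command goes, based only on its contents."""
    if cmd.target != own_id:
        # The command is addressed to another computer: forward it there.
        return "forward_to_other_computer"
    if cmd.kind in ("read", "write"):
        # Read/write commands go to this computer's disk control means.
        return "disk_control"
    if cmd.kind == "parity":
        # Parity calculation commands go to the parity calculation means.
        return "parity_calculation"
    raise ValueError(f"unknown local command kind: {cmd.kind}")
```

For example, on the computer with identifier 0, route(LocalCommand("write", 0), 0) selects the disk control means, while route(LocalCommand("write", 2), 0) forwards the command to the other computer.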
In a cluster system with the above-mentioned structure, each disk drive connected to each computer constituting the cluster system can be used as a component of the virtual RAID. Thus, even if one computer fails, the data written to the disk drive connected to that computer can be restored by the RAID technology from another disk drive connected to another computer. This makes it possible to resume the system operation from the time of the occurrence of failure by using another computer.
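The restoration of data from the remaining disk drives can be illustrated with a short sketch. It assumes byte-wise exclusive-OR (XOR) parity, as used in conventional RAID levels 4 and 5; the passage above does not name the parity function, and the function name xor_blocks and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over the disk drives of three computers, plus parity.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# If the computer holding d1 fails, d1 is recovered from the survivors.
recovered = xor_blocks([d0, d2, parity])
```

Because XOR is its own inverse, XORing the surviving data blocks with the parity block yields the missing block, so recovered equals d1 here.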
The cluster manager may be given a local command generating function as explained below. Here, the local command is a command that causes the computer connected to the disk drive storing data necessary for the parity calculation to calculate an intermediate or final result of the parity calculation using that data. Further, when an intermediate result is calculated, the local command causes the obtained result to be transferred to the computer connected to the disk drive which stores the data necessary to calculate the next intermediate result or the final result of the parity calculation.
Generation of such a local command makes it possible to avoid concentrating the data necessary for parity calculation in one computer during a writing operation of the cluster system. That is, the intermediate result of the parity calculation is sequentially transferred to the computer which stores the data necessary to calculate the next intermediate result or the final result of the parity calculation. Thus, the number of data transfers between the computers necessary for one parity calculation can be decreased, and the writing speed to the virtual RAID can be increased.
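The sequential transfer of intermediate results can be sketched as follows. The sketch assumes XOR parity and models each computer simply as an entry in a dictionary holding its local data block; the function name chained_parity, the visiting order, and the sample data are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def chained_parity(local_blocks: dict[int, bytes], order: list[int]) -> tuple[bytes, int]:
    """Compute parity by passing an intermediate result along `order`.

    Each computer XORs its own local block into the intermediate result
    it received and passes the new result to the next computer.
    Returns the final result and the number of inter-computer transfers.
    """
    if not order:
        raise ValueError("at least one computer is required")
    result = local_blocks[order[0]]  # first computer starts from its own data
    transfers = 0
    for computer in order[1:]:
        transfers += 1  # intermediate result is sent to `computer`
        result = xor_bytes(result, local_blocks[computer])
    return result, transfers


blocks = {0: b"\x0f\x01", 1: b"\xf0\x02", 2: b"\x55\x04"}
parity, transfers = chained_parity(blocks, [0, 1, 2])
```

In this sketch each transfer carries one block-sized intermediate result, and no single computer has to receive all of the data blocks. The actual number of transfers in a given system depends on where the data and parity are stored.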
A cluster manager may be provided and operated in each computer, and these cluster managers may operate synchronously with one another through a network. With this structure, the cluster system as a whole can be prevented from stopping if a failure occurs in one computer.
Alternatively, a cluster manager may be provided in a computer dedicated to the cluster manager, independently of the other computers, and may be operated only in that dedicated computer. This decreases the load on the other computers.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.