1. Field of the Invention
The present invention relates to computer systems. More particularly, the present invention relates to computer cluster systems that can improve the availability with use of a plurality of computers respectively.
2. Description of Related Art
(Patent Document 1)
JP-A No. 24069/2002
In recent years, computer systems are becoming indispensable social service infrastructures like power, gas, and water supplies. Such the computer systems, if they stop, will come to damage the society significantly. To avoid such the service stop, therefore, there have been proposed various methods. One of those methods is a cluster technique. The technique operates a plurality of computers as a group (referred to as a cluster). As a result, when a failure occurs in one of the computers, a standby computer takes over the task of the failed computer. And, no user knows the stop of the computer during the take-over operation. While the standby computer executes the task instead of the failed computer, the failed computer is replaced with a normal one to restart the task. Each computer of the cluster is referred to as a node and the process for taking over a task of a failed computer is referred to as a fail-over process.
To execute such a fail-over process, however, it is premised that the information in the failed computer (host processor) can be referred to from other host processors. The information mentioned here means the system configuration information (IP address, target disk information, and the like) and the log information of the failed host processor. The log information includes process records. The system configuration information that is indispensable for a standby host processor that takes over the task of a failed host processor as described above is static information whose updating frequency is very low. This is why each of the host processors in a cluster system will be able to retain the configuration information of other host processors without arising any problem. And, because the updating frequency is very low as described above, there is almost no need for a host processor to report the modification of its system configuration to other host processors, thereby the load of the communication processes among the host processors is kept small. The log information mentioned here refers to records of processes in each host processor. Usually, a computer process causes each related file to be modified. And, if a host processor fails in an operation, it becomes difficult to decide correctly how far the file modification is done. To avoid such a trouble, the process is recorded so that the standby host processor, when taking over a process through a fail-over process, restarts the process correctly according to the log information and assures that the file modification is done correctly. This technique is disclosed in JP-A No. 24069/2002 (hereinafter, to be described as the prior art 1). Generally speaking, the host processor stores the log information in magnetic disks. By the way, the inventor of prior art 1 does not mention the log storing method.
It is an indispensable process for cluster systems to store the log. However, the more the host processor stores the log in magnetic disks, the more its performance drops. Because latency of a magnetic disk is much longer than computation time of the host processor. In general, the latency of a magnetic disk equals to 10 milliseconds. On the other hand, the host processor calculates in time of the order of nanosecond or picosecond. The prior art 1 also discloses a method to avoid the problem by storing logs in a semiconductor memory referred to as a “log memory”. A semiconductor memory can store each log at a lower overhead than magnetic disks.
According to the prior art 1, each host processor has its own log information in the “log memory”. They do not share the “log memory”. That is why a host processor sends a copy of its log information in its “log memory” to that of another host processor when the first one modifies its log information. According to the prior art 1, “mirror mechanism” takes charge of said replication of the log information. In the case of prior art 1, the number of host processors is limited only to two. So, the copy overhead is not so large. If the number of host processors increases, however, the copy overhead also increases. More specifically, when the number of host computers is n, the copy overhead is proportional to the square of n. And, if the performance of the host processors is improved, the log updating frequency (i.e. log copy frequency) also increases. Distribution of a log to other processors thus inhibits the performance improvement of the cluster system. In other words, the distribution of a log is a performance bottleneck of the cluster system.
Furthermore, in the prior art 1, the inventor does not mention that the “log memory” may be a non-volatile memory. Log information that is not stored in a non-volatile memory might be lost at a power failure. If the log information is lost, the system cannot complete a completed operation by means of the log information.
In order to solve the problem of the conventional technique as described above, storage for log information must satisfy the following three conditions:    (1) All host processors in the cluster system can share it.    (2) It must be non-volatile storage.    (3) Host processors can access it at low overhead.The magnetic disk is one of such the non-volatile media to be shared by a plurality of host processors. However, its access overhead is large as described above.
Recently, some magnetic disk systems come to have a semiconductor memory referred to a disk cache. A disk cache can store data of the magnetic disk system temporarily and function as a non-volatile memory through a battery back-up process. In addition, in order to improve their reliability, some magnetic disk systems have a dual disk cache which stores the same data between those disk caches. The disk cache thus fulfills the above three necessary conditions (1) to (3). Thereby it is suited for storing logs. Concretely, a disk cache is low in overhead because it consists of semiconductor memory. It can be shared by a plurality of host processors because the disk cache is part of a magnetic disk. Furthermore, it comes to function as a non-volatile memory through a battery back-up process.
However, the disk cache is an area invisible from any software running in each host processor. This is because the software functions just as an interface that specifies the identifier of each magnetic disk, the addresses in the magnetic disk, and the data transfer length for the magnetic disk; it cannot specify any memory address in the disk cache. For example, in the case of the SCSI (Small Computer System Interface) standard (hereinafter, to be described as the prior art 2), which is a generic interface standard for magnetic disk systems, the host processors cannot access the disk cache freely while there are commands used by host processors to control the disk cache.