1. Field of the Invention
The present invention relates to a distributed system and a redundancy control method in which a process is made redundant by N processing elements (where N is an integer of 4 or more), typically computers that are connected by a network and able to communicate with each other.
2. Description of the Related Art
In recent years, computer technologies and network technologies have improved so remarkably that business computerization has spread widely. For some businesses, however, a long suspension due to a computer fault is not acceptable. For this reason, distributed systems in which a plurality of computers are connected by a network have come into general use. A distributed system consisting of a plurality of mutually communicable processing elements (process execution units) operating concurrently on a single computer is also known. Such processing elements communicate with each other through a known inter-process communication mechanism provided by the operating system (OS), transmitting and receiving messages or packets through a common interface. The following explanation concerns a distributed system in which a plurality of computers are connected by a network.
Redundant processing by computers is known as one method of operating a distributed system. In a distributed system, each computer may fail independently. If the whole system fails because of a fault in a single computer, the availability of the system falls below that of a single computer. To prevent this, the processing of the whole system must be made redundant, which can raise the availability of the distributed system above that of a single computer. For example, consider a distributed system of ten computers, each operating at an availability of 99%. Without any redundancy, the availability of the whole system is about 90% (0.99^10 ≈ 0.904). If, on the other hand, redundancy allows the system to tolerate a multi-point failure of up to three computers, the availability of the whole system becomes about 99.9998%.
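The two availability figures above can be reproduced with a short calculation. The sketch below assumes, as the example implicitly does, that computers fail independently of each other, so the number of failed computers follows a binomial distribution; the function name `system_availability` is hypothetical.

```python
from math import comb


def system_availability(n: int, a: float, tolerated: int) -> float:
    """Probability that at most `tolerated` of `n` computers are down at once.

    Assumes each computer is up with probability `a`, independently of the
    others, so the number of failed computers follows a binomial distribution.
    """
    return sum(
        comb(n, k) * (1 - a) ** k * a ** (n - k)  # exactly k computers down
        for k in range(tolerated + 1)
    )


print(f"{system_availability(10, 0.99, 0):.4%}")  # no redundancy: 90.4382%
print(f"{system_availability(10, 0.99, 3):.4%}")  # up to 3 failures: 99.9998%
```

Almost all of the remaining unavailability in the second case comes from the probability that exactly four computers are down simultaneously (about 2 × 10^-6).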
[Fail-Over Method]
A conventionally known method of making processing redundant in a distributed system transfers a process to another computer when a fault is detected in the computer executing it. This method is known as the fail-over method.
In the fail-over method, a computer fault is generally detected through periodic communication in which the computers check each other's operating condition. This communication is called the “heart beat”. A stop failure of a computer is detected by a heart beat time-out; in other words, a computer that fails to send out its heart beat within a preconfigured time interval is regarded as stopped.
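A minimal sketch of heart-beat-based stop-failure detection follows. The `HeartbeatMonitor` class and its methods are hypothetical names used only for illustration; timestamps can be passed explicitly (as in the example) or taken from `time.monotonic()`.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Sketch of heart-beat-based stop-failure detection (hypothetical API)."""

    timeout: float  # preconfigured time-out interval, in seconds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, computer: str, now: Optional[float] = None) -> None:
        """Note that a heart beat from `computer` arrived at time `now`."""
        self.last_seen[computer] = time.monotonic() if now is None else now

    def suspected_stopped(self, now: Optional[float] = None) -> List[str]:
        """Computers whose last heart beat is older than the time-out."""
        now = time.monotonic() if now is None else now
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("X", now=0.0)
monitor.record("Y", now=0.0)
monitor.record("X", now=2.5)               # X keeps sending heart beats
print(monitor.suspected_stopped(now=4.0))  # ['Y']
```

Note that the monitor only observes missing heart beats: a computer that is merely overloaded, or cut off by a network partition, is indistinguishable from one that has actually stopped.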
In a distributed system using the fail-over method, a split brain poses a problem. A split brain is a state in which an execution context (state) is partitioned into two or more. A split brain occurs when a fault is detected erroneously. For example, when two groups of computers making up a distributed system become unable to communicate with each other (network partitioning), each group detects a fault in the other. Each of the two groups then starts operating independently, and a split brain occurs. In another case, a given computer temporarily stops transmitting its heart beat because of an abnormally heavy load, and a fault is detected. Even if the computer subsequently resumes operation, the split brain may persist.
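The partition case can be illustrated with a toy model of fail-over. The sketch assumes that each partition group sees only its own members' heart beats and that a group which loses the current primary's heart beat promotes its lowest-named member; both the function `failover_primaries` and this selection rule are hypothetical.

```python
from typing import List, Set


def failover_primaries(groups: List[Set[str]], primary: str) -> List[str]:
    """Return the computers acting as primary after fail-over.

    Each partition group sees only its own members' heart beats. A group that
    still contains the current primary keeps it; a group that has lost the
    primary's heart beat fails over to its lowest-named member (a hypothetical
    selection rule chosen only for illustration).
    """
    primaries = {primary if primary in g else min(g) for g in groups}
    return sorted(primaries)


print(failover_primaries([{"X", "Y", "Z"}], "X"))    # ['X']: one execution context
print(failover_primaries([{"X", "Y"}, {"Z"}], "X"))  # ['X', 'Z']: split brain
```

Whatever selection rule is used, each group decides on its own, so a partition into k groups can yield up to k independent execution contexts.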
Processes that are made redundant are generally important ones in a distributed system. Once a split brain occurs, therefore, the process loses consistency, which can have a fatal effect on the system as a whole.
[Majority Voting Method]
A method using majority voting (the Majority voting method) is known to fundamentally solve the split brain problem of the fail-over method. In this method, all the redundant computers execute the same process, and as long as the computers constituting a majority of the whole can share (synchronize) their operation with each other, the process is continued regardless of the operation of the remaining computers. This method basically avoids the split brain.
Assume, for example, that a process is made redundant (triplicated) on three computers X, Y, Z, and that their network is partitioned into group A of two computers X, Y and group B of one computer Z. In this case, group A continues the process, while the process of group B is suspended. Suspension here means that the process cannot be continued until the number of computers able to share their operation with each other again reaches a majority.
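The decision made by each group in this example can be sketched as follows. The function `majority_vote` is a hypothetical name; it simply compares the size of each group of mutually communicating computers with a quorum of more than one half of all redundant computers.

```python
from typing import Dict, List, Set


def majority_vote(n_total: int, groups: List[Set[str]]) -> Dict[str, str]:
    """Decide, for each group of mutually communicating computers, whether it
    may continue the redundant process under the Majority voting method."""
    quorum = n_total // 2 + 1  # more than one half of all redundant computers
    return {
        ",".join(sorted(g)): "continue" if len(g) >= quorum else "suspend"
        for g in groups
    }


# Triplicated process; the network splits into group A = {X, Y} and group B = {Z}.
print(majority_vote(3, [{"X", "Y"}, {"Z"}]))  # {'X,Y': 'continue', 'Z': 'suspend'}
```

Because the groups produced by a partition are disjoint, at most one of them can contain more than half of the computers, so at most one group continues; if no group has a majority, every group suspends.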
Next, assume that computer Z hangs under an abnormally heavy load. The computer group representing the remaining majority (group A) continues the process regardless of computer Z. After computer Z recovers, it does not execute the process on its own, because it does not constitute a majority. Instead, computer Z resumes operation after being resynchronized with the process of the computer group representing the majority.
[Quorum Algorithm]
The Majority voting method described above is one of the Quorum algorithms. In a Quorum algorithm, all the redundant computers execute the same process, and as long as the computers constituting a quorum can share their operation with each other, the process is continued regardless of the operation of the remaining computers. Jpn. Pat. Appln. KOKAI Publication Nos. 2001-117895 (paragraphs 0007, 0018 to 0022, FIGS. 1 to 5) and 2003-67215 (paragraphs 0056 to 0064, FIG. 4) disclose distributed systems using the Quorum algorithm. Jpn. Pat. Appln. KOKAI Publication No. 2001-117895 discloses an example of the Majority voting method in which the quorum is more than one half of the whole. Specifically, it discloses a distributed system in which, where N is the number of servers (computers) constituting a quorum group, N can be changed dynamically as long as a majority (the quorum) of the N servers is in operation. Jpn. Pat. Appln. KOKAI Publication No. 2003-67215, on the other hand, discloses a ⅔ quorum algorithm in which the quorum is the smallest integer greater than ⅔ of the whole. The ⅔ quorum algorithm is regarded as a redundancy method with higher redundancy.
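Both quorum definitions reduce to integer arithmetic: the smallest integer strictly greater than a value x is floor(x) + 1. A minimal sketch, with hypothetical function names:

```python
def majority_quorum(n: int) -> int:
    """Smallest integer greater than n/2 (Majority voting method)."""
    return n // 2 + 1


def two_thirds_quorum(n: int) -> int:
    """Smallest integer greater than 2n/3 (the 2/3 quorum algorithm)."""
    return 2 * n // 3 + 1


for n in (3, 4, 7, 10):
    print(n, majority_quorum(n), two_thirds_quorum(n))
# 3 2 3
# 4 3 3
# 7 4 5
# 10 6 7
```

For seven computers, for instance, the ⅔ quorum is 5, since 2 × 7 / 3 ≈ 4.67.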
As described above, the Quorum algorithm, unlike the fail-over method, can basically avoid the split brain. The Quorum algorithm, however, poses another problem not encountered by the fail-over method. This problem is explained below.
Consider the case where a plurality of computers fail, i.e. a simultaneous multi-point failure occurs in a distributed system. In the fail-over method, the process can be continued as long as one computer is in operation. In the Quorum algorithm, on the other hand, the process can be continued only while at least a quorum of computers is in operation. Once the number of computers in operation falls below the quorum, the processes on the operating computers cannot proceed, and in such a case the system is generally stopped.
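The contrast between the two approaches under a multi-point failure can be summarized in a few lines. The method labels and the function `can_continue` are hypothetical; the quorum formulas are those of the Majority voting method and the ⅔ quorum algorithm.

```python
def can_continue(method: str, n: int, alive: int) -> bool:
    """Whether the redundant process can proceed with `alive` of `n` computers up."""
    if method == "fail-over":
        return alive >= 1                  # any surviving computer takes over
    if method == "majority":
        return alive >= n // 2 + 1         # quorum: more than n/2
    if method == "two-thirds":
        return alive >= 2 * n // 3 + 1     # quorum: more than 2n/3
    raise ValueError(f"unknown method: {method}")


# Seven computers, four of which fail at the same time.
for m in ("fail-over", "majority", "two-thirds"):
    print(m, can_continue(m, n=7, alive=3))
# fail-over True
# majority False
# two-thirds False
```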
There is, however, another approach. Instead of stopping the system when the number of computers in operation falls below the quorum, the process may be suspended temporarily at that point; when some of the faulty computers are recovered and rebooted, they are resynchronized with the process of the remaining computers so that the process resumes automatically. With this approach, however, the technical problem has been to prevent a split brain on the time axis. A split brain on the time axis is a type of split brain that appears as a mismatch in external input/output processing when a redundant process is repeated from a given time point and executed again.
A split brain on the time axis is explained with reference to FIG. 1, which shows a distributed system in which a process is made redundant on seven computers #1 to #7 using the ⅔ quorum algorithm. In this case, the quorum is 5, the smallest integer greater than ⅔ × 7 ≈ 4.67. First, assume that at time point T1, when computers #1 to #7 have completed processes P1, P2, P3, communication fails between the two computers #1 and #2 and the five computers #3 to #7. In other words, the network is partitioned into computer group A, consisting of the two computers #1 and #2, and computer group B, consisting of the five computers #3 to #7.
In this case, computers #3 to #7 of group B, which satisfy the quorum, continue the process, while computers #1 and #2 of group A, which do not satisfy the quorum, suspend the process. Next, assume that a multi-point failure involving all the computers #3 to #7 of group B occurs at time point T2, when the five computers #3 to #7 have completed processes P4, P5, P6 following process P3. Also assume that computers #3 to #7 of group B are rebooted and the network partitioning is eliminated at time point T3.
The rebooted computers #3 to #7 of group B are resynchronized with computers #1 and #2 of group A. The process of computers #1 and #2 of group A has been suspended since time point T1, i.e. at the end of process P3. The seven computers #1 to #7 therefore resume the process at time point T3 from the end of process P3. As a result, computers #3 to #7 re-execute the portion of the process they already executed between time points T1 and T2. Because the process resumed from time point T3 involves input/output of signals from/to an external source, however, a mismatch, i.e. a split brain on the time axis, may occur between processes P4′, P5′, P6′, which follow process P3 as resumed from time point T3, and processes P4, P5, P6, which were executed from time point T1 to T2.
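The FIG. 1 scenario can be replayed as a small simulation that records every output visible to the outside world. This is a sketch under stated assumptions: `run` is a hypothetical stand-in for a process whose output depends on the external input available at the time it executes, and the rebooted computers of group B are assumed to lose their progress, so the resumed state is group A's state at the end of P3.

```python
from typing import Dict, List

N = 7
QUORUM = 2 * N // 3 + 1  # smallest integer > 2N/3, i.e. 5 for seven computers


def run(process: str, epoch: str) -> str:
    """Hypothetical stand-in for a process whose output depends on the
    external input available during `epoch`."""
    return f"{process}<-input@{epoch}"


def simulate_fig1() -> Dict[str, List[str]]:
    """Replay the FIG. 1 scenario; return every output the outside world saw."""
    seen: Dict[str, List[str]] = {}

    def execute(process: str, epoch: str, history: List[str]) -> None:
        seen.setdefault(process, []).append(run(process, epoch))
        history.append(process)

    # Up to T1: all seven computers complete P1, P2, P3 together.
    common: List[str] = []
    for p in ("P1", "P2", "P3"):
        execute(p, "before-T1", common)

    # T1: partition into group A = {#1, #2} and group B = {#3..#7}.
    group_a, group_b = {1, 2}, {3, 4, 5, 6, 7}
    history_a, history_b = list(common), list(common)
    assert len(group_b) >= QUORUM > len(group_a)  # only group B may continue

    # T1..T2: group B alone executes P4, P5, P6; group A stays suspended.
    for p in ("P4", "P5", "P6"):
        execute(p, "T1-T2", history_b)

    # T2: all of group B fails. Simplifying assumption: its progress is lost.
    history_b.clear()

    # T3: B reboots, the partition heals, and all seven computers resync to
    # group A's state (end of P3), then re-execute P4', P5', P6'.
    resumed = list(history_a)
    for p in ("P4", "P5", "P6"):
        execute(p, "after-T3", resumed)
    return seen


def time_axis_conflicts(seen: Dict[str, List[str]]) -> List[str]:
    """Processes whose externally visible outputs disagree between runs."""
    return sorted(p for p, outs in seen.items() if len(set(outs)) > 1)


print(time_axis_conflicts(simulate_fig1()))  # ['P4', 'P5', 'P6']
```

In this model the re-executed outputs always differ because the external input is tied to the time of execution; in a real system they only may differ, which is why the mismatch is a possibility rather than a certainty.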