1. Field of the Invention
The present invention relates to a distributed system in which four or more computers are connected via a network, and a multiplexing control method for the system and, more particularly, to a distributed system which can provide a split-brain solution and realtimeness upon occurrence of a failure at the same time, and a multiplexing control method for the system.
2. Description of the Related Art
In recent years, computer and network technologies have improved remarkably. As a result, there have been attempts to computerize jobs in various kinds of businesses. Many of these jobs, depending on their contents, must not be interrupted by failures, and it has recently become common practice to build a distributed system by connecting a plurality of computers via a network. One known method of running such a distributed system is to multiplex the execution of deterministic programs using ordered multicast.
“Ordered multicast”, “deterministic programs”, and “multiplexing” will be explained below.
Ordered Multicast
In an environment such as a distributed system in which a plurality of computers are connected, the respective computers operate independently. Therefore, in order to make these computers operate synchronously, a special mechanism is required. Ordered multicast is a mechanism for delivering an input to the distributed system to all computers, and guarantees that data arrive at all computers in the same order.
Deterministic Program
Execution of a program amounts to determining the output and the next state in correspondence with the state of a computer when an input is given to the computer. A deterministic program is defined as a program which uniquely determines the output and the next state in accordance with a given input. More specifically, the deterministic program is one that references neither arbitrary values nor random numbers. A feature of the deterministic program is that its execution is uniquely determined once an initial state and an input sequence are determined. In the following description, a “program” indicates such a deterministic program.
Multiplexing
In a distributed system, respective computers may fail independently. If the entire system stops working due to a failure of only one computer, the availability of the distributed system is lower than that of a single computer. To avoid such a situation, processes associated with the overall system must be multiplexed. Multiplexing, conversely, makes the availability of the distributed system higher than that of a single computer. For example, if a distributed system constituted by 10 computers each having an availability of 99% is not multiplexed at all, the availability of that distributed system is as low as about 90%. If this system can withstand failures of up to three computers as a result of multiplexing, the system availability becomes as high as 99.9998%.
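The figures in the above example can be checked with a short calculation. The following is a minimal sketch, assuming that the computers fail independently of one another; the function name `availability` and its parameters are introduced here for illustration only.

```python
from math import comb

def availability(n: int, a: float, tolerated: int) -> float:
    """Probability that at most `tolerated` of `n` computers are down.

    Assumes each computer is up with probability `a`, independently
    of the others (an assumption made for this illustration).
    """
    q = 1.0 - a  # probability that a single computer is down
    return sum(comb(n, k) * q**k * a**(n - k) for k in range(tolerated + 1))

# Not multiplexed: all 10 computers must be up at the same time.
no_mux = availability(10, 0.99, 0)

# Multiplexed: the system withstands failures of up to three computers.
mux = availability(10, 0.99, 3)

print(f"not multiplexed: {no_mux:.4f}")
print(f"multiplexed:     {mux:.6f}")
```

Under this assumption the first value is 0.99 raised to the 10th power, i.e., about 90%, and the second value is consistent with the 99.9998% stated above.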
Multiplexing of execution of deterministic programs using ordered multicast will be explained below. Assume that a distributed system is constituted by a plurality of computers, and each computer which participates in multiplexing has identical programs.
All computers start from an identical initial state. After that, input data are delivered to all computers in the same order via ordered multicast, thus executing respective programs.
Since input sequences to respective programs have the same order by ordered multicast, the states of all computers are maintained equal to each other due to the feature of the deterministic programs, and all output sequences are equal to each other. That is, execution of programs is multiplexed.
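The property described above can be illustrated as follows. This is a minimal sketch, not an implementation of any particular system: the class `Replica`, its arithmetic transition function, and the sample inputs are hypothetical. It shows only that identical initial states and identical input orders yield identical states and outputs, whereas a different input order may not.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One computer executing a deterministic program (hypothetical example)."""
    state: int = 0                       # identical initial state on all computers
    outputs: list = field(default_factory=list)

    def step(self, x: int) -> None:
        # The next state and the output depend only on the current state
        # and the input; no clock or random number is referenced.
        self.state = (self.state * 31 + x) % 1_000_003
        self.outputs.append(self.state)

def run(inputs):
    """Execute the program on a fresh replica for the given input sequence."""
    replica = Replica()
    for x in inputs:
        replica.step(x)
    return replica

ordered = [5, 7, 11]                     # order guaranteed by ordered multicast
replicas = [run(ordered) for _ in range(3)]
same = all(r.outputs == replicas[0].outputs for r in replicas)

# Without an ordering guarantee, a computer might see a different order.
diverged = run([7, 5, 11]).outputs != replicas[0].outputs

print(same, diverged)
```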
An implementation method of ordered multicast will be briefly explained below.
In order to implement ordered multicast independently of special hardware, exchange of messages according to an appropriate algorithm among computers, i.e., a protocol, is used. Prior to a detailed description of the algorithm, points to be noted will be listed.
Since the system is premised on the possibility that any computer may fail and halt at any time, the overall process must not depend on a specific computer in order to establish multiplexed processes. Therefore, the following points must be noted.
(1) Reception of input to the distributed system is not fixed at a specific computer.
For example, a simple algorithm in which input reception is fixed at a specific computer to determine the order of inputs by temporarily transferring all inputs to that computer, and the inputs are delivered in that order, cannot be used. With this algorithm, if the computer at which input reception is fixed has failed and come to a halt, the order of inputs cannot be determined at that time.
(2) The delivery of input data items to all computers is not fixed at a specific computer.
For example, a simple algorithm in which a specific computer delivers input data to all computers that have not halted cannot be used. With this algorithm, if the delivery computer fails and halts during delivery, the data will have been delivered to only some computers, and delivery cannot be completed.
The aforementioned algorithm will be described in detail below in consideration of the above points.
Conventionally, failure detection plays an important role. Typically, failure detection is done by a heartbeat time-out algorithm. This algorithm determines a failure of a given computer if heartbeats periodically output from each computer cannot be confirmed for a predetermined period of time or more.
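A heartbeat time-out check of the kind described above may be sketched as follows. This is an illustrative sketch only: the class name `HeartbeatDetector`, the method names, and the computer names are hypothetical, and the current time is passed in explicitly (in seconds) rather than read from a clock so that the behavior is easy to follow.

```python
class HeartbeatDetector:
    """Suspects a computer whose heartbeat has not been seen for `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # computer name -> time of the last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat from `node` was confirmed at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now: float) -> set:
        """Return computers silent for the predetermined period or more."""
        return {n for n, t in self.last_seen.items() if now - t >= self.timeout}

det = HeartbeatDetector(timeout=10.0)
det.heartbeat("A", 0.0)
det.heartbeat("B", 0.0)
det.heartbeat("A", 8.0)      # A keeps sending heartbeats, B falls silent

print(det.suspected(9.0))    # nothing is suspected yet
print(det.suspected(12.0))   # B has been silent for 12 s and is suspected
```

Note that the detector cannot distinguish a computer that has halted from one whose heartbeats are merely delayed; it only observes silence.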
Each computer has an input reception queue. As the first step, each computer delivers an input data item located at the head position of the input reception queue to all other computers as a next candidate to be processed by that computer. A computer with an empty input reception queue delivers an input data item obtained first from another computer to all other computers as a next candidate to be processed by that computer.
As the final result of the first step, each computer obtains one or both of “input candidates” and “failure detection” for all computers. A list of “input candidates” and “failure detection” for all computers will be simply referred to as a “list” hereinafter.
As the second step, each computer delivers its own “list” to all other computers. Note that these “lists” may differ among the computers. This is because if a given computer has failed and halted during the first step, it may have delivered its “input candidate” to only some computers. Also, “failure detection” may not yet be accurate at the beginning of the second step.
As a result of the second step, if the “lists” obtained from other computers differ from its own “list”, each computer combines them into its own “list” and repeats the second step. As the final result of the second step, the “lists” of all other computers which are free from failures match its own “list”. At that time, the protocol is complete.
Note that each computer can select an input to be delivered by ordered multicast from “input candidates” of its “list” in accordance with a predetermined rule (e.g., the first one). Finally, the selected input is removed from the input reception queue.
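The second step and the selection rule can be illustrated by a small simulation. This is a simplified sketch under stated assumptions, not the protocol itself: message exchange is modeled as a synchronous round among the fault-free computers, no further failure occurs during the second step, and the case of an empty input reception queue is omitted. A “list” is represented as a pair of a candidate table and a set of computers detected as failed; the function names and the rule "take the candidate of the lowest-numbered computer" are hypothetical choices for the illustration.

```python
from functools import reduce

def merge(a, b):
    """Combine two lists: union of input candidates and of failure detections."""
    return ({**a[0], **b[0]}, a[1] | b[1])

def select(lst):
    """Predetermined rule (assumed here): candidate of the lowest-numbered computer."""
    candidates, _failed = lst
    return candidates[min(candidates)]

def step2(lists):
    """Exchange lists among fault-free computers until all of them match."""
    rounds = 0
    while any(l != lists[min(lists)] for l in lists.values()):
        received = list(lists.values())          # every computer delivers its list
        lists = {cid: reduce(merge, received, own) for cid, own in lists.items()}
        rounds += 1
    return lists, rounds

# State after the first step: computer 1 halted after delivering its
# candidate "a" to computer 2 only, so the lists differ.
after_step1 = {
    2: ({1: "a", 2: "b", 3: "c", 4: "d"}, {1}),
    3: ({2: "b", 3: "c", 4: "d"}, {1}),
    4: ({2: "b", 3: "c", 4: "d"}, {1}),
}

before = {cid: select(l) for cid, l in after_step1.items()}
final, rounds = step2(after_step1)
after = {cid: select(l) for cid, l in final.items()}

print(before)   # the computers would not agree without the second step
print(after)    # all fault-free computers select the same input
```

In this scenario computer 2 alone would select "a" before the exchange, while computers 3 and 4 would select "b"; after the lists are combined, all three select the same input.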
With the aforementioned sequence, multiplexing of execution of deterministic programs using ordered multicast in a distributed system in which a plurality of computers are connected via a network is implemented.
The aforementioned sequence suffers the following problems.
(1) Split Brain
A split brain is a state in which the context of execution is partitioned into two or more parts. A split brain occurs when a failure has been erroneously detected. For example, if the computers which form a system are divided into two groups that cannot communicate with each other (network partitioning), each group detects a failure of the other and begins to operate independently. Alternatively, heartbeat transmission/reception may be interrupted due to a temporary high load, so that failures are erroneously detected, resulting in a split brain.
Multiplexed processes are most certainly the important ones in the system. If a split brain has occurred, the processes become inconsistent, and may fatally influence the entire system.
In order to make a split brain less likely to occur, erroneous detection of failures must be made less likely. For this purpose, a sufficiently large heartbeat time-out value must be set. In practice, a time-out value of 10 sec to 1 min is normally used.
(2) Realtimeness of Process Upon Occurrence of Failure
If a large time-out value is set, the time from when a failure has occurred until it is detected is prolonged. Then, detection of a failed computer is delayed in the ordered multicast protocol, and execution of ordered multicast temporarily stops during that time. As a result, execution of multiplexing temporarily stops.
Normally, such a situation does not fatally influence the system. However, in a system that attaches importance to realtimeness, the realtimeness requirement may not always be met during such a stop. That is, the realtimeness requirement imposes an upper limit on the heartbeat time-out value, and an excessively large value cannot be set.
Consequently, the setting of the heartbeat time-out value involves a trade-off between avoiding a split brain and ensuring realtimeness.
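The trade-off can be made concrete with a small calculation. This is a rough sketch with hypothetical numbers: the heartbeat gaps in `pauses` and the 3-second `deadline` are invented for illustration, and the model assumes that a real failure stalls ordered multicast for approximately the time-out value.

```python
def evaluate(timeout, pauses, deadline):
    """Return (false_detections, meets_deadline) for a time-out in seconds.

    `pauses` are heartbeat gaps caused by temporary load on healthy
    computers; a gap of `timeout` or more is erroneously detected as a failure.
    """
    false_detections = sum(1 for p in pauses if p >= timeout)
    # After a real failure, ordered multicast stops for about `timeout`.
    meets_deadline = timeout <= deadline
    return false_detections, meets_deadline

pauses = [0.5, 2.0, 8.0]   # hypothetical gaps in seconds, not measured data

short = evaluate(1.0, pauses, deadline=3.0)
long_ = evaluate(10.0, pauses, deadline=3.0)

print(short)   # small time-out: erroneous detections, deadline met
print(long_)   # large time-out: no erroneous detection, deadline missed
```

With these assumed values, neither setting satisfies both requirements at once, which is the situation described above.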