1. Field of the Invention
Exemplary embodiments of the present invention relate to a method, system and computer readable recording medium for determining a major group in a network split-brain situation that occurs in a network-based distributed computing environment (or a distributed environment). More specifically, exemplary embodiments of the present invention relate to a method, system and computer readable recording medium for determining a major group in a network split-brain situation with reference to history information of nodes.
2. Discussion of the Background
A method of efficiently managing and controlling a variety of network-based distributed systems is needed in the current information technology (IT) environment. In a distributed environment, a plurality of computers are generally connected to one another physically or logically, and a large computational problem can be solved using the combined processing capability of the computers connected in this way. If only tens of computers or processors participate in the distributed environment, the distributed environment can be properly maintained by an operator who manages it. However, if the distributed environment includes thousands or tens of thousands of computers or processors, a management system that synchronizes and coordinates the computers or processors as a whole is essential.
One requirement of such a management system is to guarantee fault tolerance, and in particular, to cope with an error situation in which a network is abnormally split because the links between the nodes constituting a cluster are unstable.
FIG. 1 is a view showing an example of a network split-brain situation. In FIG. 1, it is assumed that a distributed environment management system includes a cluster consisting of five servers and that each of the servers constituting the cluster maintains the same state information by replicating data in real time.
If a network split-brain problem occurs in the state of FIG. 1 and the cluster is split into two server groups, the servers in group #1 assume that all the servers in group #2 are down (i.e., in an inoperable state), and at the same time, the servers in group #2 assume that the servers in group #1 are down. Therefore, each of the two groups on the network determines itself to be the major group, and clients connect to each of the two groups and perform tasks. For example, client group #1 connected to group #1 performs read and write operations as usual, and client group #2 connected to group #2 also performs read and write operations.
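The situation described above can be illustrated with a minimal sketch. The server names, the 3/2 split, and the functions `local_view` and `claims_major` are hypothetical and are introduced only for illustration; the sketch models the naive behavior in which every unreachable server is presumed to be down, and it is not the method of the invention.

```python
# Illustrative only: server names and the 3/2 split are assumptions,
# not details taken from FIG. 1.
CLUSTER = frozenset({"s1", "s2", "s3", "s4", "s5"})


def local_view(server, partitions):
    """Return (reachable, presumed_down) as seen by `server`."""
    reachable = next(frozenset(g) for g in partitions if server in g)
    return reachable, CLUSTER - reachable


def claims_major(server, partitions):
    """Naive rule: unreachable servers are presumed down, so the
    reachable group sees no competing group and declares itself major."""
    reachable, _presumed_down = local_view(server, partitions)
    # No quorum or history check is applied, so any non-empty group
    # has no reason to step aside.
    return len(reachable) > 0


# Hypothetical split of the five-server cluster into two groups.
partitions = [{"s1", "s2", "s3"}, {"s4", "s5"}]

# Collect the distinct groups whose members declared themselves major.
self_declared_major = {
    local_view(s, partitions)[0]
    for s in CLUSTER
    if claims_major(s, partitions)
}
print(len(self_declared_major))
```

Under these assumptions, both groups end up in `self_declared_major`, which corresponds to the case in which clients are served by two groups at the same time.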
However, in the aforementioned situation, when the network split-brain problem is resolved and the two groups are to be rejoined into a single group, it may not be possible to determine which group's data is correct and which group's data should be deleted. That is, there is a problem with service continuity and data integrity in the network split-brain situation.
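The data integrity problem described above can be sketched as follows. The keys, values, and the function `conflicting_keys` are hypothetical and chosen only for illustration; the sketch assumes that each group holds its replicated state as a simple key-value mapping.

```python
def conflicting_keys(state_a, state_b):
    """Return the keys whose values differ between two groups' replicas."""
    keys = state_a.keys() | state_b.keys()
    return sorted(k for k in keys if state_a.get(k) != state_b.get(k))


# Before the split, every server holds the same replicated state.
initial = {"balance": 100, "owner": "alice"}
group1_state = dict(initial)
group2_state = dict(initial)

# During the split, each client group writes only to its own server group.
group1_state["balance"] = 80        # write by client group #1
group2_state["balance"] = 150       # write by client group #2
group2_state["status"] = "frozen"   # write by client group #2

# When the groups rejoin, the replicas disagree on these keys.
conflicts = conflicting_keys(group1_state, group2_state)
print(conflicts)
```

In this sketch, the two replicas alone carry no information that indicates which value of a conflicting key should be kept, which is why some additional criterion for selecting a single major group is needed.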