Field of the Invention
The present invention generally relates to distributed computer systems, and more specifically, to a distributed computer system in which one of the servers of the system is elected to perform a specified service.
Background Art
Distributed computing systems are complex aggregations of multiple units that communicate with each other through an interconnect in order to achieve a common goal. To do so, different units in such a system may perform different roles by providing different services to each other. This distributes work over many units instead of concentrating it in a single unit as in a centralized system, making feasible workloads that a single machine could not sustain. However, if every unit has an equal chance of failing, the chance that at least one unit fails, and thereby prevents the system from accomplishing its goal, is greater in the distributed system than in the centralized system, and it increases with the number of units (and services). Different techniques exist to counter this possibility. The two main techniques are: 1) replicating the same service on a number of units and coordinating these units such that the service offered is coherent regardless of the unit used to access the service; and 2) electing a unit to offer a service, monitoring its status, and electing a new one when the first one fails.
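The growth of the failure risk with the number of units can be illustrated with a short calculation. The sketch below is only an illustration and rests on an assumption not stated above, namely that units fail independently of each other with the same probability p; the function name prob_any_failure is hypothetical.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n units fails.

    Assumption (for illustration only): each unit fails independently
    with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # All n units survive with probability (1 - p) ** n;
    # the complement is the chance that at least one fails.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, prob_any_failure(0.01, n))
```

Under this assumption a single unit (n = 1) fails with probability p, and the risk rises steadily as units are added, which is the motivation for the replication and election techniques listed above.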
One important service in a distributed system is a naming service, which maps human-readable names for objects in the system to more compact, machine-readable identifiers. Two entities of the same type in the distributed system should have the same name, and that name should differ from the names of all other types of entities, so that applications can access them unambiguously. The simplest way to achieve this is to have a single unit (called the name server) provide the naming service and thus map identifiers to names. This excludes any conflict in assigning names to identifiers, provided that all units request the mappings from the same name server. Which unit is actually in charge of providing the service is decided through an election process. When the unit providing the naming service fails, the remaining units go through the election process again to choose a new name server.
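A minimal sketch of why a single name server avoids conflicts is given below. It is not taken from any particular system: the class NameServer, its resolve method, and the use of small integers as identifiers are hypothetical, and the name-to-identifier direction is assumed for illustration.

```python
import itertools


class NameServer:
    """Hypothetical single name server holding the only copy of the mapping."""

    def __init__(self) -> None:
        self._ids = {}                      # name -> identifier
        self._next_id = itertools.count(1)  # source of fresh identifiers

    def resolve(self, name: str) -> int:
        """Return the identifier for name, creating a mapping on first use."""
        if name not in self._ids:
            self._ids[name] = next(self._next_id)
        return self._ids[name]


if __name__ == "__main__":
    server = NameServer()
    # Two units asking about the same type receive the same identifier.
    print(server.resolve("printer"), server.resolve("printer"))
    # A different type receives a different identifier.
    print(server.resolve("disk"))
```

Because every request goes through the same table, two requests for the same name cannot be answered with different identifiers. The guarantee holds only as long as all units consult the same server.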
A unit offering the naming service stores the mapping of identifiers to names, but this information may be lost if the unit fails. A newly elected name server may not know the whole state of the name-to-identifier mapping. In this situation, there is a risk that the new name server receives a request for a mapping it does not know and creates a new mapping for that identifier, with the result that entities of the same type are named differently. A name server may replicate the mapping in all other units, updating them each time a new mapping is created. Even so, a newly elected server must make sure that it knows all the mappings in the network, which requires checking with all other units and retrieving any missing mappings. Clearly, the longer a unit has been in the system, the more likely it is to have collected all updates from the previous name server, and thus the less information it has to retrieve from other units. Therefore, the election process should ideally select the oldest unit.
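The preference for the oldest unit, followed by retrieval of any missing mappings, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the election procedure of any specific system: the Unit record, the joined_at timestamp used as a measure of age, and the functions elect_name_server and recover_mappings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    unit_id: str
    joined_at: float  # hypothetical join time; a smaller value means an older unit
    alive: bool = True
    mappings: dict = field(default_factory=dict)  # replicated name -> identifier


def elect_name_server(units):
    """Pick the oldest live unit; ties are broken by unit_id."""
    candidates = [u for u in units if u.alive]
    if not candidates:
        raise RuntimeError("no live unit available for election")
    return min(candidates, key=lambda u: (u.joined_at, u.unit_id))


def recover_mappings(server, units):
    """Copy mappings the new server lacks from the other live units.

    Returns the number of mappings that had to be retrieved.
    """
    retrieved = 0
    for unit in units:
        if unit is server or not unit.alive:
            continue
        for name, ident in unit.mappings.items():
            if name not in server.mappings:
                server.mappings[name] = ident
                retrieved += 1
    return retrieved


if __name__ == "__main__":
    units = [
        Unit("a", 1.0, alive=False, mappings={"printer": 1, "disk": 2, "tape": 3}),
        Unit("b", 2.0, mappings={"printer": 1, "disk": 2}),
        Unit("c", 3.0, mappings={"disk": 2, "tape": 3}),
    ]
    new_server = elect_name_server(units)
    print(new_server.unit_id, recover_mappings(new_server, units))
```

In this example the failed unit "a" is skipped, the oldest surviving unit "b" is elected, and it needs to fetch only the one mapping it missed. An older unit generally starts with a more complete replica, so the retrieval step has less work to do.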