Distributed computer systems often require several servers and/or networking nodes to work together. These servers and nodes must be coordinated, as there is typically networking information that must be shared among the machines for them to function as a single entity. Typical approaches to machine coordination can be expensive in terms of resources and efficiency.
In general, the nodes must synchronize in order to agree, which can require several messages to pass between them. This synchronization requirement can be undesirable in a clustered networking environment, and many clustered environments simply avoid imposing it. There are applications, however, where agreement is necessary.
In one case where agreement is needed, a cluster may want exclusive access to a particular device. One such device is a transaction log on a file system. Whenever a transaction is in progress, certain objects need to be saved persistently, so that if a failure occurs those objects can be recovered.
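The role of a persistent transaction log can be illustrated with a minimal sketch. The class `TransactionLog`, its methods, the record layout, and the state names `prepared` and `committed` are all hypothetical and chosen only for illustration; they are not taken from any particular transaction system. The sketch assumes one JSON record per line in a file on a local file system.

```python
import json
import os
import tempfile


class TransactionLog:
    """Hypothetical append-only log persisted on a local file system."""

    def __init__(self, path):
        self.path = path

    def append(self, txn_id, state):
        # Write one JSON record per line and force it to disk before
        # returning, so the record survives a crash right after this call.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"txn": txn_id, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def recover(self):
        # Replay the log: the last record seen for each transaction wins.
        latest = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    latest[record["txn"]] = record["state"]
        # Transactions that never reached "committed" still need resolution.
        return sorted(t for t, s in latest.items() if s != "committed")


def demo_log():
    path = os.path.join(tempfile.mkdtemp(), "txn.log")
    log = TransactionLog(path)
    log.append("t1", "prepared")
    log.append("t1", "committed")
    log.append("t2", "prepared")
    # Simulate a restart: a fresh object reads the same file.
    return TransactionLog(path).recover()
```

In this sketch, `demo_log()` returns `["t2"]`, the one transaction that was still in progress when the simulated restart occurred.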
For objects that need to be saved in one place, there is typically a transaction manager that runs on each server in the cluster or domain and uses a local file system to access the objects. Because each server has its own transaction manager, there is little to no problem with persistence, and no coordination among the servers is needed.
This arrangement has a drawback, however. For example, a cluster can include three servers, each having its own transaction manager. One of those servers can experience a failure or other problem that makes it unavailable to the cluster. Because the failed server is the only server with access to its particular transaction log, none of the transactions in that log can be recovered until the server is again available to the cluster.
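The three-server scenario can be sketched as a small simulation. The `Server` class, the `recoverable_transactions` function, and the server and transaction names are hypothetical; an in-memory list stands in for a log on each server's local file system, and a `ConnectionError` stands in for the server being unreachable.

```python
class Server:
    """Hypothetical cluster member that owns a private transaction log."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.log = []  # stands in for a log on the server's local file system

    def read_log(self):
        # A local log can only be read through the server that owns it.
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return list(self.log)


def recoverable_transactions(cluster):
    # Collect what can be read, and note which servers' logs are stranded.
    recovered, stranded = [], []
    for server in cluster:
        try:
            recovered.extend(server.read_log())
        except ConnectionError:
            stranded.append(server.name)
    return recovered, stranded


def demo_cluster():
    cluster = [Server("s1"), Server("s2"), Server("s3")]
    cluster[0].log.append("t1")
    cluster[1].log.append("t2")
    cluster[2].log.append("t3")
    cluster[1].available = False  # s2 fails
    return recoverable_transactions(cluster)
```

Here `demo_cluster()` returns `(["t1", "t3"], ["s2"])`: the transaction logged on the failed server stays out of reach for as long as that server is down.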
Recovery of the log can be difficult, or at least inefficient, as a problem with the server can take a significant amount of time to fix. Significant server problems can include a shorted-out motherboard or a burnt-out power supply.