1. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, and database management) that prior to the advent of the computer system were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. As a result, many tasks performed at a computer system (e.g., voice communication, accessing electronic mail, controlling home electronics, Web browsing, and printing documents) include the exchange of electronic messages between a number of computer systems and/or other electronic devices via wired and/or wireless computer networks.
In some computing environments, it is desirable for multiple computer systems to be able to agree on a value for a given variable without holding a central lock, while remaining resilient to the failure of any one of the computer systems and providing a scalable implementation. For example, in a distributed computing environment, it may be desirable for a number of front-end servers to determine an appropriate back-end server for receiving different channels (e.g., an IN channel and an OUT channel) of a client session. Thus, in an environment with a Web farm of multiple RPCProxy front-end servers, the RPCProxy front-end servers may need to balance load across a farm of back-end servers. That is, the RPCProxy front-end servers need to be able to agree (or arbitrate) on which back-end server the RPC/HTTP IN and OUT channels will be sent to.
More generally, N threads/agents (e.g., front-end servers) want to set a value in M objects (e.g., back-end servers). One conventional solution to the problem is for each thread/agent to try to obtain a lock on each object in turn. After a lock is obtained on all objects, the thread changes the value and unlocks all objects. In case of conflict, the threads back off and try again.
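For illustration only, the conventional lock-all-then-write approach described above might be sketched as follows. The names SharedObject, set_value_everywhere, and max_attempts are hypothetical and do not come from any particular system; the retry cap and the random back-off interval are assumptions added so that the sketch terminates.

```python
import random
import threading
import time


class SharedObject:
    """Hypothetical stand-in for one of the M objects (e.g., a back-end server)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = None


def set_value_everywhere(objects, value, max_attempts=1000):
    """Try to lock every object in turn, write the value, then unlock all.

    Returns the number of retries that were needed. max_attempts is an
    assumption added so the sketch terminates; the scheme described in
    the text has no such bound.
    """
    for attempt in range(max_attempts):
        held = []
        try:
            for obj in objects:
                # Non-blocking acquire: failure means another thread holds it.
                if not obj.lock.acquire(blocking=False):
                    break
                held.append(obj)
            else:
                # Every lock was obtained: change the value on all objects.
                for obj in objects:
                    obj.value = value
                return attempt
        finally:
            # Unlock whatever was obtained, on success or on conflict.
            for obj in held:
                obj.lock.release()
        # Conflict: back off for a short random time and try again.
        time.sleep(random.uniform(0.0, 0.001))
    raise RuntimeError("gave up after max_attempts retries")
```

In this sketch, each of the N threads would call set_value_everywhere with the same list of M objects. Every failed attempt releases all locks already held and starts over, so the cost of a conflict grows with the number of objects.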
However, this conventional solution has several problems. First, its worst case is unbounded: conceivably, the threads can keep trying to obtain a lock forever. Second, its efficiency drops as the number of objects increases. Third, retries between processes/machines can be resource intensive, so it is desirable to keep them to a minimum.