Distributed processing systems are computer data processing systems which use a plurality of digital processing units, whether geographically co-located or not, to perform complex data processing functions. When a single data processing task, or a plurality of related data processing tasks, is distributed among a plurality of hardware processing units, it is necessary to coordinate the tasks to ensure that intermediate results are available before subsequent tasks using those results are initiated. It is also necessary to redo tasks whose results are lost due to processor failure, to isolate failed hardware units, and to recover from software errors and hardware faults.
In some distributed processing systems, the job of coordinating the plurality of processors is itself distributed, or decentralized. In these decentralized systems, all processors perform identical functions and coordinate with the other processors by means of messages exchanged with them. Such decentralized control systems are difficult to design and susceptible to failures in any one of the processors. These problems are overcome in centrally controlled distributed processing systems, where one of the processors, under program control, is used to coordinate the activities of all of the other processors. However, since a failure of the coordinating processor renders the entire multiprocessing system unavailable, it is desirable to duplicate the coordinating capability in all of the processors. It is then necessary to assign the coordinating responsibility to an operative one of the processors at start-up or upon failure of the current coordinating processor.
It is therefore necessary in distributed processing systems with centralized control to provide a dynamic protocol or strategy for designating one, and only one, of the processors as the coordinating processor. Moreover, this strategy, and the mechanisms embodying the strategy, must be capable of assigning, at any time, one and only one processor as the coordinating processor, even in the presence of multiple, concurrent errors or failures.
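The "one, and only one" requirement can be made concrete with a minimal sketch. The rule used here (the lowest-numbered operative processor becomes coordinator) and the names `choose_coordinator` and `coordinators_claimed` are assumptions introduced purely for illustration; they are not taken from any of the systems discussed in this background. The sketch shows that such a rule yields exactly one coordinator while all processors agree on which processors are operative, and that it can yield two coordinators once their views diverge.

```python
from typing import Dict, Set


def choose_coordinator(operative: Set[int]) -> int:
    """Hypothetical rule: the lowest-numbered operative processor coordinates."""
    if not operative:
        raise ValueError("no operative processor available")
    return min(operative)


def coordinators_claimed(views: Dict[int, Set[int]]) -> Set[int]:
    """Return every processor that considers itself the coordinator.

    `views` maps each operative processor ID to the set of processors
    that it believes to be operative (itself included).
    """
    return {pid for pid, view in views.items() if choose_coordinator(view) == pid}


# Start-up: all three processors agree on who is operative.
all_up = {1, 2, 3}
claimed_start = coordinators_claimed({pid: set(all_up) for pid in all_up})

# Processor 1 fails and both survivors detect the failure.
survivors = {2, 3}
claimed_after = coordinators_claimed({pid: set(survivors) for pid in survivors})

# Concurrent fault: processor 3 also wrongly believes processor 2 has failed.
claimed_split = coordinators_claimed({2: {2, 3}, 3: {3}})

print(claimed_start, claimed_after, claimed_split)
```

In the first two cases exactly one processor claims the coordinating role. In the third case processors 2 and 3 both claim it, which is the kind of violation a designation strategy must exclude even in the presence of multiple, concurrent errors or failures.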
Unfortunately, presently available techniques for assigning the coordinating processor are complex, expensive, and subject to failure in the presence of multiple errors. Such techniques are described in "Auditor: A Framework for High Availability of DB/DC Systems" by W. Kim, IEEE 1982, and "Elections in a Distributed Computing System" by H. Garcia-Molina, IEEE Trans. on Computers, Vol. C-31, No. 1, January 1982.