Modern computer systems are capable of executing multiple processes simultaneously. In a given computer system, there may exist multiple operating systems running simultaneously on different processors. These processors may be provisioned on a single node, on different nodes in a computer network or a cluster, or on different partitions of the computer system. These various processes may execute autonomously. In other words, a process may not know at any given time what actions the other processes have taken.
Some situations may require that, on a periodic schedule, a task be performed only once and by only one of the processes. These tasks are referred to herein as periodic, single-execution (PSE) tasks. Suppose any one of the processes is capable of performing the PSE task, and each process competes to perform the PSE task based on the aforementioned periodic schedule. Since each autonomous process is not aware of the actions taken by the other processes, there is a need in these situations to efficiently manage the execution of the PSE task so that the PSE task is not needlessly and/or erroneously performed more than once during each rotation of the periodic schedule.
One way to coordinate the various processes is to designate one of the processes as the designated master process (“DMP”). During each turn of the periodic schedule, the DMP would perform the required PSE task(s). If the DMP terminates or crashes, another process would be designated the new DMP to handle the PSE task(s) going forward.
There are, however, drawbacks with this approach. As an example, when the DMP crashes, there is no way for another process to step in and perform the required PSE task(s) unless one of the remaining processes is first designated the new DMP. In some cases, the delay involved in detecting the DMP failure and in designating a new DMP may cause the PSE task(s) to be skipped in one or more rotations of the periodic schedule. Also, designating a process as the DMP, and designating a new DMP when the previous DMP crashes, involves a non-trivial amount of overhead.
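The skipped-rotation drawback of the DMP approach can be illustrated with a minimal simulation. The following is only a sketch under stated assumptions: the function name `simulate_dmp`, its parameters, the modeling of the failover delay as a whole number of rotations, and the rule of choosing the lowest-numbered surviving process as the new DMP are all hypothetical and are not taken from any particular system.

```python
from typing import List, Optional


def simulate_dmp(num_processes: int, num_rotations: int,
                 crash_rotation: int, failover_delay: int) -> List[Optional[int]]:
    """Return, for each rotation, the id of the process that performed the
    PSE task, or None if the task was skipped in that rotation.

    Assumptions (hypothetical): process 0 starts as the DMP, the DMP crashes
    at the start of `crash_rotation`, and detecting the failure plus
    designating a new DMP takes `failover_delay` rotations.
    """
    alive = [True] * num_processes
    dmp: Optional[int] = 0
    redesignate_at = 0  # rotation at which a new DMP can take over
    performed: List[Optional[int]] = []

    for rotation in range(num_rotations):
        # The current DMP crashes; no other process may step in yet.
        if dmp is not None and rotation == crash_rotation:
            alive[dmp] = False
            dmp = None
            redesignate_at = rotation + failover_delay

        # Once the delay has elapsed, designate the lowest-numbered survivor.
        if dmp is None and rotation >= redesignate_at:
            dmp = next((p for p in range(num_processes) if alive[p]), None)

        # Only the DMP performs the task; with no DMP the rotation is skipped.
        performed.append(dmp)

    return performed
```

Under these assumptions, `simulate_dmp(3, 6, crash_rotation=2, failover_delay=2)` returns `[0, 0, None, None, 1, 1]`: the PSE task is skipped in two rotations while the new DMP is being designated, even though two healthy processes remain available throughout.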
In view of the foregoing, there are desired improved methods and apparatus for managing the execution of a PSE task among multiple autonomously executing processes, each of which is scheduled to attempt to perform the PSE task based on a periodic schedule.