Some existing systems execute concurrent workflows on distributed nodes, with some operations shared between the workflows. For example, disaster recovery of virtual machines (VMs) operating in cloud environments requires a high level of coordination among nodes that have no direct knowledge of the existence or state of the other nodes associated with the workflow. A synchronization mechanism provides mutual exclusion on any shared operations, so that a shared operation is not performed by more than one workflow at a time.
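As a minimal illustration of mutual exclusion on a shared operation, the following sketch uses threads within a single process as stand-ins for workflows on distributed nodes. The names `SharedOperation`, `fail_over_vm`, and `workflow` are hypothetical and chosen only for this example, and an in-process lock is an assumption made for brevity; a distributed deployment would need a distributed equivalent.

```python
import threading


class SharedOperation:
    """Wraps an action so that it runs at most once across all callers."""

    def __init__(self, action):
        self._action = action
        self._lock = threading.Lock()
        self._done = False
        self._result = None
        self.run_count = 0  # how many times the action actually executed

    def run(self):
        # Only one caller at a time may enter this block (mutual exclusion).
        with self._lock:
            if not self._done:
                self._result = self._action()
                self.run_count += 1
                self._done = True
            # Later callers reuse the result instead of repeating the action.
            return self._result


def fail_over_vm():
    # Hypothetical shared operation: failing over a single VM.
    return "vm-1 failed over"


shared = SharedOperation(fail_over_vm)
results = []


def workflow(name):
    # Each workflow requests the shared operation without knowing
    # whether another workflow has already performed it.
    results.append((name, shared.run()))


threads = [
    threading.Thread(target=workflow, args=(f"workflow-{i}",)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch all four workflows receive the same result, while the failover action itself executes only once.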
Some existing solutions use lock-based synchronization to coordinate the execution of concurrent workflows on distributed nodes with shared operations. However, it is difficult for a lock-based synchronization system to respond to dynamic scaling of concurrency when new nodes are added. For example, some of the existing lock-based synchronization systems become a scalability bottleneck and are therefore undesirable in a distributed cloud environment. Another existing approach is to replicate the processing functions and to use protocols for achieving consensus among processors over a network of unreliable communication channels. However, that approach sometimes results in the same VM being failed over multiple times, which is not acceptable for disaster recovery workflows.