Consolidation of computing resources in data centers is becoming common. Data centers provide a centrally operated, large-scale, network-accessible pool of computing platforms which can be shared by a large number of customers. Online computing services, such as web or cloud services, can likewise be operated on these physical computing assets to provide useful computing services for customer software.
If the physical resources that support the online computing services fail or become inoperative, such as through a natural disaster or other large-scale event, the computing resources may fail to automatically restart, reboot, execute in the required sequence, or otherwise return to an operable state once the event is resolved. Failure to properly restart may be the result of a set of unsatisfied boot-up requirements for the physical servers or for the software services operating on the physical servers. These conditions may require intervention by a human operator, or other manual intervention, to resolve “deadlock conditions,” such as where a first service inadvertently requires a second service to boot first and the second service likewise requires the first service to boot first.
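The deadlock condition described above amounts to a cycle in the graph of boot-up requirements. As a minimal illustrative sketch only, and not a method described in this document, such a cycle could be detected with a depth-first search over a mapping from each service to the services it requires to boot first. The function name `find_boot_cycle`, the `requires` mapping, and the service names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical helper: `requires` maps a service name to the services
# that must boot before it. Names and structure are assumptions made
# for illustration, not taken from the source text.
def find_boot_cycle(requires: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of boot requirements, or None if none exists."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # services on the current depth-first search path

    def visit(service: str) -> Optional[List[str]]:
        state[service] = IN_PROGRESS
        path.append(service)
        for dep in requires.get(service, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: a circular requirement.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[service] = DONE
        return None

    for service in requires:
        if state.get(service, UNVISITED) == UNVISITED:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


# A first service requires a second, and the second requires the first.
deadlocked = {"service_a": ["service_b"], "service_b": ["service_a"]}
print(find_boot_cycle(deadlocked))

# A satisfiable set of boot-up requirements yields no cycle.
bootable = {"web": ["database"], "database": []}
print(find_boot_cycle(bootable))
```

Under these assumptions, a returned chain identifies the services whose boot-up requirements can never all be satisfied, and a result of `None` indicates that some valid startup order exists.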
The computing services can be located within the same data center or in separate physical locations. In either case, because some computing functionality depends on other functionality already being in operation, and because of inadvertent human intervention or other unforeseen errors during startup, the startup sequence of the physical resources may produce deadlock conditions in the running programs or in the operational state of the network-based services. Efficiently maintaining workflows within the network-based services can thus be problematic in a distributed network-based services environment.