In some server arrangements, such as in a data center, services such as a web page service or a database service can be implemented on multiple server systems configured in a cluster. When one of the servers fails or an entire service cannot be brought up, human interaction is typically required to access diagnostic logs and take corrective action, leading to downtime for the customer.
This downtime causes a web site and/or other services that depend on the downed server to be unavailable for use. In some cases, a person having the necessary proficiency to repair the failed node, or worse, the failed cluster, is not available for several hours, or the repair requires significant time, resulting in those services being offline or otherwise unavailable for an indeterminate duration.