Some network-based computing service providers allow customers to purchase and utilize computing resources, such as virtual machine (“VM”) instances, on a permanent or as-needed basis. In addition to VM instances, such computing service providers typically allow customers to purchase and utilize other types of computing resources. For example, customers might be permitted to purchase access to and use of file and block data storage resources, database resources, networking resources, and other types of computing resources. Utilizing these computing resources as building blocks, customers of such a network-based computing service provider can create distributed applications that provide various types of functionality, such as application hosting, backup and storage, content delivery, World Wide Web (“Web”) hosting, enterprise information technology (“IT”) solutions, database services, and others.
Network-based distributed applications, such as those described above, commonly improve availability by relying upon redundancies for their dependencies. Redundant dependencies allow the application to switch over to substitutes when any dependency fails. For example, if one of the hosts utilized to implement an application fails because of a loss of power, another host in another rack or data center that has not lost power can be utilized. In this way, availability of the application can be maintained in the event of failures.
When deploying a distributed application, system administrators must take care to choose a diverse set of redundant dependencies with a low probability of correlated failure. For example, if a distributed application is executing on two hosts that share the same power supply, the two hosts are not appropriate substitutes for one another because both will fail if the shared power supply fails or loses power. A correlated failure such as this can result from the failure of any type of shared dependency, such as networking components, power or power components, and heating or cooling components. A correlated failure can also result from the failure of software components or network services utilized by a distributed application.
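The shared-power-supply example can be illustrated with a minimal sketch. The host names, dependency labels, and both function names below are hypothetical and chosen only for illustration; the sketch assumes each host's dependencies are already known and can be listed as a set.

```python
# Hypothetical inventory (an assumption for illustration): each host is
# mapped to the set of infrastructure dependencies it relies upon.
HOST_DEPENDENCIES = {
    "host-a": {"power-supply-1", "switch-1", "cooling-1"},
    "host-b": {"power-supply-1", "switch-2", "cooling-1"},
    "host-c": {"power-supply-2", "switch-3", "cooling-2"},
}


def shared_dependencies(host_x, host_y, inventory=HOST_DEPENDENCIES):
    """Return the dependencies whose failure would affect both hosts."""
    return inventory[host_x] & inventory[host_y]


def are_appropriate_substitutes(host_x, host_y, inventory=HOST_DEPENDENCIES):
    """Treat two hosts as substitutes only if they share no dependency."""
    return not shared_dependencies(host_x, host_y, inventory)


if __name__ == "__main__":
    # host-a and host-b share a power supply and a cooling unit.
    print(sorted(shared_dependencies("host-a", "host-b")))
    # host-a and host-c share nothing in this inventory.
    print(are_appropriate_substitutes("host-a", "host-c"))
```

In this sketch, any single shared dependency disqualifies a pair of hosts. That is a deliberately strict simplification; a practical system might instead weigh how likely each shared dependency is to fail.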
After deployment of a distributed application, care must also be taken to maintain the diversity of redundant dependencies. For example, if a system administrator needs to replace one of two hosts, the replacement host should be the available host that has the lowest probability of a correlated failure with the remaining host. Creating and maintaining redundant dependencies can, however, be difficult for system administrators. As a result, it is not uncommon for a distributed application to fail because the failure of a single dependency causes a correlated failure of its hosts. This can significantly impact the availability of the application.
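The host replacement example can be sketched in the same way. The sketch below assumes, purely for illustration, that the number of dependencies shared with the remaining host is an acceptable proxy for the probability of a correlated failure; the inventory contents and the name `pick_replacement` are hypothetical.

```python
def pick_replacement(remaining_host, candidates, inventory):
    """Pick the candidate that shares the fewest dependencies with the
    remaining host. Shared-dependency count is an assumed stand-in for
    the probability of a correlated failure."""
    def overlap(candidate):
        return len(inventory[candidate] & inventory[remaining_host])

    # min() returns the first candidate with the smallest overlap.
    return min(candidates, key=overlap)


if __name__ == "__main__":
    # Hypothetical inventory: host name -> set of dependencies.
    inventory = {
        "host-a": {"power-1", "switch-1", "cooling-1"},
        "host-b": {"power-1", "switch-2", "cooling-1"},  # shares 2
        "host-c": {"power-2", "switch-1", "cooling-2"},  # shares 1
        "host-d": {"power-2", "switch-3", "cooling-2"},  # shares 0
    }
    print(pick_replacement("host-a", ["host-b", "host-c", "host-d"], inventory))
```

Counting shared dependencies treats every dependency as equally likely to fail. If failure rates for individual dependencies were available, the overlap function could sum those rates instead.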
The disclosure made herein is presented with respect to these and other considerations.